Changes in the energy utility landscape, such as the increased adoption of renewable energy technologies, have made traditional forecasting models like OLS regression less suitable for the task. It is not unusual to see utility forecasters fit their models with a plethora of binary-encoded features (e.g., month, day of week, hour, holidays), which renders the models difficult, if not impossible, to interpret. Consequently, modern approaches such as boosted trees, generalized additive models (GAMs) and recurrent neural networks (RNNs) have attracted growing interest for their ability to forecast at scale.
The energy load forecast was modeled with a GAM. Unlike black-box machine-learning approaches, GAMs are highly interpretable and therefore well suited to developing defensible forecasts. This approach was influenced by Gaillard, P., et al. (2015), Semi-parametric models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting; Ziel, F. & Liu, B. (2016), Lasso Estimation for GEFCom2014 Probabilistic Electric Load Forecasting; and Dr. Matteo Fasiolo's contributions to the applied GAM literature.
Energy load data was collected for 2014 through 2017. Hourly weather data was collected from the NOAA Local Climatological Data (LCD) website. Two weather stations, the only stations with hourly recordings for this region, were used for the model; their weather metrics were averaged and then exponentially smoothed with a smoothing constant α = 0.95. Solar irradiance data was collected and aggregated into a unitized profile. The solar feature is a function of both this unitized irradiance profile and the installed PV capacity (kW); in other words, it represents the cumulative PV contribution on the grid. Major holidays (Thanksgiving, Christmas Eve/Day and New Year's Eve) were combined into a single binary feature called holiday. Additional features were created to capture seasonal changes and trend: day of week (dow), instant (hour of the day), position (an incremental step from 0 to 1 over each year) and trend (an incremental step from 0 to 1 over the entire time frame).
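The feature construction described above can be sketched in R. This is a minimal, hypothetical sketch, not the original preprocessing code: the function and argument names (exp_smooth, make_features, irradiance_pu, capacity_kw), the UTC time zone, and the exact formulas for position, trend and solar are assumptions chosen to match the variable names in the fit4 formula. The smoothing recursion assumes the convention in which α weights the previous smoothed value, s[t] = α·s[t-1] + (1 - α)·x[t], so that α = 0.95 acts as a long-memory smoother; solar is assumed to be the unitized profile multiplied by installed capacity.

```r
# Hypothetical sketch: names mirror the fit4 formula; definitions are assumed.

# Exponential smoothing (assumed convention):
#   s[1] = x[1];  s[t] = alpha * s[t - 1] + (1 - alpha) * x[t]
exp_smooth <- function(x, alpha = 0.95) {
  s <- numeric(length(x))
  if (length(x) == 0) return(s)
  s[1] <- x[1]
  for (t in seq_along(x)[-1]) s[t] <- alpha * s[t - 1] + (1 - alpha) * x[t]
  s
}

# time: hourly POSIXct; drybulb/dewp: already averaged across the two stations;
# irradiance_pu: unitized (0-1) solar profile; capacity_kw: installed PV capacity.
make_features <- function(time, drybulb, dewp, irradiance_pu, capacity_kw) {
  lt <- as.POSIXlt(time, tz = "UTC")
  yr <- lt$year + 1900
  leap <- (yr %% 4 == 0 & yr %% 100 != 0) | yr %% 400 == 0
  hours_in_year <- 24 * ifelse(leap, 366, 365)
  elapsed <- as.numeric(difftime(time, min(time), units = "hours"))

  data.frame(
    time      = time,
    dow       = factor(lt$wday, levels = 0:6,
                       labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
    holiday   = as.integer(
      (lt$mon == 10 & lt$wday == 4 & lt$mday %in% 22:28) |  # Thanksgiving (4th Thu of Nov)
      (lt$mon == 11 & lt$mday %in% c(24, 25, 31))           # Christmas Eve/Day, New Year's Eve
    ),
    instant   = lt$hour,                                    # hour of day, 0-23
    position  = (lt$yday * 24 + lt$hour) / hours_in_year,   # 0 to <1 within each year
    trend     = elapsed / max(elapsed),                     # 0 to 1 over the whole sample
    drybulb   = drybulb,
    drybulb95 = exp_smooth(drybulb, 0.95),
    dewp      = dewp,
    dewp95    = exp_smooth(dewp, 0.95),
    solar     = irradiance_pu * capacity_kw                 # assumed: unit profile x installed kW
  )
}

# Usage with toy data covering Thanksgiving 2014 and the following day
time <- seq(as.POSIXct("2014-11-27 00:00", tz = "UTC"), by = "hour", length.out = 48)
station_a <- rep(20, 48)
station_b <- rep(22, 48)
feats <- make_features(time,
                       drybulb = (station_a + station_b) / 2,  # average of two stations
                       dewp = rep(15, 48),
                       irradiance_pu = rep(0.5, 48),
                       capacity_kw = rep(1000, 48))
```

Scaling position by the number of hours in each year keeps it in [0, 1) and lets it wrap from one year to the next, which suits the cyclic ('cc') basis used for position in the interaction term.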
The GAM was fit using 2014-2016 data and validated using 2017 data.
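A year-based hold-out like this can be sketched in base R. The helper names, the `time` column and the cut-off argument are hypothetical; MAPE is used only as an example metric, since the document does not state which accuracy measure was used. Because the model is fit on log(kw), predictions must be exponentiated before they are compared with observed kW.

```r
# Hypothetical helpers; the `time` column name and cut-off year are assumptions.
split_by_year <- function(df, last_train_year = 2016, time_col = "time") {
  yr <- as.integer(format(df[[time_col]], "%Y", tz = "UTC"))
  list(train = df[yr <= last_train_year, , drop = FALSE],
       test  = df[yr >  last_train_year, , drop = FALSE])
}

# MAPE (%) on the kW scale. Predictions on the log scale are exponentiated;
# no retransformation-bias correction is applied, so exp() of the fitted
# log-scale mean behaves more like a median than a mean.
mape_from_log <- function(actual_kw, pred_log_kw) {
  100 * mean(abs(actual_kw - exp(pred_log_kw)) / actual_kw)
}

# Usage with toy data straddling the train/test boundary
df <- data.frame(
  time = as.POSIXct(c("2016-12-31 23:00", "2017-01-01 00:00"), tz = "UTC"),
  kw   = c(100, 200)
)
s <- split_by_year(df)
```

With the fitted model, something like `mape_from_log(x_test$kw, predict(fit4, newdata = x_test))` would score the 2017 hold-out, where `x_test` is a hypothetical 2017 data frame and `predict()` is assumed to dispatch to mgcv's `predict.bam()` for the object returned by `bamV()`.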
library(mgcViz)  # provides bamV(); also loads mgcv

fit4 <- bamV(
  log(kw) ~ dow + holiday +
    # main effects: cubic regression ('cr') and adaptive ('ad') splines
    s(trend, k = 8, bs = 'cr') +
    s(solar, bs = 'cr', k = 15) +
    s(drybulb, bs = 'cr', k = 15) +
    s(drybulb95, bs = 'cr', k = 15) +
    s(dewp, bs = 'cr', k = 10) +
    s(dewp95, bs = 'cr', k = 25) +
    s(instant, bs = 'ad', k = 20) +
    s(position, bs = 'ad', k = 40) +
    # interactions with hour of day; 'cc' gives a cyclic basis in that margin
    ti(solar, instant, k = c(10, 10), bs = c('cr', 'cc')) +
    ti(drybulb, instant, k = c(5, 10), bs = c('cr', 'cc')) +
    ti(drybulb95, instant, k = c(10, 10), bs = c('cr', 'cc')) +
    ti(dewp, instant, k = c(5, 10), bs = c('cr', 'cc')) +
    ti(dewp95, instant, k = c(15, 10), bs = c('cr', 'cc')) +
    ti(position, instant, k = c(35, 10), bs = c('cc', 'cc')),
  data = x_train,
  aGam = list(discrete = TRUE, nthreads = 11, select = TRUE),  # passed to mgcv::bam()
  aViz = list(nsim = 50)                                        # passed to getViz()
)
Trend, Daily and Annual Seasonality
Energy load increased slightly in recent years. It is hypothesized that this trend reflects the combined effects of PV adoption, atypical weather conditions and increasing AC usage. The marginal effect of instant (hour of the day) supports evidence of PV adoption, as the ‘duck curve’ shape was apparent during the solar day. The instant effect also matched the expected ‘typical’ day, with load rising from around 5 AM until the peak hour at 6 PM. The position effect (the incremental step through the year) supported our assumption that energy load demand is sensitive to both tourist and ‘snowbird’ arrivals. Historically, the annual peak occurs in the last couple of weeks of the year. In previous attempts with a GLM, the model forecast the annual peak poorly, even with a binary ‘winter season’ feature.
The marginal effect plot for the interaction between instant and solar suggests that energy load during the solar day decreases as PV adoption grows. In other words, the more PV installed on the grid, the larger its reduction of energy load demand.
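Because fit4 models log(kw), every smooth term, including ti(solar, instant), contributes additively on the log scale and therefore multiplicatively on the kW scale. The short sketch below converts such an effect into a percent change in load. The helper name log_effect_to_pct is hypothetical, and the effect values are made-up placeholders, not numbers read from the document's plots.

```r
# Convert an additive effect on the log(kW) scale into a percent change in kW.
log_effect_to_pct <- function(effect) 100 * (exp(effect) - 1)

# Hypothetical contributions for one solar level at noon (placeholders only):
main_solar  <- -0.03   # s(solar)
interaction <- -0.02   # ti(solar, instant) at instant = 12

# Effects add on the log scale, so sum them first and convert the total.
total_pct <- log_effect_to_pct(main_solar + interaction)
```

Summing on the log scale matters because percent changes do not add: converting each term separately and adding the percentages gives a slightly different, incorrect total.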
The first four marginal effect plots suggest that a non-linear relationship exists between weather and energy load; this relationship would not be apparent from a GLM with linear weather terms. The contour plots of the interactions between weather and hour further explain this complex relationship. The exponentially smoothed weather features reveal that energy load increases during prolonged warm and saturated weather. In addition, the drybulb marginal effect plot suggests that this weather metric is confounded with the PV contribution.
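Why do the smoothed features respond to prolonged rather than momentary conditions? Under the assumed convention s[t] = α·s[t-1] + (1 - α)·x[t], an observation k hours back receives weight (1 - α)·α^k, so the memory of drybulb95 and dewp95 can be summarized by a half-life. The helper names below are hypothetical, and hourly data is assumed, as described for the weather inputs.

```r
# Weight an exponentially smoothed feature places on the observation `lag`
# steps back (assumed convention: s[t] = alpha * s[t - 1] + (1 - alpha) * x[t]).
smoothing_weights <- function(alpha, lag) (1 - alpha) * alpha^lag

# Number of steps after which an observation's weight has halved.
half_life <- function(alpha) log(0.5) / log(alpha)

h   <- half_life(0.95)                      # about 13.5 hours for hourly data
w24 <- sum(smoothing_weights(0.95, 0:23))   # share of weight on the last 24 hours, about 0.71
```

With α = 0.95, a single hot hour moves drybulb95 by only 5% of its deviation, while conditions sustained over a day or more dominate the feature, which is consistent with reading its effect as a response to prolonged weather.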
In terms of accuracy, the GAM was comparable to the GLM (results not shown). However, the GLM required over 30 features, many of which were dummy variables representing month, day of week and hour of the day. As a result, the GLM was difficult to explain to laypersons (e.g., the use of interactions, the difference between main and interaction effects, and the seasonal dummy features). With the help of bamV() from the R mgcViz package, the GAM forecast model proved to be highly interpretable and helped to explain the complex relationships in energy load forecasting.
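How quickly dummy coding inflates a GLM can be illustrated with base R's model.matrix(). The GLM specification itself is not shown, so this grid is only a hypothetical illustration: it counts the columns that month, day of week and hour of day alone produce under R's default treatment contrasts, before any weather, holiday or trend terms are added.

```r
# Hypothetical illustration: design-matrix columns for calendar effects alone
# (treatment contrasts: one baseline level is dropped per factor).
grid <- expand.grid(month = factor(1:12), dow = factor(1:7), hour = factor(0:23))

X_main <- model.matrix(~ month + dow + hour, data = grid)
X_int  <- model.matrix(~ month + dow * hour, data = grid)  # adds dow-by-hour interactions

ncol(X_main)  # 1 intercept + 11 + 6 + 23 = 41
ncol(X_int)   # 41 + 6 * 23 = 179
```

By contrast, fit4 keeps only dow and holiday as parametric terms; the daily and annual patterns enter through s(instant) and s(position), each of which can be read from a single marginal effect plot.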