Our forecast team at Camus Energy recently celebrated an exciting milestone. We successfully deployed a cost-effective, meter-level load forecast system for all 1+ million smart meters in a large utility’s service territory.
To better understand why meter-level forecasting will become an essential capability for utilities, I highly recommend the companion blog post from our Chief Technology Officer, Cody Smith. While Cody's post shares our perspective on why meter-level forecasting is transformative for utilities, here I want to give a glimpse into how it actually works.
In this blog, I’ll answer four of the most common questions about meter-level forecasting:
Understandably, the first question everyone asks when it comes to forecasting is: does it work? The answer: yes. Let’s explore how we know that meter-level forecasting is sufficiently accurate to inform utility operations.
Put succinctly, meter-level loads are highly autocorrelated. That means the load yesterday and the load last week both have great predictive value for the load today and the load next week. This is because both residential and commercial customers have regular patterns of behavior, and much of the remaining variability in load is weather dependent. To forecast meter-level load, we take advantage of both recent observations at every meter and the most advanced high-resolution weather forecasts available.
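To make the autocorrelation point concrete, here is an illustrative sketch (not our production code) that measures how strongly a synthetic hourly load series correlates with itself one day and one week earlier. The synthetic daily shape is a stand-in for real meter data:

```python
import math

def autocorr(series, lag):
    """Pearson correlation between series[t] and series[t - lag]."""
    x = series[lag:]
    y = series[:-lag]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Synthetic hourly load: a repeating daily shape plus a small perturbation.
load = [10 + 5 * math.sin(2 * math.pi * (h % 24) / 24) + 0.3 * math.sin(h)
        for h in range(24 * 28)]

day_lag = autocorr(load, 24)       # same hour yesterday
week_lag = autocorr(load, 24 * 7)  # same hour last week
```

Both lag correlations come out near 1.0 here, which is the property the forecast exploits: yesterday and last week are strong predictors of today.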
Before we deploy our forecast system, we demonstrate accuracy and precision by “backtesting” on our utility partners’ real-world historical data and comparing predictions to what actually happened on their grids. The backtest reruns the forecast system over 12 months, withholding future data at each step, to accurately simulate what we would have predicted over the past year. In addition to assessing standard data science metrics like mean squared error, we evaluate forecast error using an economic cost function that measures error in dollars and cents based on the utility customer’s use cases. This cost function evaluates the forecast’s ability to inform action in the present, measuring error as the cost to the business relative to the outcome we would have achieved with perfect foresight.
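An economic cost function of this kind can be sketched in a few lines. The dollar penalties below are made-up placeholders, not our actual figures; the point is that under- and over-forecasting can cost the business different amounts, unlike a symmetric metric such as mean squared error:

```python
def economic_cost(actual_kwh, forecast_kwh, under_cost=0.50, over_cost=0.10):
    """Cost relative to perfect foresight, in dollars. Under-forecasting
    (e.g. covering a shortfall at peak prices) is penalized more heavily
    than over-forecasting (e.g. excess procurement). Rates are placeholders."""
    total = 0.0
    for a, f in zip(actual_kwh, forecast_kwh):
        err = a - f
        total += under_cost * err if err > 0 else over_cost * -err
    return total

perfect = economic_cost([10, 12], [10, 12])  # 0.0: perfect foresight costs nothing
shortfall = economic_cost([10], [8])         # under-forecast by 2 kWh
surplus = economic_cost([8], [10])           # over-forecast by 2 kWh
```

In a backtest, this total is accumulated over the full withheld year, so the forecast is graded on what its errors would actually have cost.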
But perhaps the simplest way to evaluate accuracy is an eye test. The chart below shows meter-level forecasts (dotted lines) versus actual values (solid lines) for both net load (“AMI Usage”) and PV generation from a customer deployment. The dotted lines don’t line up exactly with the solid lines, and small spikes in usage are often smoothed over in the forecast, but the forecasts are sufficiently accurate to inform and improve key utility decisions. Ultimately, that’s the goal.
It’s worth noting that while this example is a good forecast, it’s by no means an outlier. Some other meters are better and some are worse.
Our meter-level forecasting system uses a compilation of proven technologies and techniques, including machine learning, cloud computing, cohorting, and data visualization. Pulling these pieces together is what makes our forecasting approach both efficient and reliable.
Camus’s meter-level forecast system uses supervised machine learning, a type of artificial intelligence that can learn complex behaviors from previous observations to make future predictions. We select the inputs (or “features”) in collaboration with our utility partners, drawing on our collective knowledge and experience. We can tune these inputs for each forecast system, driven by the unique characteristics and behaviors of each distribution grid. The forecast model learns the relationship between the inputs and the forecasted net and gross load values. The result is a “trained” model which is later used to make predictions based on new inputs and the learned historical behaviors.
We use XGBoost for supervised machine learning. XGBoost is an open-source library of “boosted tree” models, which combine decision trees in ways that strengthen predictions while limiting overfitting. A decision tree on its own is generally not good at prediction, but a series of decision trees can be quite powerful, with each tree correcting the errors of the previous one. We have found that XGBoost strikes a good balance: it is fast and efficient, it is good at learning nonlinear relationships among the input features, and its results are explainable. The forecast system design allows us to apply the right method for the job: while XGBoost is our baseline solution, we can apply neural networks where they are more appropriate for the task.
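The idea that each tree corrects the errors of the previous one can be shown with a minimal boosted-stump sketch in plain Python. This is illustrative only; the production system uses the XGBoost library, not this toy loop:

```python
def fit_stump(x, residuals):
    """Find the threshold split on x that best reduces squared error, and
    return it as a two-leaf predictor (a depth-1 decision tree)."""
    best = None
    for split in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= split]
        right = [r for xi, r in zip(x, residuals) if xi > split]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda xi: lmean if xi <= split else rmean

def boost(x, y, n_trees=100, learning_rate=0.3):
    """Gradient boosting for squared error: each stump is fit to the
    residuals left by all the trees before it."""
    pred = [0.0] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(learning_rate * t(xi) for t in trees)

# One stump alone cannot fit this step-plus-ramp pattern; the ensemble can.
x = list(range(10))
y = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
model = boost(x, y)
```

Each round fits a new tree to what the current ensemble still gets wrong, which is exactly the mechanism that makes a sequence of weak trees a strong predictor.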
Running the forecast system on the Google Cloud Platform (GCP) enables us to provision compute resources when we need them during training and prediction while turning them off in between tasks. This is considerably more cost-effective for this kind of workload than on-premises approaches. Google’s platform also provides reliable services to access data efficiently and control parallel execution of forecast jobs. Our cloud-based approach builds on our team’s deep experience with cloud computing and distributed systems in other mission-critical industries – where cloud-native applications have significantly improved operations.
National Oceanic and Atmospheric Administration (NOAA) weather forecasts, now available directly in the cloud through the NOAA Open Data Dissemination program, are a key input in our forecast system. These high-resolution weather forecasts update every hour based on the latest meteorological observations. NOAA developed its High Resolution Rapid Refresh (HRRR) model to help predict renewable energy generation, and it assimilates data from satellites, radar, ground stations, and weather balloons. The complete petabyte-scale historical forecast dataset is publicly available, and we use it for training our models.
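Because the HRRR archive lives in public cloud buckets, pulling a given model run is mostly a matter of constructing the right object path. The bucket name and file-naming convention below are assumptions based on the public NODD layout, so verify them against NOAA's documentation before relying on them:

```python
from datetime import datetime

# Assumed NODD bucket and naming convention (not guaranteed by this post).
BUCKET = "gs://high-resolution-rapid-refresh"

def hrrr_surface_path(run_time, forecast_hour):
    """Object path for one HRRR CONUS surface-forecast GRIB2 file,
    for the given model run time and forecast lead hour."""
    return (
        f"{BUCKET}/hrrr.{run_time:%Y%m%d}/conus/"
        f"hrrr.t{run_time:%H}z.wrfsfcf{forecast_hour:02d}.grib2"
    )

# The 12Z run's 6-hour-ahead surface forecast for July 1, 2024.
path = hrrr_surface_path(datetime(2024, 7, 1, 12), 6)
```

Hourly model runs mean a training pipeline can enumerate these paths over the petabyte-scale archive and fetch only the fields and hours it needs.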
We build generation forecast models for each solar installation based on utility interconnection data and NOAA’s high-resolution irradiance forecasts. If near-real-time generation telemetry is available, we use it with machine learning techniques to evaluate forecast accuracy and improve the models. The accuracy depends on availability of detailed site information such as the inverter model and solar panel orientation. Partly cloudy days are challenging because it is hard to forecast the location of individual clouds. We plan to deploy stochastic forecasting (see “Where is meter-level forecasting headed next?”) to help improve accuracy on cloudy days.
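As a rough intuition for how irradiance forecasts translate into generation, here is a deliberately simplified first-order sketch. The linear irradiance model and the flat derate factor are simplifying assumptions; as noted above, real models also account for panel orientation, temperature, and inverter behavior:

```python
def pv_output_kw(dswrf_w_m2, capacity_kw, derate=0.85):
    """First-order PV estimate: scale nameplate capacity by forecast
    downward shortwave radiation relative to standard test conditions
    (1000 W/m^2), with a flat system derate. Derate value is a placeholder."""
    return capacity_kw * min(dswrf_w_m2 / 1000.0, 1.0) * derate

half_sun = pv_output_kw(500.0, 10.0)   # partly attenuated irradiance
full_sun = pv_output_kw(1100.0, 10.0)  # clamped at nameplate * derate
```

This also shows why partly cloudy days are hard: the output is roughly linear in irradiance, so a mislocated cloud translates directly into forecast error.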
Cohort-based modeling enables us to leverage all the available data to inform our forecast at each individual meter without incurring massive compute costs. We use meter metadata, such as rate class and EV ownership, to divide meters into cohorts with similar load behavior. Instead of building a separate model for each meter, we build one model for each cohort. This enables the forecast system to learn load behaviors from data across many meters without the complexity and cost of millions of distinct models. We then use each cohort model to make individual predictions for every meter, allowing the forecast to adapt to meter-level changes.
The cohort approach is so effective that we don’t even need to train the algorithm on every meter. Instead, we train on a statistical subset of meters in each cohort. Cohorting improves the forecast accuracy while reducing the cost of operating the forecast system. This approach is especially useful for new meters or sudden changes in load behavior, such as those that may occur at short-term rental homes. If we were building models for each meter, we would need to retrain the algorithm with weeks of historical data to incorporate major load behavior changes. In contrast, the cohort models do not need to be retrained. They can adjust within a few days to new behavior at a particular meter because they learn from many meters over long training periods.
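The cohort structure can be sketched as follows. The metadata fields, cohort key, and the trivial "model" (a flat scaling of each meter's own recent load) are all illustrative stand-ins; in production each cohort model is a trained XGBoost model:

```python
from collections import defaultdict

# Illustrative meter metadata; fields and values are made up.
meters = [
    {"id": "m1", "rate_class": "residential", "has_ev": True,  "recent_kwh": 1.2},
    {"id": "m2", "rate_class": "residential", "has_ev": False, "recent_kwh": 0.8},
    {"id": "m3", "rate_class": "commercial",  "has_ev": False, "recent_kwh": 5.0},
]

def cohort_key(meter):
    """Cohort assignment from metadata, e.g. rate class and EV ownership."""
    return (meter["rate_class"], meter["has_ev"])

cohorts = defaultdict(list)
for m in meters:
    cohorts[cohort_key(m)].append(m)

# One (placeholder) model per cohort, applied to each meter's own features,
# so predictions stay meter-specific even though models are shared.
models = {key: (lambda meter: 1.1 * meter["recent_kwh"]) for key in cohorts}
forecasts = {m["id"]: models[cohort_key(m)](m) for m in meters}
```

Because each prediction is driven by the individual meter's own features, a behavior change at one meter shows up in its inputs within days, with no cohort-model retraining required.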
Forecasts are only helpful if operators are able to trust, verify, and use them. To help explain why a model made its predictions, we use a tool called SHAP (SHapley Additive exPlanations), which calculates the contribution of each input to each output.
In the example below, recent meter load observations (“correlated_load”) pushed the forecast load lower while the downward shortwave radiation (aka solar radiation) forecast (“weather_dswrf”) pushed the forecast load higher.
When utility staff better understand a model’s output, they are more likely to trust the predictions enough to inform important decisions. Visualizing the input contributions also provides an important feedback loop: when forecast error leads to an adverse outcome, we can discover why and use that understanding to improve the forecast. This may involve retraining the model with new features or different cohorts to make better predictions the next time similar conditions occur.
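For intuition, the additive attributions SHAP produces can be computed by hand in the simplest case: for a linear model with independent features, each feature's contribution is exactly its coefficient times the feature's deviation from its mean. The coefficients and values below are made up to mirror the example above (tree-model SHAP values come from the shap library's TreeExplainer, not this formula):

```python
# Hypothetical linear model: made-up coefficients and training-set means.
coefs = {"correlated_load": 0.9, "weather_dswrf": -0.004}
means = {"correlated_load": 2.0, "weather_dswrf": 300.0}

def linear_shap(features):
    """Per-feature contributions to the deviation from the average
    prediction: coef * (value - mean), the exact SHAP value for a
    linear model with independent features."""
    return {name: coefs[name] * (features[name] - means[name])
            for name in features}

contribs = linear_shap({"correlated_load": 1.0, "weather_dswrf": 100.0})
# Low recent load pushes the forecast lower (negative contribution); low
# solar radiation pushes *net* load higher (positive contribution), since
# the dswrf coefficient is negative for a meter with PV behind it.
```

Summing the contributions and the average prediction recovers the model's output, which is the additivity property that makes these attributions easy to present to operators.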
Every utility data landscape is different. Fortunately, our approach to meter-level forecasting is quite flexible to data gaps and availability – though there are minimum requirements.
We ask that our utility customers provide the best available geographic information system (GIS) and smart meter data, and then we use that data to build the best possible forecast. As we do so, we help identify data gaps and investment opportunities that would significantly improve forecasts and real-time load and generation visibility.
To conduct meter-level forecasting, utilities must collect meter data at roughly 3-hour intervals or better. We most commonly work with utilities that collect meter data at one-hour or 15-minute intervals. Unfortunately, mechanical meters that provide monthly readings are not sufficient to support meter-level forecasting.
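When meters report at 15-minute resolution, interval reads can be rolled up to the hourly granularity the forecast operates on. A minimal sketch, assuming timestamped kWh interval reads (field layout is illustrative):

```python
from datetime import datetime

def to_hourly(reads_15min):
    """Sum 15-minute kWh interval reads into hourly totals, keyed by the
    start of each hour."""
    hourly = {}
    for ts, kwh in reads_15min:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        hourly[hour] = hourly.get(hour, 0.0) + kwh
    return hourly

reads = [
    (datetime(2024, 1, 1, 0, 0), 0.5),
    (datetime(2024, 1, 1, 0, 15), 0.5),
    (datetime(2024, 1, 1, 0, 30), 0.5),
    (datetime(2024, 1, 1, 0, 45), 0.5),
    (datetime(2024, 1, 1, 1, 0), 1.0),
]
hourly = to_hourly(reads)
midnight_kwh = hourly[datetime(2024, 1, 1, 0, 0)]
```

Monthly mechanical reads cannot be decomposed this way, which is why interval-capable smart meters are the minimum requirement.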
We can help utilities set up secure channels for transferring the data to us. Even the best communications hardware will have temporary outages, so we deploy methods to quantify and mitigate the effects of meter data outages (missing inputs) on the forecast output.
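One simple mitigation for a communication gap, sketched below, is to impute a missing hourly read from the same hour a day earlier, leaning on the autocorrelation discussed above. This is an illustrative fallback, not a description of our full gap-handling pipeline:

```python
def impute_from_prior_day(series):
    """series: hourly kWh values in time order, with None marking
    communication gaps. Fills each gap with the value observed at the
    same hour one day (24 slots) earlier, when available."""
    filled = list(series)
    for i, value in enumerate(filled):
        if value is None and i >= 24 and filled[i - 24] is not None:
            filled[i] = filled[i - 24]
    return filled

# Two days of hourly reads with one missing hour on day two.
observed = [1.0] * 24 + [None] + [2.0] * 23
filled = impute_from_prior_day(observed)
```

Quantifying how much such imputation degrades forecast accuracy is part of making the system robust to real-world outages.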
Once the forecast system is operational, we work with utilities to integrate it into operational systems and business processes. For example, to improve Fault Isolation and Service Restoration, we can work with IT and operations teams to integrate the forecast output into an Advanced Distribution Management System. (Again, I recommend reading Cody’s companion blog post for more details on meter-level forecasting applications and business cases.)
Today, we are focused on deterministic forecasting: estimating the most likely load or generation value at each meter on an hourly time scale, from now to 48 hours in the future. As utility operations and energy markets evolve, we will extend our capabilities to both short-range (0 to 30 minutes) and long-range (2 to 30 days) operational forecasting.
We also plan to evaluate stochastic forecasting. Unlike deterministic forecasts, stochastic forecasts characterize uncertainty by providing a range of potential outcomes, each with different probabilities. A stochastic approach could be useful for solar generation forecasts that are sensitive to variable cloud cover. It could help us understand and account for low probability, high impact extreme weather events. Among the many technical challenges to robust stochastic forecasting, I expect the most difficult one will be presenting the range of possible outcomes in a way that improves a grid operator’s ability to make the best real-time decisions.
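One standard building block for evaluating stochastic forecasts is the pinball (quantile) loss, which scores each predicted quantile asymmetrically. A minimal sketch (the quantile level and numbers are illustrative):

```python
def pinball_loss(actual, forecast, q):
    """Pinball loss for a quantile-q forecast: the asymmetric penalty is
    minimized, in expectation, when the forecast equals the true q-th
    quantile of the outcome distribution."""
    diff = actual - forecast
    return q * diff if diff >= 0 else (q - 1) * diff

# A 90th-percentile forecast is penalized more for being too low than too high.
low = pinball_loss(10.0, 8.0, 0.9)    # under-forecast by 2
high = pinball_loss(10.0, 12.0, 0.9)  # over-forecast by 2
```

Averaged over many hours and quantile levels, this loss rewards a forecast whose stated probability bands actually contain the outcomes at the stated rates, which is the property a grid operator would need to trust a range rather than a single number.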
Forecasting load and generation for individual meters is a key enabler for operating a more dynamic distribution grid. Operational foresight allows corrective action before meter data becomes available and will help boost reliability and efficiency. As distribution grid operation becomes more complex and fast-changing, forecasting must continue to evolve and improve to meet these needs.
We look forward to continuing to partner with pragmatic, forward-thinking utilities to deliver the tools and capabilities needed to support a 100% electrified future – including advanced forecasting.