From The Team

Utilities Need “Stretch-y” Computing In The Cloud

Post by Cody Smith
Last Updated: May 7, 2023

If you’ve ever traveled in the developing world, you may have noticed something different about many of the buildings. It’s common to have columns with exposed rebar sticking out of the roof so that additional stories can be added later. This makes it easier, faster, and cheaper to expand the buildings when more space is needed. 

These buildings present an apt metaphor for cloud computing platforms, which provide data storage, analysis, software, and other services via the internet. Unlike traditional computers, these platforms are “stretch-y.” Tools and applications designed in the cloud can run on one server and then scale to run on hundreds or even thousands of servers simultaneously as the workload expands—an approach known as horizontal scaling. Through parallel computing, jobs are sliced into small components handled by pools (groups) of servers.

For utilities, scalable computing is becoming a must. As the clean energy transition progresses, the grid is more distributed, with more devices to track, more data to collect, more unpredictable behavior to navigate, and more scenarios to consider. This translates into new types of problems for grid operations and planning. Solving these problems will require increasingly complex analyses.  

One example that I often see in my work: More utilities want to forecast net load—the total power demand from customers minus output from rooftop solar and other renewable resources. It’s essential for utilities to know net load because it represents the amount of power that they need to request from their dispatchable power plants to keep the lights on. The rapid growth of distributed energy resources (DERs) is making it difficult to forecast net load accurately. Feeder-level forecasts are increasingly error-prone, so meter-level forecasting is likely necessary. But that requires analysis of data from millions of smart meters and tens of thousands of DERs—not to mention reams of weather data.
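
The arithmetic itself is simple; the challenge is scale. Here is a minimal Python sketch of the meter-level calculation, with hypothetical data shapes for illustration:

    # Minimal sketch of meter-level net load: gross customer demand minus
    # behind-the-meter generation, aggregated per interval. The data shapes
    # are assumptions for illustration.
    from collections import defaultdict

    def net_load(meter_readings, der_output):
        """Both arguments are iterables of (timestamp, kwh) pairs."""
        totals = defaultdict(float)
        for ts, kwh in meter_readings:
            totals[ts] += kwh       # demand measured at the smart meter
        for ts, kwh in der_output:
            totals[ts] -= kwh       # subtract rooftop solar and other DER output
        return dict(totals)         # net load per interval

The hard part is not the subtraction. It is running this, plus a forecasting model, across millions of meters at once.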

How Scaling in the Cloud Works

For decades, utilities have stored, processed, and analyzed their data in on-premises facilities. Historically, these facilities have had stable workloads, and computing scalability has largely been unnecessary.

But that is changing. As utilities conduct more sophisticated analyses, they are approaching the computing capacity limits of their on-premises facilities. When an application reaches a server's capacity, it stops functioning properly and eventually fails. Utilities can redesign the software so that it runs concurrently on two or more servers, but that can take months or even years and may require hiring external engineering specialists. Installing and configuring new servers for new applications also takes time, typically weeks or months. Another challenge is that on-premises facilities typically rely on software, hardware, and other technologies with limited ability to scale computing workloads.

With cloud computing, workloads can seamlessly expand to as many servers as needed. Jobs are broken into smaller pieces that can work in isolation on different servers. If a server experiences hardware problems, a workload can easily move to another server.
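
To make that concrete, here is a sketch of slicing a job into independent chunks, with local worker processes standing in for cloud servers and a hypothetical analyze_chunk function doing the per-slice work:

    # Sketch of breaking a job into pieces that run in isolation. Local
    # processes stand in for cloud servers; analyze_chunk is hypothetical.
    from concurrent.futures import ProcessPoolExecutor

    def analyze_chunk(meter_ids):
        """Hypothetical per-slice analysis; each call is fully independent."""
        return {"meters_processed": len(meter_ids)}

    def run_job(all_meter_ids, workers=8, chunk_size=10_000):
        chunks = [all_meter_ids[i:i + chunk_size]
                  for i in range(0, len(all_meter_ids), chunk_size)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # Because chunks are independent, a failed piece can simply
            # be rerun on another worker.
            return list(pool.map(analyze_chunk, chunks))

That independence is exactly why a workload can move painlessly to another server when hardware fails.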

There are several ways to scale computing resources in the cloud. I'll describe the three most common approaches; a sketch of the elastic approach follows the list:

  • Elastic: The envelope for workloads grows and shrinks automatically in response to real-time capacity needs. For instance, a large analysis might grow from 10 to 1,000 servers during its first phase, shrink to 200 servers for its second phase, and shrink again to 10 for its third phase. Users don’t need to pay for computing resources that sit idle in between capacity-intensive analyses. 
  • Pooled: Users share a pool of resources that is so large that there is no need to grow and shrink. Each workload is spread out over many servers. 
  • Adjustable: In many cases, it’s sufficient to manually reconfigure software applications to adjust the size of the server pool. This is a simple task for an engineer.
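
Here is a toy control loop showing the logic behind elastic scaling. The queue_depth and set_pool_size hooks are hypothetical stand-ins for whatever scheduler or cloud API is actually in use; real platforms provide managed autoscalers that handle this for you.

    import time

    TASKS_PER_SERVER = 50              # assumed throughput of one server
    MIN_SERVERS, MAX_SERVERS = 10, 1000

    def autoscale(queue_depth, set_pool_size, interval_s=60):
        """queue_depth() and set_pool_size() are hypothetical hooks into
        whatever scheduler or cloud API is actually in use."""
        while True:
            pending = queue_depth()
            needed = -(-pending // TASKS_PER_SERVER)   # ceiling division
            target = max(MIN_SERVERS, min(MAX_SERVERS, needed))
            set_pool_size(target)      # platform provisions or releases servers
            time.sleep(interval_s)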

Computational speeds in the cloud can be jaw-dropping. With one cloud service, we can apply a query to all of a large utility’s historical smart meter data—terabytes of data—and receive complete results in 10 seconds. Such an undertaking would take hours or days in an on-premises facility.

To illustrate the value of being able to rapidly expand and shrink computing capacity, I'll return to the example of net load forecasts. A utility may need to retrain a net load forecasting model on new data for 4 hours every week to maintain its accuracy. During this period, numerous additional servers would be needed to handle the computation. Then the model would sit idle for more than 6 days, and the capacity need would shrink dramatically, until the next run.
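
That weekly rhythm might be orchestrated with something like the following sketch, where scale_to, fetch_new_data, and retrain are hypothetical placeholders for the platform and model code:

    import time

    WEEK_S = 7 * 24 * 3600

    def weekly_retrain(scale_to, fetch_new_data, retrain):
        """scale_to(), fetch_new_data(), and retrain() are hypothetical
        hooks into the cloud platform and the model code."""
        while True:
            scale_to(500)               # expand the pool for the ~4-hour job
            retrain(fetch_new_data())   # the heavy, parallelizable step
            scale_to(10)                # shrink back; stop paying for capacity
            time.sleep(WEEK_S)          # wait until the next scheduled run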

Enabling Sophisticated Tools In A Flash

Utilities will need many sophisticated tools to manage complex, rapidly changing grids. Often, it will be necessary to build new applications quickly. Horizontal scaling in the cloud can facilitate this. 

For instance, when we develop any forecast algorithm, we typically consider different parameters that contribute to the grid condition or behavior we’re trying to predict. There could be 20 or more parameters under consideration. 

We can rapidly add parameters to and remove them from the code, then evaluate forecast accuracy for different combinations of parameters. We do this by "back-testing": running the tool on real-world historical data and comparing its predictions to what actually happened. Back-testing allows us to identify a much smaller subset of parameters that maximizes forecast accuracy. Even though back-tests require a massive amount of computing, we can complete all of them in the cloud in hours.
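
Because each parameter combination can be back-tested independently, the whole search parallelizes naturally. A sketch, with train_and_score standing in for the real train, forecast, and compare step:

    # Sketch of back-testing parameter subsets in parallel: each combination
    # is an independent job, so the search fans out across servers.
    from concurrent.futures import ProcessPoolExecutor
    from functools import partial
    from itertools import combinations

    def train_and_score(subset, history):
        """Placeholder: fit a model using only `subset` of parameters,
        back-test it against `history`, and return a forecast error."""
        return sum(hash(p) % 100 for p in subset) / len(subset)  # dummy error

    def best_subset(all_params, history, size=5):
        candidates = list(combinations(all_params, size))
        score = partial(train_and_score, history=history)
        with ProcessPoolExecutor() as pool:            # fans out across workers
            errors = list(pool.map(score, candidates))
        return min(zip(errors, candidates))[1]         # lowest error wins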

To support a large utility, we recently developed a net load forecast algorithm that uses data from more than a million smart meters to make predictions every hour for every meter. The output helps the utility determine whether it has sufficient power capacity to reroute power flows and de-energize circuits for maintenance. The utility's current maintenance planning approach is a manual analysis of smart meter data once every two years, which is far less granular than our forecast tool. We developed, tested, and implemented the tool in less than a month.

The cloud also makes it easy to update and upgrade tools over time. A year after rolling out a forecast tool, a utility may decide to add a new parameter to the inputs. This would be a simple matter of revising the code with the new parameter and comparing the accuracy of the tool’s new predictions to the accuracy of the old ones. The new parameter would be kept if it makes the tool more accurate.

A side note: when we're building or revising any tool in the cloud, we can place copies of the tool in development environments. These are isolated systems that mimic the grid, where we can evaluate new code in simulations within minutes. This allows us to make a series of small improvements, resulting in much better software over time. This rapid iteration process is very difficult to replicate in on-premises facilities.

Cloud Computing and On-Premises Facilities Can Be Complementary

What should go in the cloud and what should reside on-premises? A utility’s direct physical controls should stay on-premises. Consider SCADA (supervisory control and data acquisition) systems that enable utility control room operators to communicate with and control important grid assets such as substations. These mission-critical communications typically happen over the utility’s private communication network. It’s preferable to send control commands from a physical computer in the control room rather than from a distant cloud server. Workloads for direct controls grow only when utilities deploy new hardware, so rapid scaling of computing resources is generally not needed.

Besides direct physical controls, all other applications, tools, and analyses can benefit from the cloud, as long as they are "cloud-native": in other words, designed upfront to take advantage of the cloud's horizontal scaling capabilities. Such applications contain code for distributing work and data in parallel across multiple servers, as in the sketch below.
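
One common cloud-native building block is deterministic sharding, which lets each server own a slice of the data without coordinating with the others. A minimal sketch, assuming meter ID is the shard key:

    import hashlib

    def shard_for(meter_id: str, num_shards: int) -> int:
        """Deterministic shard assignment: the same meter always lands on
        the same shard, so servers never need to coordinate."""
        digest = hashlib.sha256(meter_id.encode()).hexdigest()
        return int(digest, 16) % num_shards

    readings = [("meter-001", 1.2), ("meter-002", 0.8), ("meter-003", 2.1)]
    shards = {i: [] for i in range(4)}
    for meter_id, kwh in readings:
        shards[shard_for(meter_id, 4)].append((meter_id, kwh))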

Existing applications that are operating effectively in on-premises facilities don’t necessarily need to migrate to the cloud. Significant code revisions—and even a complete rewrite—may be needed to ensure the tool works optimally on a cloud platform. A rewrite may not be cost-effective. 

Sometimes, utilities migrate their existing applications from on-premises facilities to the cloud without any scalability-enabling code changes. This is often referred to as “lift-and-shift.” A workload that was designed to run on two on-premises servers might shift to running on two servers in the cloud. There is little scaling benefit in such a migration.

Some utility IT vendors with long-standing expertise in on-premises software development now offer their software for the cloud, but without the code revisions needed to take full advantage of the cloud's scalability. Before purchasing cloud products, it's important for utilities to ask vendors pointed questions to determine whether they are skilled at designing cloud-native software.

Scalable cloud computing resources can be complementary to a utility’s on-premises facility. It’s easy to write code that directs the continuous transmission of data from SCADA, advanced metering infrastructure (AMI), geographic information systems (GIS), and other on-premises systems to the cloud. Once that data is in the cloud, it can feed into tools and analyses that require expandable computing resources. 
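
Such a forwarder can be quite simple. A sketch, where read_batch and the ingestion URL are hypothetical placeholders (a real deployment would typically use the cloud platform's ingestion SDK and authentication):

    import json
    import time
    import urllib.request

    ENDPOINT = "https://example.invalid/ingest"   # placeholder URL

    def forward(read_batch, interval_s=5):
        """read_batch() is a hypothetical hook that pulls new records from
        an on-premises system (SCADA, AMI, GIS, ...)."""
        while True:
            batch = read_batch()
            if batch:
                req = urllib.request.Request(
                    ENDPOINT,
                    data=json.dumps(batch).encode(),
                    headers={"Content-Type": "application/json"},
                )
                urllib.request.urlopen(req)       # push to cloud ingestion
            time.sleep(interval_s)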

For grid operations teams, cloud-native software services can deliver deep, granular insights on real-time and near-future grid conditions. For grid planning teams, scalable cloud computing can support rapid analysis of many more scenarios and grid configurations while enabling quick development of new tools. Together, these cloud-enabled capabilities give utilities a new level of agility, allowing them to adapt continuously through the energy transition.

If you have questions about cloud scalability, security, or transitioning critical utility applications from on-prem, our team is happy to help. Contact us to learn more.
