Over the past ~18 months, our team has spoken with a wide variety of California community choice aggregators (CCAs) to better understand their vision, strategy, and challenges for achieving a zero-carbon energy future. Broadly, we’ve been impressed with the progress made to deliver clean energy, equitable access, and resilience to communities across the state.
But there’s one thing that’s proven confusing.
Numerous CCAs are spending massive amounts of time ingesting, cleaning, structuring, and preparing the same sets of data from the same sources. In a sector where CCAs seek to collaborate more than compete, there is a clear opportunity to reallocate the time and money spent on duplicative data preparation to analyzing data for day-to-day decisions.
Data scientists and analysts at CCAs have a tremendous opportunity to leverage diverse data sets to provide insights and analysis for new programs, rate design, and strategic investments. Spending time cleaning settlement-quality meter data (SQMD), for instance, is hard to justify when other organizations have already done that work.
Instead, CCAs should adopt a shared approach to data management. A shared approach frees data scientists to focus on high-impact work, lowers the latency of data access, and makes it easier for CCAs to share and replicate analyses.
A Common Data Model (CDM) is a standard for defining data structures and the relationships between them. It makes it possible for different applications to use the same underlying data sources and understand those data sources in the same way.
Without a common data model, every application goes to the original source data and interprets it in its own way (e.g., changing Utility A’s “MeterID” to “Meter Number” and Utility B’s “METERno” to “Meter Number”). A different application may take the same data and interpret it differently (e.g., changing both utilities’ meter identifiers to “Meter Identity”). Those applications then need their own mapping of data structures and relationships in order to work together. With a common data model, every application can work together efficiently. The model serves as a Rosetta Stone, translating the databases into a shared language.
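To make that mapping concrete, here is a minimal Python sketch of the translation layer every application must otherwise build for itself. The “MeterID” and “METERno” fields come from the example above; the canonical field names and the energy columns are hypothetical stand-ins, not a published standard.

```python
# Hypothetical per-utility field mappings into one canonical schema.
# "MeterID" / "METERno" come from the example above; everything else
# (canonical names, energy columns) is an illustrative assumption.
CANONICAL_FIELDS = {
    "utility_a": {"MeterID": "meter_number", "kWh": "energy_kwh"},
    "utility_b": {"METERno": "meter_number", "usage_kWh": "energy_kwh"},
}

def to_common_model(record: dict, utility: str) -> dict:
    """Rename one raw utility record's fields to the shared schema."""
    mapping = CANONICAL_FIELDS[utility]
    return {mapping.get(field, field): value for field, value in record.items()}

# Records from both utilities now look identical to every downstream tool.
print(to_common_model({"MeterID": "A-123", "kWh": 41.0}, "utility_a"))
print(to_common_model({"METERno": "B-987", "usage_kWh": 39.5}, "utility_b"))
```

Without an agreed-on schema, every application maintains its own version of this mapping; with one, the mapping is written once and shared.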
For California CCAs, building or adopting a common data model that’s used across organizations would be a foundation for streamlining the sharing of information between CCAs and hastening the adoption of new tools. When one CCA creates an analysis measuring Public Safety Power Shutoff (PSPS) events in low- and middle-income census areas, another CCA could replicate it by drawing on the same data sources.
While creating a common data model makes it easier for CCAs to share insights, analyses, and approaches, it doesn’t fully solve the problem of duplicative data prep across CCAs. Cleaning and structuring the data to align with the common data model, and then warehousing it so it can be accessed quickly and easily, is time-intensive.
Rather than repeating that process for every CCA, data aggregators should provide clean, structured data from a multitude of relevant sources to multiple CCAs – through fully managed data warehouses like Google’s BigQuery or Amazon’s Redshift. These tools can process terabytes of data in seconds – offering a massive boost over alternatives like Microsoft Excel.
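As a sketch of what that end state could look like for an analyst, the query below pulls daily usage totals from a shared BigQuery warehouse. The dataset and table name (`cca_shared.sqmd_hourly`) and column names are assumptions, standing in for a warehouse an aggregator might maintain for multiple CCAs.

```python
# A sketch of querying pre-cleaned, shared meter data in BigQuery.
# The dataset, table, and column names are hypothetical assumptions.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

query = """
    SELECT meter_number,
           DATE(interval_start) AS day,
           SUM(energy_kwh) AS daily_kwh
    FROM `cca_shared.sqmd_hourly`
    WHERE interval_start >= TIMESTAMP('2021-01-01')
    GROUP BY meter_number, day
"""

# The warehouse scans the full data set server-side; only the
# aggregated rows come back to the analyst's notebook.
daily_usage = client.query(query).to_dataframe()
print(daily_usage.head())
```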
In theory, data aggregators already exist. But in practice, they cover only a handful of distribution utility datasets, and they deliver them as flat files (e.g., CSV or Shapefiles) that still require significant cleaning and structuring before use. CCAs need the data in an easy-to-query, well-structured data warehouse.
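Today, the gap between those flat files and an analysis-ready table is closed separately by each CCA. A rough sketch of that repeated cleanup work, with hypothetical file and column names:

```python
# A sketch of the per-CCA cleanup that flat-file deliveries force today.
# The file name and raw column names are hypothetical examples.
import pandas as pd

raw = pd.read_csv("utility_a_sqmd_export.csv")

clean = (
    raw.rename(columns={"MeterID": "meter_number", "KWH": "energy_kwh"})
       .assign(interval_start=lambda df: pd.to_datetime(df["ReadDate"]))
       .drop(columns=["ReadDate"])
       .drop_duplicates(subset=["meter_number", "interval_start"])
       .dropna(subset=["energy_kwh"])
)
# Every CCA that receives the same export repeats this same work.
```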
CCAs should demand better – either by partnering with vendors who can deliver clean, structured data or by building that capability jointly.
To further accelerate the sharing of insights, CCAs can also benefit from using proven, off-the-shelf (or open-source) business intelligence and visualization tools. For organizations using a common data model (and potentially a shared data warehouse structure), sharing or replicating reports through tools like Google Data Studio, Microsoft Power BI, and Tableau is fast and simple. With the same intelligence and visualization tools, one CCA can replicate another’s analysis immediately, simply by pointing it at its own data set.
Take the example of a simple load profile explorer. With a shared data approach, a CCA that builds such a report can share it with a colleague at another CCA – who can then create and customize their own version in minutes.
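To sketch how little that customization involves under a shared model: a hypothetical load profile report might boil down to a query like the one below, where the report logic is identical for every CCA and only the (assumed) source table name changes.

```python
# A sketch of a shared load-profile report under a common data model.
# The report SQL is fixed; only the source table differs per CCA.
# Both table names are hypothetical.
LOAD_PROFILE_SQL = """
    SELECT EXTRACT(HOUR FROM interval_start) AS hour_of_day,
           AVG(energy_kwh) AS avg_kwh
    FROM `{source_table}`
    GROUP BY hour_of_day
    ORDER BY hour_of_day
"""

# Each CCA runs the identical report against its own warehouse table.
cca_one_report = LOAD_PROFILE_SQL.format(source_table="cca_one.sqmd_hourly")
cca_two_report = LOAD_PROFILE_SQL.format(source_table="cca_two.sqmd_hourly")
```

Because the common data model guarantees that fields like `interval_start` and `energy_kwh` mean the same thing everywhere, replication is minutes of configuration rather than weeks of re-engineering.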
At Camus Energy, we’ve seen firsthand how easily a useful analysis from one CCA can be replicated at another. We believe CCA leaders can and should band together to converge on a common data model, procure clean, structured data, and foster a culture of collaboration among data scientists from different organizations.
By working together through associations like CalCCA, California CCAs have an opportunity to take the innovations developed at each individual CCA and rapidly scale those to others in the state.
That’s a path to achieving some big wins for communities all over California.