By 2020, the Netherlands plans to have 15 million smart gas and electricity meters installed as part of a national rollout following a successful smart meter pilot, according to Dutch Minister of Economic Affairs Henk Kamp.
So, in a few years there will be 15 million smart gas and electricity meters installed. Each of these will upload its meter readings daily: readings taken every 15 minutes (or less) for electricity meters and every hour for gas meters, as prescribed by the metering code. Let's see how many data records, let alone how much data, that is, assuming there will be 8 million electricity meters and 7 million gas meters:
- On a daily basis: 8 million (meters) x 24 (hours) x 4 (quarters in an hour) = 768 million records for electricity meters, plus 7 million x 24 = 168 million for gas meters, making 936 million records daily
- On a yearly basis: 936 million x 365 (days) = 341 BILLION records
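The arithmetic above can be checked in a few lines of Python; the meter counts and reading intervals are the assumptions stated earlier in the text:

```python
# Back-of-the-envelope record counts for the projected Dutch rollout.
ELECTRICITY_METERS = 8_000_000   # assumption from the text
GAS_METERS = 7_000_000           # assumption from the text

electricity_daily = ELECTRICITY_METERS * 24 * 4  # a reading every 15 minutes
gas_daily = GAS_METERS * 24                      # a reading every hour
daily_total = electricity_daily + gas_daily

yearly_total = daily_total * 365
three_year_total = yearly_total * 3

print(f"{daily_total:,} records per day")         # 936,000,000
print(f"{yearly_total:,} records per year")       # 341,640,000,000
print(f"{three_year_total:,} records in 3 years") # 1,024,920,000,000
```

So after three years the pile has indeed passed the trillion-record mark.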
After 3 years, you will have more than a TRILLION records of data! Good luck putting that in a relational database; just storing this information in an RDBMS will be very expensive. The largest gridco in the Netherlands might own maybe half of these meters, but half a trillion is still an awful lot of meter readings. The question is: is it really necessary to put this mountain of data into an RDBMS?
The rule of thumb I have is: if it is a low-value, large quantity of data, then don't store it in an expensive RDBMS. If it is a high-value, low quantity of data, then do store it in an RDBMS. Anything in between? Pick and choose!
I argue that meter readings are low-value, large-quantity data that shouldn't be put into an RDBMS. Why? Because these meters are placed at consumer connection points, and consumers in the Netherlands typically use 3,500 kWh per year. Meter readings are in whole kWh, and at 3,500 kWh per year a household uses about 1 kWh every 2.5 hours (3500 / (365 x 24) ≈ 0.4 kWh per hour). With a reading every 15 minutes, that means on average only one out of 10 readings will show a change; the other 9 will be identical to the previous one.
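As a sanity check on the one-in-ten figure, using the 3,500 kWh annual usage from the text:

```python
# How often does a whole-kWh meter register actually tick over?
ANNUAL_USAGE_KWH = 3500                    # typical Dutch household (from the text)
intervals_per_year = 365 * 24 * 4          # one reading every 15 minutes

kwh_per_interval = ANNUAL_USAGE_KWH / intervals_per_year   # ~0.1 kWh per reading
hours_per_kwh = 1 / (kwh_per_interval * 4)                 # ~2.5 hours per kWh
readings_per_change = 1 / kwh_per_interval                 # ~10 readings per tick

print(f"{kwh_per_interval:.3f} kWh per 15-minute interval")
print(f"{hours_per_kwh:.1f} hours to use 1 kWh")
print(f"1 change per {readings_per_change:.0f} readings")
```

Since the register only shows whole kWh, roughly nine out of ten consecutive readings are duplicates of the one before.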
What do I most probably want from meter readings, since most of them are not changing anyway? I would want usage summaries, both per connection point per day/week/month/year and for a collection of connection points.
Connection point meter reading summaries per connection point (horizontal) and per collection of connection points (vertical)
In order to process this volume of data, one needs simplicity of design, horizontal scaling and finer control over availability. Do these demands sound familiar? They should, because NoSQL databases were built for exactly these purposes. NoSQL databases can easily scale out across a great number of cheap nodes, but the price you pay for that scalability is simplicity: NoSQL databases can't handle complex queries such as joins to other data, but they can aggregate data on a grand scale very fast.
Why not use a NoSQL database as a pre-processing system? Let it do what it does best: fast and cheap processing of simple data. This is exactly what is needed here: refining low-value, large quantities of data into high-value, low quantities of data without spending a fortune. Once you have this high-value data, store it in a data warehouse to be processed further.
NoSQL database as a pre-processor for a data warehouse
By building a number of pre-defined data cubes/reports in a NoSQL database, you can set up a standard stream of data flowing from the NoSQL database to the data warehouse. For example, usage per connection point per month or usage for an entire city block could be such a data stream. For somewhat more dynamic refining, a subset of connection points can be transferred from the data warehouse to the NoSQL database; that subset can then be used as a base population for the data processing.
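To make the "usage per connection point per month" stream concrete, here is a minimal Python sketch of that aggregation. The connection point IDs and register values are made up for illustration, and a real NoSQL system would run this as a distributed map-reduce or aggregation job rather than a single-process loop:

```python
from datetime import datetime

# Hypothetical raw readings: (connection_point_id, timestamp, register value in kWh)
readings = [
    ("CP-001", datetime(2013, 1, 1, 0, 0), 12000),
    ("CP-001", datetime(2013, 1, 15, 12, 0), 12140),
    ("CP-001", datetime(2013, 1, 31, 23, 45), 12290),
    ("CP-002", datetime(2013, 1, 1, 0, 0), 8000),
    ("CP-002", datetime(2013, 1, 31, 23, 45), 8310),
]

# Monthly usage per connection point = last register value minus first,
# grouped by (connection point, year, month).
first, last = {}, {}
for cp, ts, kwh in readings:
    key = (cp, ts.year, ts.month)
    if key not in first or ts < first[key][0]:
        first[key] = (ts, kwh)
    if key not in last or ts > last[key][0]:
        last[key] = (ts, kwh)

monthly_usage = {key: last[key][1] - first[key][1] for key in first}
print(monthly_usage)  # {('CP-001', 2013, 1): 290, ('CP-002', 2013, 1): 310}
```

Only the small `monthly_usage` result needs to travel to the data warehouse; the billions of raw readings stay behind in the cheap NoSQL store.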
Personally, I see a lot of interesting questions for gridcos that could be solved this way, such as a city block KPI dashboard indicating the health of every city block's energy usage in a state or nation. Health indicators could include the number of solar panels in a city block and how much energy these solar panels delivered back into the network.
Example of a KPI Dashboard for power outages