When I rub my hair vigorously with a towel, my wife often says to me, “That’s a terrible thing to do to your hair.” I had that in mind when I spoke on the subject of blending, not mashing, data at a conference on cloud computing earlier this year. (Here’s my talk.)
‘Mashing’ is a truly terrible thing to do to data. It calls to mind the indiscriminate bashing of a boiled potato with a blunt instrument; and that’s not what we should do to data to deliver strategic insight.
I much prefer the word ‘blend’. ‘Blend’ calls to mind a number of key features when we link data. Firstly, it suggests precision – the careful addition of carefully considered elements. Secondly, it suggests there could be a number of data ingredients rather than only a couple. And finally, it indicates that the consumer might be satisfied with the result; the knowledge that an output is the result of carefully combined ingredients inclines us to ready our palates!
A good example of the complexity involved in blending data is making sense of the NHS Prescribing Data for England. This is a list of every item on every prescription, and it is released monthly. To manage the data we need to blend it with three other data sets:
1. The BNF (British National Formulary). This gives us the drug hierarchy, which is the mechanism we need to manage the data volume in a structured way.
2. Population data. Population data enables the standardisation of the data. With population data we can identify the number of items and cost per (say) 1000 patients and accurately compare prescribing behaviour.
3. Geocoding data. The data contains a GP practice code. We can use that to put a point on a map. But we can go further and draw polygons on a map (themed areas). That in turn enables us to match prescribing data with other data and begin to explain prescribing patterns. For example, we can match prescribing patterns with (say) patterns of poverty.
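To make the three-way blend concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the field names (`practice`, `drug`, `items`), the lookup tables, and the figures are invented and do not reflect the layout of the real NHS, BNF, population or geocoding files. It rolls drugs up to a hierarchy level, groups practices into map areas, standardises to items per 1,000 patients, and sets aside any row it cannot match rather than silently dropping it.

```python
from collections import defaultdict


def blend(prescriptions, bnf_sections, practice_population, practice_area):
    """Return (items per 1,000 patients by (area, section), unmatched rows)."""
    items = defaultdict(int)
    unmatched = []
    for row in prescriptions:
        section = bnf_sections.get(row["drug"])    # drug hierarchy lookup
        area = practice_area.get(row["practice"])  # geocoding lookup
        known_population = row["practice"] in practice_population
        if section is None or area is None or not known_population:
            unmatched.append(row)  # keep for review instead of discarding
            continue
        items[(area, section)] += row["items"]

    # Total registered patients per area, summed over its practices.
    area_population = defaultdict(int)
    for practice, area in practice_area.items():
        area_population[area] += practice_population.get(practice, 0)

    rates = {
        (area, section): count * 1000 / area_population[area]
        for (area, section), count in items.items()
    }
    return rates, unmatched


# Invented sample data, purely illustrative.
prescriptions = [
    {"practice": "P001", "drug": "DRUG_A", "items": 30},
    {"practice": "P001", "drug": "DRUG_B", "items": 10},
    {"practice": "P002", "drug": "DRUG_A", "items": 45},
    {"practice": "P999", "drug": "DRUG_A", "items": 5},  # unknown practice
]
bnf_sections = {"DRUG_A": "Section X", "DRUG_B": "Section X"}
practice_population = {"P001": 2000, "P002": 3000}
practice_area = {"P001": "North", "P002": "South"}

rates, unmatched = blend(
    prescriptions, bnf_sections, practice_population, practice_area
)
for (area, section), rate in sorted(rates.items()):
    print(area, section, rate)
print("unmatched rows:", len(unmatched))
```

Returning the unmatched rows alongside the rates is a deliberate choice in this sketch: a careful blend should show what failed to join, so that gaps in the lookups can be fixed rather than hidden.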
And we don’t crudely mash these data; we carefully blend them.