The value of data has grown exponentially in the last few years. As people realise that good data and analysis lead to good decisions, there’s a virtuous cycle that calls for more and more data.
In digital technology (e.g. cellphones or websites), it’s pretty easy to collect data at every point in a system. In the “physical” (i.e. analogue) world, however, it’s not that easy. Sensors aren’t connected to every part of each system, and there are billions (or trillions?) of systems out there. Think of every car, river, garbage bin, home, tree and indeed, person. Each is itself a system, and a part of a larger system – and each has the potential to produce valuable data. My own interest here is in the sustainability of these systems, and the need to measure and track things usually bumps up against this constraint. Many organisations are finding the same.
So, when data is available, there is potential to be hugely disappointed by the results of analysis. This disappointment will largely be felt by decision makers relying on the data, and not by data analysts. This is because a data analyst will understand the limitations of data, and its power – without mixing up the two.
I think the best way to avoid disappointment is to understand what data is. I think of it like this:
Data is a record of an event
And that’s it. Often, that will be an imperfect record of an event!
So how does this help you understand data? It helps you understand that data is just the output of some system that’s been captured. In the physical world, it’s only practical to capture very limited aspects of that event. But, by using other available data, you can still make very good decisions without having all possible data to hand.
By way of example, I’ll use an analogy of a photo of animal tracks in the sand. The Decision Maker (DM) is asking the Data Analyst (DA) questions to determine what the situation is. Let’s see how misunderstanding the limits of the data can lead to exasperation from both sides!
DM: So, what can you tell me? Do we know what passed through here?
DA: Sure thing. We can see that two animals passed through here – most definitely a cat and a crow.
DM: Only one of each?
DA: Yes, just one of each.
DM: That’s amazing, thanks! What colour were the cat and the crow?
DA: We don’t know. The data can’t tell us the colour of the cat. But we do know the crow is black.
DM: That doesn’t make sense. How can the data tell us about one and not the other?
DA: Well, technically it doesn’t tell us about either. But we know from other data that cats can have a combination of colours, but crows are always black.
DM: Ok, how about the health of each animal? Can we tell if both are perfectly healthy?
DA: Well, kind of. We can tell some limited health information from the tracks. We are fairly certain that each animal is “structurally” okay, with no broken legs or limps. But we can’t tell if they are ill in some other way. The crow is harder to judge. It may just be walking, as they sometimes do, but there’s a small chance it has an injured wing that meant it HAD to walk.
DM: Oh, okay. Some of this information is good, but I was really hoping for more information. Would it help if we took many more photographs of tracks?
DA: No, we’d need video. It’s vastly more expensive though.
DM: Ok, but would that give us data on everything?
DA: Well, yes on some of the health issues, and the colour issues – depending on the video technology, but I won’t go into that. But not on all health issues, or the history of where else the animals have been. We’d probably get lots of data we don’t need.
DM: Don’t tell me that! I thought this data would help us a lot more. Now you’re saying no matter how much we spend, we won’t get what we need?
DA: Well, what do you want to achieve with this?
DM: We want to know where and what type of food to put out for these animals.
DA: Oh! Yes we can tell that from the data we have!
That’s sometimes what happens. It’s good to start with the end in mind and realise that your data analysts aren’t being difficult. Data analysts, realise that not everyone has the same level of intuition that you’ve been fortunate enough to develop by dealing with data day in, day out.