As a study approaches its final stages, all the data gathered must be processed to extract as much information as possible.

One of many possible definitions describes data analysis as the science (though, given how much depends on the analyst, it could be argued that it is closer to an art) of turning an unwieldy amount of data into something both small and useful. The flexibility of this description, however, is also one of its greatest weaknesses, because “useful” is far too broad a word to be taken at face value. The definition immediately raises the question: what is useful?

Clearly, it all depends on the goal: the problem that needs to be solved. No two problems are alike, so sometimes a simple average is enough to meet this criterion of usefulness, for instance when all that is needed is a minimal overview of the data. In most cases, however, that is not enough; nobody needs a data analyst to calculate an average. As the problem grows more complex, so do the tools required to handle it, and more processing is needed to reduce that complexity to something “small and useful”.

In an ideal scenario, the goal would be clearly defined from the start, and the kind of analysis required would be known immediately. Unfortunately, the world is far from ideal, and, ironically, it is very common that the only way to arrive at a clearly defined goal is to have already analyzed the data.

In a more realistic scenario, the goal is vague because there is not enough prior information to narrow it down, so part of the analysis becomes determining which questions the data could answer. This usually involves carrying out initial explorations together with an expert in the field, who can give meaning and context to the information gathered, helping the analyst understand it and, in a way, translate that knowledge into the analysis.

Only once this initial process is complete, and the analysts understand the data they will be working with as thoroughly as possible, does the true analysis begin. Because data analysis is iterative, the goal of each step is not to answer the original problem immediately but to gather more information, narrowing down the questions that the results so far can answer while keeping track of how well those questions are being answered.

This information allows better decision-making during the analysis, and the knowledge acquired will gradually (or quickly, in fortunate cases) bring the problem into focus. With enough iterations, the problem itself becomes progressively better defined, and the actual solving process can begin, with the added benefit that a great deal has already been learned about the data during this initial phase, which may open up interesting new possibilities.

Taken as a whole, the process resembles trying to reach a destination across uncharted land: the path is unknown and the destination is vague, but each attempt reveals more of the lay of the land, more landmarks and new routes, making the next try easier to plan.

Of course, this does not mean that the journey is simple. There are many decisions to make along the way and, while it is relatively straightforward to stop and look for errors when something is obviously not working as intended, it takes far more discipline to look at a positive, or merely promising, result and stop to find what could invalidate it.

Despite the challenges, data analysis done right can be the capstone of a great project. Thousands of hours of work may very well culminate in a compressed nugget of information that holds the answer to the question that sparked all this effort. Small and useful, indeed.


Written by Luis Fernández, Life Length’s Data Analyst