You had me at hello part II: identifying patterns in your data

In my previous blog post I showed two simple yet powerful tools to visualise data. The first one was getting rid of distractions. This is always good advice. The second one was guiding your audience by working with contrasts. This one works perfectly if you want your audience to focus on the deviating, salient or unexpected parts of your data. But what about situations where there are no data points that deserve more attention than others?

Figures 1.1 and 1.2: examples of two bar charts

Revealing the pattern

Have a look at Figures 1.1 and 1.2 above. You see two ordinary bar charts. The exact meaning of these charts is not relevant, so I’m not going to bore you with contextual information. If you compare the two charts, does anything seem out of the ordinary?

To most people, these two charts show nothing more than a bunch of random bars going up and down. Now have a look at Figures 2.1 and 2.2. You might be surprised that these charts present the same data as Figures 1.1 and 1.2. In the scatterplots, patterns emerge that were hidden in the bar charts. Lots of situations are like this: identifying just the lows and the highs or other salient data points is simply not enough to reveal the underlying story.

Figures 2.1 and 2.2: scatterplots of same data as previous examples
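To make the step from bar chart to scatterplot concrete, here is a minimal sketch in plain Python. The two lists, `metric_a` and `metric_b`, are invented numbers (not the data behind the figures), and `text_scatter` is a hypothetical helper that draws a rough scatterplot with text characters. Read as two separate lists, the numbers look like noise. Plotted against each other, they line up along a diagonal.

```python
# Hypothetical data: two metrics measured for the same ten items,
# listed in arbitrary order, as they would appear in two bar charts.
metric_a = [42, 13, 77, 58, 25, 91, 36, 64, 19, 83]
metric_b = [47, 20, 80, 55, 31, 88, 35, 70, 18, 79]


def text_scatter(xs, ys, width=30, height=10):
    """Return text rows that plot ys against xs, one '*' per data point."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column or row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]


plot_rows = text_scatter(metric_a, metric_b)
print("\n".join(plot_rows))
```

In a real report you would use a charting tool rather than text characters, but the idea is the same: pair the values per item instead of showing each series on its own.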

Data heatmaps

As Figures 2.1 and 2.2 showed, the scatterplot is a good way of revealing patterns. But it only works for two rows of numbers at a time. If you have more than two, a heatmap is likely to help you identify hidden patterns. As you can see in Exhibit 1, heatmaps are extremely intuitive. You don’t have to explain them, whether they are used to locate eye movement, brain activity or rain. Your System 1 information processor understands them in a split second.

Exhibit 1: Examples of heatmaps

Heatmaps can work for your metrics as well. Tables 1.1 and 1.2 contain the same data about four consumer profiles. They are, however, framed differently by using different colour schemes. Again, what these tables are about is not relevant. Just try to observe how your brain responds. The table on the left draws attention to the segments with high or low scores. As you will notice, it is almost impossible not to pay attention to the high-contrast cells.

Tables 1.1 and 1.2: examples of two different ways of framing numbers

The table on the right shows the same data. It is now ordered from high to low and framed with a heatmap. This way, the table shows how similar these data rows are. It’s a strong pattern that you will never see if you are only confronted with the left table. For the quants among us: the average correlation between the data rows is above .90, so you can safely state that the segments are much more similar than they are different. Marketers, however, are often inclined to focus only on the differences and ignore similarities.
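For the quants who want to check such a claim on their own data, here is a rough sketch of the two steps described above: ordering the columns from high to low, and computing the average correlation between the rows. The profile names and scores are invented for illustration and are not the numbers in the tables. `pearson` is a small helper written out here so the example needs nothing beyond the standard library.

```python
from itertools import combinations
from math import sqrt

# Hypothetical scores for four consumer profiles on the same six attributes.
profiles = {
    "Profile A": [72, 35, 58, 90, 44, 20],
    "Profile B": [68, 40, 55, 85, 50, 25],
    "Profile C": [75, 30, 62, 88, 41, 18],
    "Profile D": [64, 38, 51, 80, 47, 27],
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Order the attribute columns from high to low on their average score,
# which is what makes the shared pattern visible in a heatmap.
n_attributes = len(next(iter(profiles.values())))
column_means = [
    sum(row[i] for row in profiles.values()) / len(profiles)
    for i in range(n_attributes)
]
order = sorted(range(n_attributes), key=lambda i: column_means[i], reverse=True)
sorted_profiles = {name: [row[i] for i in order] for name, row in profiles.items()}

# Average the correlation over every pair of profiles.
pairs = list(combinations(profiles, 2))
average_r = sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

for name, row in sorted_profiles.items():
    print(name, row)
print(f"average pairwise correlation: {average_r:.2f}")
```

With numbers like these, the sorted rows all descend in the same order, and the average correlation confirms what the heatmap suggests: the profiles share one pattern.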

A stunning loyalty metric! Or maybe not

Making sense of your data is not only about revealing patterns as shown above. It is also about recognizing them. Have a look at Figure 3. On the horizontal axis you see various car brands. On the vertical axis you see how often (UK) car owners buy the same brand again. It is tempting to look at the brand that stands out (in this case Ford) and wonder why they have such loyal customers.

Figure 3: repeat purchase UK car owners (source: Sharp, 2010)

Ford’s loyalty metric, however, is very ‘normal’. It is what you would expect, given the Double Jeopardy Law: bigger brands have far more customers, who are also a bit more loyal. It’s one of the best documented patterns in marketing, but nevertheless massively ignored or overlooked.

Figure 4 shows the same loyalty metric on the vertical axis, but now with penetration on the horizontal axis. The Double Jeopardy Law, discovered by Andrew Ehrenberg back in the 1950s, tells you that penetration and loyalty metrics should be strongly correlated. Clearly, they are: most dots are on or close to the trend line. This example shows that being aware of patterns in your data is extremely valuable for making sensible interpretations.

Figure 4: penetration UK car owners x repeat purchase (source: Sharp, 2010)
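The reasoning behind Figure 4 can be sketched in a few lines of Python: fit a trend line of loyalty against penetration, then judge each brand by its distance from that line rather than by its raw score. The brands and percentages below are invented for illustration and are not the figures from Sharp (2010). The helper `fit_line` and the 5-point `THRESHOLD` are my own assumptions, not an established rule.

```python
# Hypothetical brands: (market penetration %, repeat-purchase rate %).
# Invented for illustration; these are not the figures from Sharp (2010).
brands = {
    "Brand A": (20, 52),
    "Brand B": (14, 45),
    "Brand C": (9, 41),
    "Brand D": (6, 36),
    "Brand E": (4, 35),
    "Brand F": (2, 31),
}


def fit_line(points):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in points)
        / sum((x - mean_x) ** 2 for x, _ in points)
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(list(brands.values()))

# A brand only deserves a closer look when its loyalty is far from what
# its size predicts. The 5-point threshold is an arbitrary choice.
THRESHOLD = 5.0
residuals = {
    name: repeat - (intercept + slope * penetration)
    for name, (penetration, repeat) in brands.items()
}
surprising = [name for name, gap in residuals.items() if abs(gap) > THRESHOLD]

for name, gap in residuals.items():
    print(f"{name}: {gap:+.1f} points versus expectation")
print("brands worth investigating:", surprising)
```

In this made-up example the biggest brand has by far the highest repeat rate, yet it sits almost exactly on the line. Its loyalty is high, but not surprising.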

Patterns are everywhere

Patterns are everywhere. It’s good to realize, for example, that any data about brands will show a relationship with brand size. Take brand perception: bigger brands typically get higher perception scores on all attributes. It’s a pattern and it’s normal. Don’t get fooled by all those firms that will tell you that their special magic black box metric can predict brand growth. This ‘prediction’ is just showing an expected correlation from which you can’t infer any causation.

Here’s another well-documented pattern that is often overlooked. Say you select your most loyal customers in a certain period and then look at their buying behaviour some time later. You should not be disappointed if they have become lighter buyers in the next period. As in the car example, a well-documented pattern is hiding in your data. It’s called the Law of Buyer Moderation, a.k.a. regression to the mean: if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement. It’s one of many patterns that can easily fool you, especially if you are only looking at deviations in your data.
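A small simulation shows why this happens even when nobody changes their behaviour. The sketch below rests on assumptions of my own: 10,000 hypothetical customers, each with a fixed buying propensity that stays the same in both periods, and 20 purchase occasions per period. The only thing that differs between the periods is chance.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

N_CUSTOMERS = 10_000
OCCASIONS = 20  # purchase occasions per period (an arbitrary assumption)


def purchases(propensity):
    """Number of purchases in one period for a given buying propensity."""
    return sum(random.random() < propensity for _ in range(OCCASIONS))


def mean(values):
    return sum(values) / len(values)


# Each hypothetical customer keeps the same propensity in both periods.
propensities = [random.uniform(0.05, 0.5) for _ in range(N_CUSTOMERS)]
period_1 = [purchases(p) for p in propensities]
period_2 = [purchases(p) for p in propensities]

# Select the 10% heaviest buyers using period 1 only.
ranked = sorted(range(N_CUSTOMERS), key=lambda i: period_1[i], reverse=True)
heavy = ranked[: N_CUSTOMERS // 10]

heavy_p1 = mean([period_1[i] for i in heavy])
heavy_p2 = mean([period_2[i] for i in heavy])
overall_p2 = mean(period_2)

print(f"heavy buyers, period 1: {heavy_p1:.1f} purchases on average")
print(f"same buyers, period 2:  {heavy_p2:.1f} purchases on average")
print(f"all customers, period 2: {overall_p2:.1f} purchases on average")
```

The selected group buys less in the second period, yet still more than the average customer. Part of what made them ‘heavy’ in period 1 was luck, and luck does not repeat.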

I hope these examples illustrate that it helps to simplify your data visualisations, but that you should beware of dumbing things down. It is easy to overlook common patterns in your data and be misled by the obsessive search for stuff that stands out.