Philip Kotler is regarded by many as the ‘founding father’ of marketing. The key concept in the Kotlerian school is differentiation. Although the empirical evidence that differentiation matters for brands or products is very thin, Kotler’s key concept still dominates most business agendas. So marketers ask themselves: what distinguishes my brand from other brands? How is this new product different from other products? Which particular group of consumers should I target?
A tool for hunting differences
Because of the Kotlerian legacy, marketers and researchers spend much of their effort spotting differences in data. In this pursuit, they often rely on a decision-making tool borrowed from the sciences called ‘statistical significance testing’. It was developed long ago to help interpret experimental effects. Because small samples bring in the chance factor, researchers wanted a way to separate ‘real’ effects from those likely to be caused by chance, and chance only.
A ‘statistically significant effect’, however, doesn’t mean the effect is ‘real’. It means that, if there were no real effect, the chance of finding an effect this large by coincidence alone would be small. Small enough to say: “well, it’s likely to be a real effect instead of coincidence, so let’s report it.” And so academics and marketing researchers alike report statistically significant effects and ignore the ones that are not.
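For percentages, the test typically applied is a two-proportion z-test. The sketch below is a minimal illustration. It assumes the pooled normal-approximation version of that test, and the function name and the 30% vs 40% example are hypothetical. It shows what the p-value actually measures: how likely a difference at least this large would be if the two groups were really the same.

```python
import math


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation).

    x1 out of n1 respondents in group 1 said 'yes', x2 out of n2 in group 2.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under the 'no difference' assumption both groups share one proportion.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical example: 30% vs 40% 'yes', 200 respondents per group.
print(round(two_proportion_p_value(60, 200, 80, 200), 3))
```

With 200 respondents per group, 30% vs 40% gives a p-value of roughly 0.036. That is below the conventional 0.05 cutoff, so the difference would be reported as significant. Nothing in that number says whether the difference is real or whether it matters.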
The social sciences, for which this tool was originally invented, usually deal with small samples and a handful of ideas (a.k.a. ‘hypotheses’) to be tested. Marketers usually deal with large samples and no ideas. Hence, just to be sure nothing is missed, every single number tabulated is tested for significance.
You probably already have a hunch that there is a mismatch between the way significance testing is applied in the academic context and the way it is applied in the commercial context. But to grasp how badly this convention can mislead your decision making, we must dig a little deeper. There are at least three serious issues with significance testing, which I explain below.
Issue 1: It’s raining false positives
When you use a statistical test, there are four possible outcomes, as depicted in Figure 1. If the test result from your sample corresponds with the real world, you are in good shape. But there are two outcomes in which your sample and the real world do not match. First, you could overlook a real difference; this is called a false negative. This is mainly a risk with small samples, which we usually don’t have, so in marketing research it rarely happens for differences of any meaningful size.
Figure 1: Correspondence between sample observations and real world
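Why are false negatives mainly a small-sample problem? A rough sketch helps. It approximates how often a two-sided two-proportion z-test would miss a real 10-point difference (30% vs 40%) at different group sizes. It assumes the standard normal-approximation power formula and ignores the negligible opposite tail. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def false_negative_rate(p1: float, p2: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate chance that a two-sided two-proportion z-test misses a real
    difference between true proportions p1 and p2 (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    p_bar = (p1 + p2) / 2
    # Standard error of the difference if there were no real difference...
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5
    # ...and if the true proportions really are p1 and p2.
    se_alt = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    power = std_normal.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)
    return 1 - power


for n in (50, 200, 1000):
    print(n, round(false_negative_rate(0.30, 0.40, n), 3))
```

Under these assumptions, the miss rate drops from roughly 0.8 with 50 respondents per group to well under 0.01 with 1,000 per group.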
The other mismatch is that your test says there is a difference but in the real world there is none; this is called a false positive. With the conventional 95%-confidence criterion, this happens in 1 out of every 20 (or 5%) tests of differences that don’t exist in the real world.
At first sight, 1 false positive out of 20 tests might seem like a small number. But the usual approach is to submit every tabulated combination of percentages to a significance test, so the number of false positives adds up quickly, yielding roughly 50 to 500 false positives in a typical marketing research project. To be clear: these apparent differences are sheer noise. That’s quite serious by itself, but it gets worse.
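The arithmetic is simple. The expected number of false positives is the significance level times the number of tests of differences that don't exist. So 1,000 such tests at 5% give about 50, and 10,000 give about 500. A small simulation makes this concrete. It assumes both groups are drawn from the same population; the 30% share, the group size of 500 and the seed are arbitrary choices. Every 'significant' result it counts is pure noise.

```python
import math
import random


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(n_tests: int, n_per_group: int = 500,
                          true_share: float = 0.3, alpha: float = 0.05,
                          seed: int = 1) -> int:
    """Run n_tests comparisons where no real difference exists and count
    how many the test nevertheless flags as significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # Both groups come from the SAME population: any difference is noise.
        x1 = sum(rng.random() < true_share for _ in range(n_per_group))
        x2 = sum(rng.random() < true_share for _ in range(n_per_group))
        if two_proportion_p_value(x1, n_per_group, x2, n_per_group) < alpha:
            hits += 1
    return hits


n_tests = 1000
print(count_false_positives(n_tests), "flagged; expected about", round(0.05 * n_tests))
```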
Issue 2: It’s raining trivialities
Significance testing doesn’t tell you anything about the size of the effect: with a large enough sample, even a tiny difference becomes statistically significant. Obviously, whether an observed difference is big or small does matter. Figure 2 helps explain why a significance test lacks this important information: horizontally you see whether the effect is small or not, vertically whether it is statistically significant or not.
Many of your test results will be true positives: differences flagged as statistically significant that are, statistically speaking, truly different. Most of these true positives, however, will be small in magnitude (upper left quadrant in Figure 2). Small effects have no value in a commercial sense. Usually there are only a few medium or large effects. These might be relevant, but all too often they just state the bloody obvious, e.g. males drink more beer than females, or older people are more interested in buying a new car than younger people.
From a statistical perspective, true yet small positives are signal. From a business perspective, they are noise. ‘True’ but trivial differences outnumber the false positives by far. So, if statistical significance is the only criterion you use, you will report a fair share of trivial results.
Figure 2: Effects in typical MR study
Note: the figure above illustrates the likely distribution of various effects in a typical marketing research study. An observed difference is either significant or not (vertical axis) and has a smaller or bigger effect size (horizontal axis). Only medium to large effects (might) have commercial implications. False positives will be found in the upper left quadrant (5% of all tests of differences that don’t exist).
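A hypothetical example makes the gap between significance and size concrete. The sketch below assumes 20,000 respondents per group and a one-point difference (51% vs 50%). It uses Cohen's h as the effect-size measure; that choice is mine, and other measures exist. By Cohen's conventions, an |h| below 0.2 falls short of even a small effect.

```python
import math


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Hypothetical survey: 51% vs 50% 'would buy', 20,000 respondents per group.
x1, n1, x2, n2 = 10_200, 20_000, 10_000, 20_000
p = two_proportion_p_value(x1, n1, x2, n2)
h = cohens_h(x1 / n1, x2 / n2)
print(f"p = {p:.3f}, h = {h:.3f}")
```

The p-value comes out just below 0.05, while h is only about 0.02. That puts this result in the upper left quadrant of Figure 2: statistically real, commercially irrelevant.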
Issue 3: You’re ignoring similarities
Reporting false positives (statistical noise) and irrelevant differences (commercial noise) is quite troublesome by itself, but the worst is yet to come.
Relying on this decision-making tool makes marketers and researchers so obsessed with identifying differences that they no longer have an eye for similarities in the data. And usually, similarities, as illustrated in the lower left quadrant in Figure 2, outnumber differences by far.
Similarities are extremely relevant as they make the world less complicated. They tell you where you don’t have to worry about differentiating your product, communication or targeting.
It doesn’t do what it should
So, to summarize: somewhere along the way we adopted a tool originally developed to improve decision making. But it turns out to do exactly the opposite: it stimulates reporting effects that don’t exist in the real world or are simply irrelevant. As humans are a very habitual species and the significance test is readily available in most tools, I’m afraid it is likely to stick around.
But now you know better. The very least you can do is look at the size of the effect. If you separate the bigger effects from the smaller ones, your perception of the world becomes less complicated and more realistic. Now that’s what I call a win-win situation.
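Looking at effect size can be as simple as computing one for every comparison and labelling it. Here is a minimal sketch. It again assumes Cohen's h, with his rule-of-thumb cutoffs of 0.2, 0.5 and 0.8 for small, medium and large. The function names are hypothetical, and the comparisons and their shares are made up purely for illustration.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def effect_label(h: float) -> str:
    """Cohen's rule-of-thumb labels for |h|; the cutoffs are conventions, not laws."""
    size = abs(h)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "trivial"


def worth_reporting(comparisons, min_label: str = "medium"):
    """Keep only comparisons whose effect is at least min_label."""
    order = ["trivial", "small", "medium", "large"]
    keep = []
    for name, p1, p2 in comparisons:
        label = effect_label(cohens_h(p1, p2))
        if order.index(label) >= order.index(min_label):
            keep.append((name, label))
    return keep


# Made-up shares, purely for illustration.
comparisons = [
    ("Brand A vs B: aware", 0.62, 0.60),
    ("Region N vs S: bought last month", 0.35, 0.30),
    ("Men vs women: drink beer weekly", 0.55, 0.25),
]
print(worth_reporting(comparisons))
```

Here only the beer comparison survives, labelled medium. The effects that pass such a filter still deserve a quick check for stating the bloody obvious.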