After purchasing a product from an online e-commerce store, customers of two different products are asked the typical NPS survey question: “On a scale of 0-10, how likely is it that you would recommend [product name] to your friends, family or business associates?”
Product A Customer Ratings: [6,6,6,6,6,6,6,6,10,10] (Average Customer Rating 6.8)
Product B Customer Ratings: [0,0,0,0,9] (Average Customer Rating 1.8)
Look at the two data sets above and consider carefully: which product do you think has the higher NPS (Net Promoter Score)?
The answer is their NPS is exactly the same.
Wait, that’s unbelievable… the first product has an average customer rating of 6.8/10 while the second has an average customer rating of 1.8/10. How is that possible?
Welcome to the wonderful world of NPS, one of the most misguided scoring calculations ever invented.
Before we start, what is NPS (Net Promoter Score) and how is it calculated?
Take your data set of ratings between 0-10 and divide it into three groups. Anyone with a rating of 7 or 8 is a “passive.” Throw these scores out. Anyone with a score of 9 or 10 is a “promoter” and anyone with a score of 0-6 is a “detractor.” Subtract the percentage of detractors (out of total ratings) from the percentage of promoters (out of total ratings). Whatever percentage comes out of that, negative or positive, is your NPS.
That’s it, you’re done.
So, if 50% of respondents were promoters and 10% were detractors, your NPS is 40. Well, that sounds straightforward enough… what’s wrong with that?
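As a sketch, the whole calculation fits in a few lines of Python (the function and variable names here are my own, not part of any official NPS definition):

```python
def nps(ratings):
    """Compute the Net Promoter Score for a list of 0-10 ratings."""
    promoters = sum(1 for r in ratings if r >= 9)   # 9s and 10s
    detractors = sum(1 for r in ratings if r <= 6)  # 0 through 6
    # Passives (7s and 8s) only contribute to the total count.
    return 100 * (promoters - detractors) / len(ratings)

# 5 promoters, 1 detractor, 4 passives out of 10 respondents:
# 50% promoters - 10% detractors = NPS of 40
print(nps([9, 9, 10, 10, 9, 3, 7, 7, 8, 8]))  # 40.0
```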
In the above example, product A has two 10s (promoters) and eight 6s (detractors). Subtract the % of detractors (80%) from the % of promoters (20%) and you get -60%, which turns into an NPS of -60.
Product B has one 9 (promoter) and four 0s (detractors). Again, subtract the % of detractors (80%) from the % of promoters (20%) and you get -60%, which turns into an NPS of -60.
So product A has a final NPS of -60 and product B has a final NPS of -60. The products have the exact same NPS.
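A quick sketch in Python confirms that the two products land on exactly the same score despite their very different ratings:

```python
product_a = [6, 6, 6, 6, 6, 6, 6, 6, 10, 10]  # average rating 6.8
product_b = [0, 0, 0, 0, 9]                   # average rating 1.8

scores = {}
for name, ratings in [("A", product_a), ("B", product_b)]:
    promoters = sum(r >= 9 for r in ratings)   # 9s and 10s
    detractors = sum(r <= 6 for r in ratings)  # 0 through 6
    scores[name] = 100 * (promoters - detractors) / len(ratings)

print(scores)  # {'A': -60.0, 'B': -60.0}
```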
When working with ratings, you will almost always have to lose some accuracy to arrive at a single average “rating.” Often this involves rounding a final number.
Rounding is a form of data grouping. When you round 49.469 to 49.47, you are grouping data based on a set of rules. Usually this is done at the very last possible stage, to avoid abstracting away the detail in the data too early and introducing inaccuracies.
The earlier in the process you group the data, the more divergent comparable data sets will become in the final calculation, because you stack small inconsistencies into bigger and bigger ones, creating a cascading chain of compounding inaccuracies. That is why most scoring systems round at the last possible step before producing a score.
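To make this concrete, here is a toy sketch (with made-up fractional ratings) showing how rounding each value early can land the final average on a different number than rounding once at the end:

```python
ratings = [6.6, 6.6, 6.1]  # hypothetical fractional ratings

# Group (round) early: round each rating before averaging.
early = round(sum(round(r) for r in ratings) / len(ratings))

# Group late: average the raw values, round only at the end.
late = round(sum(ratings) / len(ratings))

print(early)  # 7 -> early grouping inflated the score
print(late)   # 6 -> the true average is about 6.43
```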
Additionally, the wider the range of this “grouping” the more inaccurate, and less reflective of the original data set, your final rating becomes.
NPS has one of the widest groupings imaginable. Every rating between 0-6 is considered exactly the same. That’s right, seven of the eleven possible ratings (roughly 64% of the scale) are grouped together and considered “equivalent.” A further, less egregious, grouping happens when 7 is considered equivalent to 8 and 9 is considered equivalent to 10.
NPS does not employ any mechanism for considering the volume, or number, of data points in its calculation. So a data set with one 0 and one 10 is considered exactly equivalent to a data set with one million 0s and one million 10s. This is a weakness shared with traditional averaging: the averages of those two data sets (one with two data points, the other with two million) will also come out exactly the same.
What is the solution to this? Produce two scores: an average, which does not take the number of data points into consideration, and a Wilson score, which does.
If you want a more thorough explanation of Wilson scores, here is a great blog post on the topic: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html. The gist of it is that a Wilson score takes both the number of data points and statistical confidence into consideration.
But if using Wilson scores is too complicated for you, the solution is to just use averages.
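For reference, here is a minimal sketch of the Wilson lower bound described in Evan Miller’s post. The formula works on binary positive/negative outcomes, so to apply it to 0-10 ratings you would first need to decide what counts as “positive” (e.g., a rating of 7 or more); that threshold would be your own choice, not part of the formula:

```python
from math import sqrt

def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval (95% confidence by default)."""
    if total == 0:
        return 0.0
    phat = positive / total
    denom = 1 + z * z / total
    centre = phat + z * z / (2 * total)
    spread = z * sqrt(phat * (1 - phat) / total + z * z / (4 * total * total))
    return (centre - spread) / denom

# One perfect rating is far less convincing than a hundred of them:
print(wilson_lower_bound(1, 1))      # ~0.21
print(wilson_lower_bound(100, 100))  # ~0.96
```

Unlike a plain average (which gives both cases a perfect 1.0), the lower bound rises toward the observed proportion only as the number of data points grows.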
Argument: The entire point of NPS is that it is about net promoters and net detractors. Net promoters will spread the good word about your product and net detractors will spread the bad word about your product. The net neutrals will say nothing so they have no influence!
Response: This is predicated on the assumption that someone who rates your business or product a 6 will be just as much of a vocal detractor as someone who rates it a 0. So instead of assuming that how strongly someone promotes or detracts from your business correlates linearly with their rating, you are putting people into arbitrary brackets and assuming those brackets map to people’s behavior non-linearly, with no evidence to back that up.
Argument: Everyone uses NPS because it is easy to calculate! None of this complicated Wilson score mumbo jumbo!
Response: So is an average rating. When in doubt, go with an average rating. NPS provides a poor return on complexity by simultaneously adding complexity and butchering the data set. Calculating an average is both easier than NPS (you just add every rating together and divide by the number of ratings) and does not butcher the data set.
Argument: The true power of NPS is that you are calculating your viral coefficient! The survey question is the critical piece!
Response: There is no reason that you can’t ask this question and calculate your scores from the results in a different way. Just because you ask this specific survey question, you do not have to use NPS to calculate the results.
Additionally, the NPS survey question does not add anything that other loyalty-related questions cannot provide. According to a study by Hayes (https://businessoverbroadway.com/wp-content/uploads/2011/01/QP_June_2008_True_Test_Of_Loyalty.pdf), there is no evidence that the “likelihood to recommend” question is a better predictor of business growth than other customer-loyalty questions (e.g., overall satisfaction, likelihood to purchase again), nor does it measure anything different from those conventional loyalty-related questions.