Guest Post by Rob Brogle, Global Productivity Solutions. Originally posted on iSixSigma October 24, 2013
I. Introduction
Ever since the Ritz-Carlton Hotel Company won the Malcolm Baldrige National Quality Award for the second time in 1999, companies across many different industries have tried to follow its lead and achieve the same level of outstanding customer satisfaction. This was a good thing, of course, as CEOs and executives began incorporating customer satisfaction into their company goals and communicating frequently to their managers and employees about the importance of making customers happy.
When Six Sigma and other metrics-based systems began to spread through these companies, it became apparent that customer satisfaction needed to be measured using the same type of data-driven rigor that other performance metrics (processing time, defect levels, financials, etc.) were utilizing. After all, if customer satisfaction was to be put at the forefront of a company’s improvement efforts, then a sound means of measuring this quality would be required.
Enter the customer satisfaction survey. What better way to measure customer satisfaction than asking the customers themselves? Companies immediately jumped on the survey bandwagon—using mail surveys, automated phone surveys, e-mail, web-based forms, and many other platforms. Point systems were used (ratings on a 1-10 scale, 1-5 scale, etc.) that produced numerical data and allowed for a host of quantitative analyses. Use of the “Net Promoter Score” (NPS) to gauge customer loyalty became a goldmine for consultants selling NPS services. Customer satisfaction could be broken down by business unit, department, and individual employee. Satisfaction levels could be monitored over time to determine upward or downward trends; mathematical comparisons could be made between customer segments and between product or service types. This was a CEO’s dream—and it seemed there was no limit to the customer-produced information that could help transform a company into the “Ritz-Carlton” of its industry.
In reality, there was no limit to the misunderstanding, abuse, wrong interpretations, wasted resources, poor management, and employee dissatisfaction that would result from these surveys. Although there were some companies that were savvy enough to understand and properly interpret their survey results, the majority of companies did not. And this remains the case today.
What could possibly go wrong with the use of customer satisfaction surveys? After all, surveys are pretty straightforward tools that have likely been used since the times of the Egyptians (Pharaoh satisfaction levels with pyramid quality, etc.). The reality is that survey data has a lot of potential issues and limitations that make it different from other “hard” data that companies utilize. It is critical to recognize these issues when interpreting survey results—otherwise what seemed like a great source of information can cause a company to inadvertently do many bad things. Understanding and avoiding these pitfalls will be the focus of this commentary.
II. Survey Biases and Limitations
Customer satisfaction surveys are everywhere; in fact, we tend to be bombarded with e-mail or online survey offers from companies who want to know our opinions about their products, services, etc. In the web-based world of today, results from these electronic surveys can be immediately stored in databases and analyzed in a thousand different ways. However, in virtually all cases the results are fraught with limitations and flaws. We will now discuss some of the most common survey problems, which include various types of bias, variation in customers’ interpretations of rating scales, and lack of statistical significance. These are the issues that must be taken into account if sound conclusions are to be drawn from survey results.
A. Non-response Bias
Have you ever called up your credit card company or bank and been asked to stay on the line after your call is complete in order to take a customer satisfaction survey? How many times do you actually stay on the line to take that survey? If you’re like the vast majority of people, you hang up as soon as the call is complete and get on with your life. But what if the service that you got on that phone call was terrible, the agent was rude, and you were very frustrated and angry at the end of the call? Then would you stay on the line for the survey? Chances are certainly higher that you would. And that is a perfect example of non-response bias at work.
Although surveys are typically offered to a random sample of customers, the recipient’s decision whether or not to respond to the survey is not random. Once a survey response rate dips below 80% or so, the inherent non-response bias will begin to affect the results. The lower the response rate, the greater the non-response bias. The reason for this is fairly obvious: the group of people who choose to answer a survey is not necessarily representative of the customer population as a whole. The survey responders are more motivated to take the time to answer the survey than the non-responders; therefore, this group tends to contain a higher proportion of people who have had either very good, or more often, very bad experiences. Changes in response rates will have a significant effect on the survey results. Typically, lower response rates will produce more negative results, even if there is no actual change in the satisfaction level of the population.
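To see how much damage this bias can do, here is a minimal simulation sketch. The satisfaction level and the response probabilities below are hypothetical, chosen only to illustrate the mechanism:

```python
import random

random.seed(42)

# Hypothetical numbers: 85% of the customer base is truly satisfied, but
# dissatisfied customers are five times more likely to take the survey.
TRUE_SATISFACTION = 0.85
P_RESPOND_SATISFIED = 0.05
P_RESPOND_DISSATISFIED = 0.25

population = 100_000
responses = []
for _ in range(population):
    satisfied = random.random() < TRUE_SATISFACTION
    p_respond = P_RESPOND_SATISFIED if satisfied else P_RESPOND_DISSATISFIED
    if random.random() < p_respond:
        responses.append(satisfied)

measured = sum(responses) / len(responses)
print(f"True satisfaction:     {TRUE_SATISFACTION:.0%}")
print(f"Response rate:         {len(responses) / population:.1%}")
print(f"Measured satisfaction: {measured:.0%}")   # far below the true 85%
```

In this made-up scenario, an 85%-satisfied customer base shows up in the survey as only about 53% satisfied, simply because the unhappy customers respond at a much higher rate than the happy ones.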
B. Survey Methodology Bias
The manner in which a customer satisfaction survey is administered can also have a strong impact on the results. Surveys that are administered in person or by phone tend to result in higher scores than identical surveys distributed by e-mail, snail mail, or on the web. This is due to people’s natural social tendency to be more positive when there is another person directly receiving feedback (even if the recipient is an independent surveyor). Most of us don’t like to give people direct criticism, so we tend to go easy on them (or the company they represent) when speaking with them in person or by phone. E-mail or mail surveys have no direct human recipient and therefore the survey taker often feels more freedom to give negative feedback—they’re much more likely to let the criticisms fly.
Also, the manner in which a question is asked can have a significant impact on the results. Small changes in wording can affect the apparent tone of a question, which in turn can impact the responses and the overall results. For example, asking “How successful were we at fulfilling your service needs?” may produce a different result than “How would you rate our service?” Even the process by which a survey is presented to the recipient can alter the results—surveys that are offered as a means of improving products or services to the customer by a “caring” company will yield different outcomes than surveys administered solely as data collection exercises or surveys given out with no explanation at all.
C. Regional Biases
Another well-known source of bias that exists within many survey results is regional bias. People from different geographical regions, states, countries, urban vs. suburban or rural locations, etc. tend to show systematic differences in their interpretations of point scales and their tendencies to give higher or lower scores. Corporations that have business units across diverse locations have historically misinterpreted their survey results this way. They will assume that a lower score from one business unit indicates lesser performance, when in fact that score may simply reflect a regional bias compared to the locations of other business units.
D. Variation in Customer Interpretation and Repeatability of the Rating Scale
Imagine that your job is to measure the length of each identical widget that your company produces to make sure that the quality and consistency of your product is satisfactory. But instead of having a single calibrated ruler with which to make all measurements, you must make each measurement with a different ruler. No problem if all the rulers are identical, but now you notice that each ruler has its own calibration. What measures as one inch for one ruler measures 1¼ inches for another ruler, ¾ of an inch for a third ruler, etc. How well could you evaluate the consistency of the widget lengths with that kind of measurement system if you need to determine lengths to the nearest 1/16 of an inch? Welcome to the world of customer satisfaction surveys.
Unlike the scale of a ruler or other instrument which remains constant for all measurements (assuming its calibration remains intact), the interpretation of a survey rating scale varies for each responder. In other words, each person who completes the survey has his or her own personal “calibration” for the scale. Some people tend to be more positive in their assessments; other people are inherently more negative. On a scale of 1-10, the same level of satisfaction might elicit a 10 from one person but only a 7 or 8 from another.
In addition, most surveys exhibit poor repeatability. When survey recipients are given the exact same survey questions multiple times, there are often differences in their responses. Surveys rarely pass a basic gauge R&R (repeatability and reproducibility) assessment. Because of these factors, surveys should be considered very noisy (and biased) measurement systems, and therefore their results cannot be interpreted with the same precision and discernment as data that is produced by a physical measurement gauge.
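For readers who want to see what such a check looks like, here is a rough repeatability sketch. The repeat ratings are invented for illustration; a formal gauge R&R study would be more involved:

```python
import math
import statistics

# The same (hypothetical) ten respondents rate the same experience twice on a
# 1-10 scale.  If within-person differences are comparable to the spread
# between people, the "gauge" is too noisy to make fine distinctions.
first_pass  = [9, 7, 8, 10, 6, 8, 7, 9, 5, 8]
second_pass = [7, 8, 8,  9, 8, 6, 9, 8, 7, 6]

diffs = [a - b for a, b in zip(first_pass, second_pass)]
within  = statistics.stdev(diffs) / math.sqrt(2)  # rough repeatability estimate
between = statistics.stdev(first_pass)            # person-to-person spread

print(f"Within-respondent std dev (repeatability): {within:.2f}")
print(f"Between-respondent std dev:                {between:.2f}")
```

In this example the within-respondent noise is nearly as large as the spread between respondents, which is exactly the situation that causes a measurement system to fail a gauge R&R assessment.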
E. Statistical Significance
Surveys are, by their very nature, a statistical undertaking and therefore it is essential to take the statistical sampling error into account when interpreting survey data. As we know from our Six Sigma backgrounds, sample size is part of the calculation for this sampling error: if a survey result shows a 50% satisfaction rating, does that represent 2 positive responses out of 4 surveys or 500 positives out of 1000 surveys? Clearly our margin of error will be much different for those two cases.
There are undoubtedly thousands of case studies of companies that completely failed to take margin of error into account when interpreting survey results. A well-known financial institution would routinely punish or reward its call center personnel based on monthly survey results—a 2% drop in customer satisfaction would elicit calls from executives to their managers demanding to know why the performance level of their call center was decreasing. Never mind that the results were calculated from 40 survey responses with a corresponding margin of error of ±13%, making the 2% drop completely statistically meaningless.
Another well-known optical company set up quarterly employee performance bonuses based on individual customer satisfaction scores. An average score between 4.5 and 4.6 (on a 1-5 scale) earned an employee the minimum bonus, an average between 4.6 and 4.7 earned an additional bonus, and an average above 4.7 earned the maximum possible bonus. As it turned out, each employee’s score was calculated from the average of fewer than 15 surveys—and the margin of error for those average scores was ±0.5. Therefore, all of the employees had average scores within this margin of error, and there was no statistical distinction at all between any of them. Differences of 0.1 points were purely statistical “noise” with no basis in actual performance levels.
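As a rough sanity check on those two examples, the margins of error can be reproduced in a few lines of code. The satisfaction proportion (0.80) and the score standard deviation (0.9) used below are assumed values for illustration, since the underlying data are not available:

```python
import math
from scipy import stats

# Call center: ~40 surveys per month.  Assuming a satisfaction rate around
# 80%, the 95% margin of error on that percentage is:
n, p = 40, 0.80
moe_pct = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"Call center margin of error: +/-{moe_pct:.1%}")      # roughly 12-13%

# Optical company: ~15 surveys per employee on a 1-5 scale.  Assuming a
# score standard deviation of about 0.9, the t-based margin of error is:
n, s = 15, 0.9
moe_avg = stats.t.ppf(0.975, df=n - 1) * s / math.sqrt(n)
print(f"Employee-score margin of error: +/-{moe_avg:.2f}")   # roughly 0.5
```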
Essentially, when companies fail to take margin of error into account, they wind up making decisions, rewarding or punishing people, taking actions, etc. based purely on random chance. And as our friend W. Edwards Deming told us 50 years ago, one of the fastest ways to completely de-motivate people and create an intolerable work environment is to evaluate people based on things that are out of their control.
III. Proper Use of Surveys
So what can be done? Is there a way to extract useful information from surveys without misusing them? Or should we abandon the idea of using customer satisfaction surveys as a means of measuring our performance?
Certainly, it is better not to use surveys at all than to misuse and misinterpret them. The harm that can be done when biases and margins of error are not understood outweighs any “benefit” of having the misleading information. However, if the information from surveys can be properly understood and interpreted within its limitations, then surveys can, in fact, help guide companies in making their customers happy. Here are some ways that can be accomplished.
A. Use Surveys to Determine the Drivers of Customer Satisfaction, Then Measure Those Drivers Instead
Customers generally aren’t pleased or displeased with companies by chance—there are key drivers that influence their level of satisfaction. Use surveys to determine what those key drivers are and then put performance metrics on those drivers, not on the survey results themselves. Ask customers for the reasons why they are satisfied or dissatisfied, then affinitize those responses and put them on a Pareto chart. This information will be much more valuable than a satisfaction score, as it will identify root causes of customer happiness or unhappiness on which you can then develop measurements and metrics.
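As a rough illustration of the affinitizing step, here is a small sketch that buckets free-text survey comments into driver categories and tallies them for a Pareto chart. The categories, keywords, and comments are hypothetical:

```python
from collections import Counter

# Hypothetical driver categories and the keywords used to assign comments.
DRIVER_KEYWORDS = {
    "Responsiveness":  ["slow", "waited", "no reply", "call back"],
    "Staff courtesy":  ["rude", "friendly", "polite", "attitude"],
    "Product quality": ["broken", "defect", "stopped working"],
    "Billing":         ["overcharged", "invoice", "refund"],
}

def categorize(comment: str) -> str:
    text = comment.lower()
    for driver, keywords in DRIVER_KEYWORDS.items():
        if any(k in text for k in keywords):
            return driver
    return "Other"

comments = [
    "Waited three days for a call back",
    "The agent was rude",
    "Unit stopped working after a week",
    "I was overcharged on my invoice",
    "Still no reply to my email",
]

# Count comments per driver, sorted for a Pareto chart.
pareto = Counter(categorize(c) for c in comments).most_common()
for driver, count in pareto:
    print(f"{driver:18s} {count}")
```

In practice the affinitizing is usually done by people reading the verbatim comments, but the end product is the same: a ranked count of the drivers behind satisfaction and dissatisfaction.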
For example, if you can establish that responsiveness is a key driver in customer satisfaction then start measuring the time between when a customer contacts your company and when your company gives a response. That is a “hard” measurement—much more reliable than a satisfaction score. The more that a company focuses on improving the metrics that are important to the customer (the customer CTQs), the more that company will improve real customer satisfaction (which is not always reflected in biased and small-sample survey results).
B. Improve Your Response Rate
If you want your survey results to reflect the general customer population (and not a biased subset of customers), then you must have a high response rate to minimize non-response bias. Again, the goal should be a response rate of at least 80%. One way to achieve this is to send out fewer surveys but send them to a targeted group that you’ve reached out to ahead of time. Incentives for completing the survey, along with reminder follow-ups, can help increase the response rate significantly.
Also, making the surveys short, fast, and painless to complete can go a long way in improving response rates. As tempting as it may be to ask numerous and detailed questions to squeeze every ounce of information possible out of the customer, you are likely to have a lot of survey abandonment in those cases once people realize it’s going to take them more than a couple of minutes to complete. You are much better off giving a concise survey that is very quick and easy for the customers to finish. Ask a few key questions and let your customers get on with their lives—they will reward you with a higher response rate.
C. Don’t Try To Make Comparisons Where There Are Biases Present
A lot of companies use customer survey results to try to score and compare their employees, business units, departments, etc. These types of comparisons must be taken with a large block of salt, as there are too many potential biases that can produce erroneous results. Do not try to compare across geographic regions (especially across different countries for international companies), as the regional bias could cause you to draw the wrong conclusions. If you have a national or international company and wish to sample across your entire customer base, be sure to use stratified random sampling so that customers are sampled in geographic proportions representative of your overall customer population.
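Here is a minimal sketch of proportional stratified sampling; the region names and customer counts are hypothetical:

```python
import random

random.seed(0)

# Hypothetical customer base split across regions.
customers_by_region = {
    "Northeast": [f"NE-{i}" for i in range(4000)],
    "South":     [f"S-{i}"  for i in range(3000)],
    "West":      [f"W-{i}"  for i in range(2000)],
    "Midwest":   [f"MW-{i}" for i in range(1000)],
}

total = sum(len(c) for c in customers_by_region.values())
sample_size = 500

# Draw from each region in proportion to its share of the customer base.
sample = []
for region, customers in customers_by_region.items():
    n_region = round(sample_size * len(customers) / total)
    sample.extend(random.sample(customers, n_region))
    print(f"{region:10s}: {n_region} surveys")

print(f"Total surveys sent: {len(sample)}")
```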
Also, do not compare results from surveys that were administered differently (phone vs. mail, e-mail, etc.) even if the survey questions were identical. The survey methodology can have a significant influence on the results. Be sure that the surveys are all identical, and are administered to the customers using the exact same process.
And, finally, always keep in mind that surveys rarely are capable of passing a basic gauge R&R study. They represent a measurement system that is extremely noisy and flawed, and therefore using survey results to make fine discernments is usually not possible.
D. Always, Always, Always Account for Statistical Significance in Survey Results
This is the root of the majority of survey abuse—where management makes decisions based on random chance rather than on significant results. In these situations our Six Sigma tools can be a huge asset, as it’s critical to educate management on the importance of proper statistical interpretation of survey results (as with any type of data).
Set the strict rule that no survey result can be presented without including the corresponding margin of error (i.e., the 95% confidence interval). For survey results based on average scores, the margin of error will be roughly ±1.96·s/√n, where s is the standard deviation of the scores and n is the sample size (for sample sizes < 30, the more precise t-distribution formula should be used instead). If the survey results are based on percentages rather than average scores, then the margin of error can be approximated as ±1.96·√(p(1−p)/n), where p is the resulting overall proportion and again n is the sample size (note that the Clopper-Pearson exact formula should be used if np < 5 or n(1−p) < 5). Mandating that margin of error be included with all survey results will help frame the results for management, and will go a long way in getting people to understand the distinction between significant differences and random sampling variation.
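For teams that want to automate this rule, here is a small sketch of both calculations using SciPy. A 95% confidence level is assumed, and the example numbers at the bottom are made up:

```python
import math
import statistics
from scipy import stats

def mean_margin_of_error(scores, confidence=0.95):
    """Margin of error for an average survey score.

    Uses the t distribution, which for large n is close to the
    1.96 * s / sqrt(n) normal approximation."""
    n = len(scores)
    s = statistics.stdev(scores)
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    return t_crit * s / math.sqrt(n)

def proportion_interval(successes, n, confidence=0.95):
    """Confidence interval for a survey percentage.

    Falls back to the exact Clopper-Pearson interval when the normal
    approximation is questionable (np < 5 or n(1-p) < 5)."""
    if successes < 5 or n - successes < 5:
        ci = stats.binomtest(successes, n).proportion_ci(confidence, method="exact")
        return ci.low, ci.high
    p = successes / n
    z = stats.norm.ppf(0.5 + confidence / 2)
    moe = z * math.sqrt(p * (1 - p) / n)
    return p - moe, p + moe

# Hypothetical examples: ten 1-10 scores, and 180 "satisfied" out of 200.
print(mean_margin_of_error([8, 9, 7, 10, 8, 9, 6, 8, 9, 7]))
print(proportion_interval(180, 200))
```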
Also, be sure to use proper hypothesis testing when making survey result comparisons between groups. All our favorite tools should be utilized: for comparing average or median scores we have the T-tests, ANOVA, or Mood’s Median tests (among others); for results based on percentages or counts we have our proportions tests or chi-squared analysis.
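As a quick illustration, the sketch below runs a two-sample t-test on average scores and a chi-squared test on percentage results; all of the numbers are invented:

```python
from scipy import stats

# Average scores (1-10 scale) from two business units (made-up numbers):
unit_a = [8, 9, 7, 10, 8, 9, 6, 8, 9, 7]
unit_b = [7, 8, 6, 9, 7, 8, 7, 6, 8, 7]
t_stat, p_value = stats.ttest_ind(unit_a, unit_b, equal_var=False)
print(f"Two-sample t-test p-value: {p_value:.3f}")

# Percent-satisfied results from two quarters (satisfied vs. not satisfied):
table = [[180, 20],   # Q1: 180 satisfied, 20 not
         [165, 35]]   # Q2: 165 satisfied, 35 not
chi2, p_value, dof, _ = stats.chi2_contingency(table)
print(f"Chi-squared test p-value: {p_value:.3f}")

# Treat a difference as real only if the p-value clears the chosen
# significance level (commonly 0.05).
```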
If we are comparing a large number of groups or are looking for trends that may be occurring over time, the data should be placed on the appropriate control chart. Average scores should be displayed on an X-bar and R or X-bar and S chart, while scores based on percentages should be shown on a P chart. For surveys with large sample sizes, an I and MR chart may be more appropriate (a la Donald Wheeler) to account for variations in the survey process that are not purely statistical (such as biases changing from sample to sample, which is very common). Control charts will go a long way in preventing management overreaction to differences or changes that are statistically insignificant.
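Here is a minimal sketch of the P-chart calculation for monthly percent-satisfied results; the monthly counts are hypothetical:

```python
import math

# Monthly percent-satisfied results (hypothetical counts).
satisfied = [34, 31, 36, 29, 33, 35, 30, 32]   # satisfied responses per month
returned  = [40, 38, 42, 37, 40, 41, 39, 40]   # surveys returned per month

p_bar = sum(satisfied) / sum(returned)          # overall center line
for month, (x, n) in enumerate(zip(satisfied, returned), start=1):
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)  # per-month standard error
    ucl = p_bar + 3 * sigma
    lcl = max(0.0, p_bar - 3 * sigma)
    p = x / n
    flag = "<-- investigate" if not lcl <= p <= ucl else ""
    print(f"Month {month}: p={p:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f} {flag}")
```

With sample sizes this small, none of the month-to-month swings fall outside the control limits, which is exactly the overreaction the chart is meant to prevent.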
And finally, make sure that if there are goals or targets being set based on customer satisfaction scores, those target levels must be statistically distinguishable based on margin of error. Otherwise, people get rewarded or punished based purely on chance. In general, it is always better to set goals based on the drivers of customer satisfaction (“hard” metrics) rather than on satisfaction scores themselves, but in any case the goals must be set to be statistically significantly different from the current level of performance.
IV. Conclusion
Customer satisfaction surveys are bad, evil things. Okay, that’s not necessarily true, but surveys do have a number of pitfalls that can lead to bad decisions, wasted resources, and unnecessary angst at a company. The key is to understand survey limitations and to not treat survey data as if it were precise numerical information coming from a sound, calibrated measurement device. The best application of customer surveys is to use them to identify the drivers of customer happiness or unhappiness, then create the corresponding metrics and track those drivers instead of survey scores. Create simple surveys and strive for high response rates to ensure that the customer population is being represented appropriately. Do not use surveys to make comparisons where potential biases may lie, and be sure to include margin of error and proper statistical tools in any analysis of results.
Used properly, customer satisfaction surveys can be valuable tools in helping companies understand their strengths and weaknesses, and in pointing out areas of emphasis and focus in order to make customers happier. Used improperly, they can do a great deal of harm. Make sure your company follows the right path.