The full title of an article I read today is, “The Fallacy of Online Surveys: No Data Are Better Than Bad Data.” It’s from 2010 and very good. You can find it on the Responsive Management website. It makes some key points about the invalidity of online surveys:
- For a study to be unbiased, every member of the population under study must have an equal chance of participating.
- When online surveys are accessible to anyone who visits a website, the researcher has no control over sample selection. These self-selected opinion polls result in a sample of people who decide to take the survey — not a sample of scientifically selected respondents who represent the larger population.
- Non-response bias in online surveys is complicated by the most egregious form of self-selection. People who respond to a request to complete an online survey are likely to be more interested in or enthusiastic about the topic and therefore more willing to complete the survey, which biases the results.
- Unless specific technical steps are taken with the survey to prevent it, people who have a vested interest in survey results can complete an online survey multiple times and urge others to complete the survey in order to influence the results.
- Because of the inability to control who has access to online surveys, there is no way to verify who responds to them — who they are, their demographic background, their location, and so on.
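The self-selection problem in the list above is easy to demonstrate with a toy simulation. Here's a minimal sketch, with entirely made-up numbers (the opinion scale, the participation probabilities, and the strength of the enthusiasm effect are my assumptions, not figures from the article): if willingness to opt in rises with enthusiasm for the topic, the volunteers' average drifts away from the population's.

```python
import random

random.seed(42)

# Hypothetical population: each person has an opinion score (roughly 0-100).
# These numbers are illustrative assumptions, not data from the article.
population = [random.gauss(50, 15) for _ in range(100_000)]

def participates(opinion):
    # Assumption: people with higher opinion scores (the enthusiasts)
    # are more likely to volunteer for an open web survey.
    return random.random() < 0.01 + 0.10 * (opinion / 100)

volunteers = [x for x in population if participates(x)]

true_mean = sum(population) / len(population)
poll_mean = sum(volunteers) / len(volunteers)

print(f"true population mean:   {true_mean:.1f}")
print(f"self-selected estimate: {poll_mean:.1f}")
```

Nothing here is a flaw in the arithmetic of averaging; the estimate is biased before a single response is tallied, simply because who shows up is correlated with what they think.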
I’ve said this all before. The article concludes:
As a result of these problems, obtaining representative, unbiased, scientifically valid results from online surveys is not possible at this time, except in the case of the closed population surveys, such as with employee surveys, described earlier. This is because, from the outset, there is no such thing as a complete and valid sample — some people are systematically excluded, which is the very definition of bias. In addition, there is no control over who completes the survey or how many times they complete the survey. These biases increase in a stepwise manner, starting out with the basic issue of excluding those without Internet access, then non-response bias, then stakeholder bias, then unverified respondents. As each of these becomes an issue, the data become farther and farther removed from being representative of the population as a whole.
There’s also a good slide show on internet surveys here that goes over the basics presented in the article above. A 2008 paper addressed just one issue with online surveys: self-selection. The author, Jelke Bethlehem, wrote:
…web surveys are a fast, cheap and attractive means of collecting large amounts of data. Not surprisingly, many survey organisations have implemented such surveys. However, the question is whether a web survey is also attractive from a quality point of view, because there are methodological problems. These problems are caused by using the Internet as a selection instrument for respondents.
This paper shows that the quality of web surveys may be seriously affected by these problems, making it difficult, if not impossible to make proper inference with respect to the target population of the survey. The two main causes of problems are under-coverage and self-selection.
The author concludes:
It was shown that self-selection can cause estimates of population characteristics to be biased. This seems to be similar to the effect of nonresponse in traditional probability sampling based surveys. However, it was shown that the bias in self-selection surveys can be substantially larger. Depending on the response rate in a web survey, the bias can in a worst case situation even be more than 13 times as large.
In other words: most online surveys are bunk. You might also recall I wrote about online surveys in past posts. I won’t repeat what I said then, but here are the links to those posts: