03/2/14

Reading: A Canadian tragedy… or not?


World Reading Map
The map above might show the making of a serious tragedy for Western and especially Canadian culture. It indicates in colour which nations read the most. Yellow is the second lowest group. Canada is coloured yellow.

In this survey, Canada ranks 10th from the bottom! Twenty countries rank above us, with populations that, on average, read more per week than we do. That surprises and shocks me. And it disappoints me no end.

I’m not only a voracious reader, I’m passionate about books, language, reading and writing, and have been on the library board for 20 years actively helping it grow and develop. Is it a futile task?

I don’t believe so. In fact, I’ve seen the library grow more and more into a vital community resource in the past two decades. It has more users, more books and more reads than ever. That flies in the face of what the map suggests.

The map showed up on Facebook via Gizmodo. The stats come from the NOP World Culture Score (TM) Index (press release here). They’re scary, but are they accurate? They’re certainly not recent: the data were collected between December 2004 and February 2005.

Here are the 30 countries, ranked by the number of hours people there read every week:

  1. India — 10 hours, 42 minutes
  2. Thailand — 9:24
  3. China — 8:00
  4. Philippines — 7:36
  5. Egypt — 7:30
  6. Czech Republic — 7:24
  7. Russia — 7:06
  8. Sweden — 6:54
  9. France — 6:54
  10. Hungary — 6:48
  11. Saudi Arabia — 6:48
  12. Hong Kong — 6:42
  13. Poland — 6:30
  14. Venezuela — 6:24
  15. South Africa — 6:18
  16. Australia — 6:18
  17. Indonesia — 6:00
  18. Argentina — 5:54
  19. Turkey — 5:54
  20. Spain — 5:48
  21. Canada — 5:48
  22. Germany — 5:42
  23. USA — 5:42
  24. Italy — 5:36
  25. Mexico — 5:30
  26. U.K. — 5:18
  27. Brazil — 5:12
  28. Taiwan — 5:00
  29. Japan — 4:06
  30. Korea — 3:06

Canada is listed well below the global average of 6.5 hours a week. Five hours and 48 minutes a week translates into just under 50 minutes a day, on average. Are we losing our minds to TV?
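For anyone who wants to check the daily figure, here is a minimal sketch of the conversion. It assumes the weekly times are written as hours:minutes, the way most of the list above is; the function name is mine, not the survey's.

```python
def minutes_per_day(weekly):
    """Convert a weekly reading time written as 'H:MM' into minutes per day."""
    hours, minutes = weekly.split(":")
    total_minutes = int(hours) * 60 + int(minutes)
    return total_minutes / 7  # spread the weekly total over seven days

canada = minutes_per_day("5:48")   # Canada, 21st on the list
india = minutes_per_day("10:42")   # India, 1st on the list
print(f"Canada: {canada:.1f} minutes a day")
print(f"India: {india:.1f} minutes a day")
```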


09/24/12

Are internet polls valid?


How valid are internet polls? Are they credible enough for making serious or significant decisions, or are they merely general, even vague, indicators of intent? Are they equivalent to paper (and phone) surveys?

No. At least that’s what many experts say. Yes, they can be cost-effective, and good tools to engage the community. But like online petitions, they seldom have sufficient controls that restrict access to the relevant respondents. Anyone with a basic knowledge of how the internet works can easily bypass the limited security and vote numerous times. Often all it takes to get another vote is deleting cookies in your browser tools. More sophisticated users can create voting bots that automate the process.

You can read many articles on how to easily hack polls and cheat them. Some poll hacking is actually quite entertaining and imaginative, like the hack of the 2009 Time poll on the most influential people. The point is that polls are vulnerable to a variety of techniques. As one programmer noted about the Time poll:

“I took a look at the process of voting with a very basic set of tools on Firefox: Firebug and LiveHTTPHeaders. What I found is that when you submit the rating, it calls another page and passes a key, the rating, and the poll information through the URL to the page, like so:

http://www.timepolls.com/contentpolls/Vote.do?key=eba3a55e955bc93ade4fc820649cde04&rating=9&id=1857552&pollName=poy2008

Theoretically, then, you could hit this page as many times as you wanted with any rating you wanted, and drive up a candidates’ score. Though one would expect that Time would have figured that anyone could game the system, it’s easy for a programmer to forget that what they don’t intend for public viewing may still be visible, and that they always need to check to ensure that the data they expect is the data they are getting.”
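The programmer's last point, that a server has to check the data it receives, can be sketched in a few lines. This is only an illustration, not how Time's poll worked: the 1 to 100 rating range and the `voter_id` argument (some verified identity such as a login) are my assumptions.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical rule: a rating must be a whole number from 1 to 100.
MIN_RATING, MAX_RATING = 1, 100

def validate_vote(url, seen_voters, voter_id):
    """Return the rating if the vote is acceptable, otherwise None."""
    params = parse_qs(urlparse(url).query)
    raw = params.get("rating", [""])[0]
    try:
        rating = int(raw)
    except ValueError:
        return None  # missing or non-numeric rating
    if not MIN_RATING <= rating <= MAX_RATING:
        return None  # rating outside the range the poll offers
    if voter_id in seen_voters:
        return None  # this voter has already been counted
    seen_voters.add(voter_id)
    return rating
```

Even a check like this only helps if the voter's identity is reliable, which is exactly what most free online polls cannot guarantee.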

Generic online polls are easy to create and many are free, which makes them attractive to businesses, media and political groups that don’t have the resources to do phone or door-to-door surveys. How many of these instant polls are actually mining participant data can’t be determined, but you have to expect the companies to get some return for a “free” service. Some media clearly use polls not as a count of anything specific, but rather as a measure of how engaged people are on an issue, and of how much attention they are paying to that particular outlet’s coverage of it.

As one Australian study concluded,

“…online polls cannot be considered as an alternative to using paper based surveys. The independent sample t-tests results obtained for the questions administered using a paper based survey and those through an online poll showed that in the majority of cases that there was significant difference between the means. The implication is that online polls cannot be used to survey a cohort of people replacing the more costly paper based survey.”

Online polls and surveys are generally open to anyone with an internet connection. Similar to clicking the Facebook “like” button, most online polls simply count clicks, but don’t qualify them by demographic: gender, region, age, income or anything else that might matter. While some may believe 12- and 13-year-olds should be able to vote on any issue, they are not really old enough to appreciate the many facets of any political or social issue. But how do you tell if a vote was cast by a child or an adult?

Unless you have qualifying questions that ask personal information to identify the participant as belonging to the target demographic you need, you can’t distinguish between valid and invalid votes. That makes them all invalid.

Everyone on Facebook knows that the count of “likes” is irrelevant because Facebook lacks a corresponding “dislike” button to provide balance. Without that, the number of likes or followers has to be measured against Facebook’s almost one billion subscribers. Having 1,000, even 100,000 “likes” is a small percentage of the total possible. But even with millions, you have no way of qualifying those “likes” by any meaningful categorization. Yes, Facebook can do it, but they’re not giving the important data to users free. Besides, when it comes down to it, “likes” may make the user feel better and more popular, but they don’t add up to much else than self-importance.

It used to be the count of page hits that people boasted about. That quickly ended when website owners started putting “counters” on pages that faked the numbers, or started with high numbers. To get a real picture of website use today you need sophisticated tools like Google Analytics that identify time spent per page, whether the page was visited by a search bot or a human, whether the user went to other linked pages within that site, the search terms used to land on that page, etc. Online surveys without that sort of statistical analysis are much like the old page count numbers of the 1990s.

Some online petitions and polls even allow the same person to digitally “sign” or vote more than once. Some petitions allow participants to be “anonymous” to others (which clearly defeats the purpose of a petition as a tool of democracy). Again, that opens questions about validity and credibility. Anonymous online comments or petition signatures have no credibility in the democratic process.

Because these petitions invite comments, it’s not uncommon for people to use them as sounding boards for comment and griping, rather than for their intended purpose of gathering support for a particular position. Bitching about the state of government may be stress-relieving, but it is not relevant to the petition and dilutes the intended message.

Any online petition has to be carefully combed for duplication and repetition of names. Even once these are winnowed out, how can anyone determine the age or location of the signatory unless that information is required when signing and provided as part of the presentation? How do the presenters ensure the remaining names are valid in respect to the subject of the petition? This is one reason why paper petitions still have considerably more validity than online ones.
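The first step, combing for repeated names, is the easy part. Here is a rough sketch that treats names as duplicates when they differ only in capitalization or spacing; the sample names are made up for illustration.

```python
def normalize(name):
    """Collapse extra whitespace and ignore capitalization."""
    return " ".join(name.split()).casefold()

def unique_signatures(names):
    """Keep the first occurrence of each name, dropping blanks and repeats."""
    seen = set()
    kept = []
    for name in names:
        key = normalize(name)
        if key and key not in seen:
            seen.add(key)
            kept.append(" ".join(name.split()))
    return kept

# Hypothetical signatures, for illustration only.
signatures = ["Jane Doe", "jane  doe", "JANE DOE ", "John Smith", ""]
print(unique_signatures(signatures))
```

Note what this cannot do: it won't catch one person signing under two different names, and it will wrongly merge two real people who share a name. Neither problem can be fixed without the qualifying information discussed above.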

I have voted in online polls and surveys about American politics and presidential races. I’m a Canadian so my vote, my choices should be meaningless, yet there were no qualifying questions posed to restrict access to Americans of voting age. What if the Chinese government took a serious interest in the American presidential race and used US online polls to sway the results towards Chinese goals? Why not? If a poll by one of the two parties asked whether the US Army should be disbanded, wouldn’t it be in the national interests of, say, Iran, China, North Korea, Russia or Syria to push the poll numbers up towards yes and get that onto the candidate’s platform?

Could US policy be shaped by such polls? Not yet. At present there’s a good level of skepticism about online polls among politicians and their strategists. Making a claim that “70 percent of Americans want to disband the army” based on an internet poll would be not only incorrect, but stupid.

Similarly, if we run a poll asking if school should be two hours shorter and we get 12,000 yes votes, and 3,000 no, should school boards seriously consider reducing school hours? What if you found out 11,000 of those yes votes were cast by students under the age of 18? Would that affect how the poll was perceived by educators and administrators? Of course. Qualifying data is always necessary to validate the results.
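The arithmetic of that school-hours example is worth spelling out. The sketch below uses the numbers from the paragraph above, plus one assumption of mine: that none of the no votes came from students, since the example doesn't say.

```python
def share_yes(yes, no):
    """Fraction of all votes that were yes."""
    return yes / (yes + no)

raw_yes, raw_no = 12_000, 3_000
student_yes = 11_000
student_no = 0  # assumption: the example does not say how students voted no

# Remove the votes cast by people outside the target group.
qualified_yes = raw_yes - student_yes
qualified_no = raw_no - student_no

raw_share = share_yes(raw_yes, raw_no)
qualified_share = share_yes(qualified_yes, qualified_no)
print(f"All votes: {raw_share:.0%} yes")
print(f"Adults only: {qualified_share:.0%} yes")
```

Under that assumption, a four-to-one majority in favour turns into a three-to-one majority against once the unqualified votes are removed.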

Have an opinion on something? Anything? There are

On Sodahead, a popular opinion site, here’s the latest series of polls you can vote on, taken from the front page:

Do Father-Daughter Dances Promote Gender Discrimination?
Arnold Schwarzenegger Releases Book Trailer: Are You Interested in Reading His Memoir?
Will J.K. Rowling Find Success Beyond ‘Harry Potter’?
New Studies Cite Stronger Link Between Soda and Obesity: Do You Drink Soda?
KFC Closes Restaurants in Pakistan Amid Protests: Should U.S. Retailers Get Out of the Middle East?
Is Vogue Featuring Domestic Violence On Its Cover?
Which Show Are You Rooting for to Win the Emmy for Best Comedy Series?
Are Celebrity Video Games Awesome or Annoying?
Which News Anchor is Least Likely to Lie to Viewers
What were you most excited to leave behind after high school?
Do You Multitask at the Movie Theater?
Does Kanye West Have a Sex Tape?
Are These GPS Shoes Wonderful or Weird?

Perhaps it’s just me, but whenever I visit this site, I keep asking myself “Who cares?” Were these questions created by bored 15-year-olds? Any number of irrelevant, pointless, puerile polls are available online to people who want to express an opinion, but face it: the results aren’t going anywhere because NO ONE CARES about the results. They’re just there to make you feel engaged, let off steam, and think you’ve contributed to something.

For any opinion poll to be valid, it needs to meet certain crucial scientific criteria, including sample size. Most online polls don’t meet any serious selection criteria at all, which means they’re simply for entertainment, like horoscopes.

What is a valid sample size that gives a result meaning? One percent? Ten? Twenty five? What is the effective sample size for, say, Collingwood, with a population of 20,000?

Let me quote at length from an answer on how polls have to be conducted and what sort of sample size is relevant:

You have a two part question – but you didn’t realize it. The question, which you asked, is: “What should my sample size be for this test?”

If you go to the Sample Size Calculator website (www.berrie.dds.nl/calcss.htm), you can find this:

The parameters you are setting are:
1) Population – the number of people in the world who will be seeing your website. Let’s assume that your population is “everyone in the world.” So, if we use a very large number, say, 1,000,000, we will calculate the maximum sample size needed.
2) Confidence – this is how sure you are going to be that the results of your sample reflect the true population. The higher the number, the larger the sample size. This is the “certainty” of the results. Customarily for most marketing work, 95% confidence is ample. The default in the website is set at 0.95.
3) Margin – This is how much error you are willing to allow. If you allow 5% error, that means that in a sample size of 100, if the results are 50 clicks, the true number of clicks could be between 45 and 55 clicks. This is the “precision” of your test. The smaller this number, the larger the sample size.
4) Probability – this is the value of the result you expect to get. For instance, if you expect to get 50 clicks out of 100 views, this value is set at 0.5. Of course, most of us don’t know this value. But the good news is that setting it at 0.5 yields the largest sample size. If the number of clicks is 20 or 80, the confidence increases for the same margin or the margin decreases for the same confidence. And this is a good thing.

Sample size, of course, determines cost of the test. In your case, this is time. If we use the parameters of a population of 1,000,000 with 95% confidence for 5% margin of error with a probability of 50%, then the sample size is 385. For a 1% margin, it’s 9517. You are at about 1.45% now. That means that you are within 65 of the true population result. Additionally, assuming the number of views are pretty close, this means that your test can tell the difference between the two websites as long as the results differ by more than 65.

The second part of the question – that you didn’t ask – is how do I determine if the two results are really different. For this, you do another test – a Chi-squared test. You have a hypothesis that the two results are the same versus the alternative that they are different. For the test, we look at the observed values versus “expected values.” Expected values are what we’d get if we added the total clicks for both tests and divided by total views and then multiplied by the views for each:

3760/9060 = 0.415
Expected values
4550 * .415 = 1888
4510 * .415 = 1872

Divide the squared difference between the observed and expected values by the expected value, and add the two results:

(2010 - 1888)^2 / 1888 = 7.883
(1750 - 1872)^2 / 1872 = 7.951

Sum is 15.834. Using a Chi-square table like at: http://www.richland.edu/james/lecture/m170/tbl-chi.html

If we set our confidence at 95%, we use 1 - 0.95 = 0.05. Our degrees of freedom are calculated by subtracting 1 from the number of proportions in our test: 2-1=1. So, for 95% confidence, we test the value of 15.834 against 3.841. Since 15.834 is larger, we reject our hypothesis that the two results are the same and accept the alternative hypothesis that they are different, with a much greater than 95% confidence.

So, as Nelson (nelsonm) stated in about 20% of the space, you’re done. Looks like version A is the best with a greater than 95% confidence.
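The arithmetic in that answer can be reproduced in a few lines. This is a sketch using the view and click counts given in the quote. Without rounding the expected values, the sum comes out at roughly 15.8 rather than exactly 15.834. The quoted answer also adds up only the click cells; a conventional 2×2 chi-squared test would count the non-click cells as well, so I have included that version too. That addition is mine, not the quoted author's.

```python
def chi_square_terms(observed, expected):
    """Sum of (observed - expected)^2 / expected over matching cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

views = [4550, 4510]    # views of the two website versions, from the quote
clicks = [2010, 1750]   # clicks on each version, from the quote

# Expected clicks if both versions shared the same overall click rate.
pooled_rate = sum(clicks) / sum(views)
expected_clicks = [v * pooled_rate for v in views]
clicks_only = chi_square_terms(clicks, expected_clicks)

# Same calculation for the views that did not end in a click.
misses = [v - c for v, c in zip(views, clicks)]
expected_misses = [v - e for v, e in zip(views, expected_clicks)]
full_table = clicks_only + chi_square_terms(misses, expected_misses)

CRITICAL_95_DF1 = 3.841  # chi-squared critical value, 95% confidence, 1 degree of freedom
print(f"Click cells only: {clicks_only:.2f}")
print(f"Full 2x2 table: {full_table:.2f}")
print("Different?", full_table > CRITICAL_95_DF1)
```

Either way the statistic is far above 3.841, so the conclusion in the quote stands.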

Using the sample size calculator mentioned above, here’s what I calculate for a reasonably accurate survey for Collingwood. Based on a population of 20,000, a confidence level of .95, margin of .05, and probability of .50, the minimum sample would be 378 to get a reasonably accurate assessment.

That’s a fairly small number. But how can one determine the numbers to be punched into the various parts of the equation? What’s the confidence level, margin of error? Change the margin of error to 0.01, a mere 1% margin of error, and you need a sample size of 6,491. Which number do you change to account for duplicates, outside (irrelevant) votes or votes made by children?
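I don't know exactly which formula the online calculator uses, but the standard textbook one (Cochran's formula with a correction for a finite population) lands within a few responses of its figures. Here is a minimal sketch, assuming a z-score of 1.96 for 95% confidence:

```python
import math

Z_95 = 1.96  # z-score conventionally used for 95% confidence

def sample_size(population, margin, p=0.5, z=Z_95):
    """Minimum random sample for a given population and margin of error."""
    # Sample needed for an effectively unlimited population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it to account for the finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(sample_size(20_000, 0.05))     # Collingwood, 5% margin: 377
print(sample_size(20_000, 0.01))     # Collingwood, 1% margin: 6489
print(sample_size(1_000_000, 0.05))  # the quoted example: 385
```

Notice that nothing in the formula accounts for duplicates, outsiders or children. It assumes every response comes from a randomly chosen member of the population, and no amount of adjusting the inputs fixes a sample that isn't.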

And let’s not be fooled: online polls can be and have been the target of special interest groups who want to express their own agenda. Election polls are particularly vulnerable to this sort of nefarious activity.

Journalists have to be particularly suspicious of online polls. The National Council on Public Polls (NCPP) provides 20 questions a journalist should ask about poll results. Among these is:

How were those people chosen?
The key reason that some polls reflect public opinion accurately and other polls are unscientific junk is how people were chosen to be interviewed. In scientific polls, the pollster uses a specific statistical method for picking respondents. In unscientific polls, the person picks himself to participate.

In other words, self-selection is unscientific. A bit further down the page, it notes,

But many Internet polls are simply the latest variation on the pseudo-polls that have existed for many years. Whether the effort is a click-on Web survey, a dial-in poll or a mail-in survey, the results should be ignored and not reported. All these pseudo-polls suffer from the same problem: the respondents are self-selected. The individuals choose themselves to take part in the poll – there is no pollster choosing the respondents to be interviewed.

Governments cannot govern by poll. That’s not leadership. But attempting to govern by internet poll is not merely foolish but potentially dangerous. There is little if any way to determine the source of the votes. You might as well govern by magic ball or coin toss.

So when I read a statement like, “A community poll has shown that 7 out of 10 residents support the one community centre concept” I have to ask, who did the poll, where, when and how was it conducted? That statement turns out to be based on the results of an Enterprise-Bulletin online poll, which had approximately 200 results: 200 unqualified, unscientific results. As calculated above, a sample for Collingwood large enough to give a reasonable assessment of public opinion would need at least 378 QUALIFIED votes. Qualified means a resident or taxpayer, of majority age, who understands the question being posed.
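To put 200 responses in perspective, here is a sketch of the best-case margin of error, using the standard formula with a finite population correction. It assumes something the online poll cannot claim: that the 200 respondents were chosen at random from Collingwood's roughly 20,000 residents.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error at 95% confidence for a RANDOM sample of size n."""
    standard_error = math.sqrt(p * (1 - p) / n)
    # Correction for sampling from a finite population.
    correction = math.sqrt((population - n) / (population - 1))
    return z * standard_error * correction

moe = margin_of_error(200, 20_000)
print(f"Best-case margin of error: {moe:.1%}")
```

Even under that generous assumption the margin is roughly seven percentage points either way. For a self-selected poll, the real uncertainty can't be calculated at all.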

Who conducts the poll and who provides the results is also important. Voters on the left don’t trust polls produced by voters on the right and vice versa. Some media are trusted to be objective, others – Fox and Sun News, for example – will always be suspected of having a bias towards their particular political slant. And the majority doesn’t trust the government to accurately and objectively present figures.

My final point has to do with the questions themselves. Asking “Do you like ice cream?” is asking for a general personal opinion and really doesn’t need more choices than “Yes, No, Sometimes.” A more specific question would be, “Do you like butterscotch ice cream?” Ice cream manufacturers are not going to change their business plans based on these rather vague questions, however. They might pay more attention to a question asking participants to select from a list of flavours not currently made, but one which they would like to see available.

Asking “Should we build a new town hall?” is an iceberg question. It hides a larger mass of questions below it: “What will it cost?”, “Will it raise my taxes?”, “Who will benefit?”, “Why do we need one?”, “Can the old town hall be refurbished?”, “When will it be built?”, “Where will it be built?” and so on. Participants need answers to all those hidden questions before they can properly answer the seemingly simple question posed about building a new town hall. Otherwise, the answers on that poll are essentially meaningless.

A more reliable question might be, “Should we build a new town hall on the western side of town away from other municipal services, if it will take five years, disrupt some municipal services during construction, cause intermittent road closures, raise your taxes by 10% a year, have it located at the edge of town, result in hiring more staff, and incur greater operating costs when it opens, despite staff recommendations that we just refurbish the old one?”

Even that doesn’t include all of the factors necessary in the decision making process: what to do with the old town hall, should local contractors get preference in the tendering process, will there be local jobs created, can we get support funding from other governments, is the building “green” or LEED certified, do we have to buy or expropriate property, is the site currently zoned for it?

Most of all it doesn’t answer the biggest question: why do we need one now?

Perhaps, if you can vouch that the participants in your poll have all paid close attention to the debate about building a new town hall, that they have attended council meetings to watch the debates, have read all the staff reports, have listened to the treasurer expound on the financial implications, and have read and watched the local media to gain insight and understand the differences of opinion, then they might be able to answer “Should we build a new town hall?” without further refinement. Good luck finding enough people in that category to fit the necessary sample size to validate the results.

Even if you could find such a group, the results can’t simply be reported in terms of yes and no votes. Demographic breakdown of the results is important, too. Politicians should be told the geographic location of the participants (how many west-end participants voted no compared to the east-end, for example), their age groups (are working parents more in favour than retirees?), gender, whether they live in town full-time or part-time, whether they own a vehicle or use public transit to get around town – all become parts of the decision-making milieu.

Most internet polls are merely for entertainment purposes. They harmlessly allow us to believe we are being engaged and are participating in the process, and they make pollsters happy to receive the attention. They are, however, not appropriate tools for making political or social decisions unless they are backed by rigid, scientific and statistical constraints.