The future of statistical thinking

Opinion, December 2014
Image: Hilch/iStock/Thinkstock

Nigel Marriott looks to data science, infographics and cognitive psychology to expand our definition of what it means to think like a statistician

In 1903, in his book Mankind in the Making, the writer H. G. Wells noted that: “The great body of physical science … [is] only accessible and only thinkable to those who have had a sound training in mathematical analysis, and the time may not be very remote when it will be understood that for complete initiation as an efficient citizen … it is necessary to be able to compute, to think in averages and maxima and minima, as it is now to be able to read and write.”1

The above is a fairly tortuous passage that was shortened and simplified in 1951 by Samuel Wilks when he remarked, during his presidential address to the American Statistical Association, that: “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”

It is certainly a snappier quote. However, unlike Wells’s original, it does require the reader to understand the phrase “statistical thinking”. Statistical thinking – as Wilks might have thought of it – can be broken down into six core concepts: expectation and variance – which encompass the “averages and maxima and minima” that Wells refers to – plus distribution, probability, risk and correlation.
A broader perspective

These six concepts are still valid in the twenty-first century, and I have been teaching them as part of training courses for the past decade. However, 63 years after Wilks’s presidential address, I believe it is time to expand the definition of statistical thinking – and I would like to put forward three new concepts: data, cognition and visualisation (Figure 1).

Figure 1. The core concepts of statistical thinking have evolved and expanded since the start of the 20th century. Start of 20th century (H. G. Wells): averages, maximum, minimum. Late 20th century (Wilks et al.): expectation, variance, distribution, probability, risk, correlation. 21st century (Kahneman et al.): the same six concepts plus data, cognition and visualisation.

Let us start with data – the lifeblood of the statistician. It is not accounted for explicitly in our current definition of statistical thinking, but I argue that it needs to be. A BBC Horizon documentary declared last year that we are living in the “age of big data”, and although Google chairman Eric Schmidt wrote recently that “big data needs statisticians to make sense of it” (see News, page 3), it is the data scientist who appears to be benefiting most from growing interest in the analysis and application of data.

The whole concept of data is changing rapidly. We no longer have just numbers to deal with. Data now includes free text and images, which present all kinds of challenges for sorting and organisation. Data scientists are perceived as having the skills to deal with this broader class of data, and perhaps that is why interest in the job is growing.

According to Google Trends data (see Figure 2), there have been many more searches for “statistician” than for “data scientist” over the past decade. However, in recent months, searches for “data scientist” have overtaken those for “statistician”. Searches for the more junior role of “data analyst” have also increased, while those for “statistician” have declined and flattened out.

This evidence is by no means conclusive, but it suggests to me that statisticians risk being left behind and not being seen as integral to this new (and big) data movement. Adding “data” to our definition of statistical thinking will not solve this problem in and of itself, but it will convey an important message: that statisticians, the original data scientists, embrace data in all its forms.

Think about it

Next comes “cognition” – a subject explored brilliantly by the Nobel Prize winner Daniel Kahneman in his book Thinking, Fast and Slow,2 which examines the ability (or inability) of the human race to think statistically. Published in May 2012, the book’s stated aim is to help people “improve the ability to identify and understand errors of judgement and choice, in others and eventually ourselves”. Its origins lie in Kahneman’s collaborations with Amos Tversky in the 1970s, when they posed the question: “Are people good intuitive statisticians?”

Understanding Statistics is Necessary to Be an Effective Citizen Paper
The answer, then and now, is “no”. Kahneman’s key thesis is that human beings have two types of thinking, which he denotes as System 1 (fast) and System 2 (slow). System 1 is intuitive, effortless, instinctive and based on our experiences, whereas System 2 is rooted in logical reasoning, is energy-intensive and requires memory recall. The book explores how these two types of thinking interact and influence each other. It explains numerous cognitive fallacies that we humans are prone to – which can be blamed largely on the instinctive, effortless response of System 1 – but at the end of each chapter, Kahneman summarises a number of ways we can reduce the risk of making such errors.

Statistical thinking is clearly a System 2 way of thinking, which explains why the human race struggles with it. This includes statisticians, who might assume they know enough to avoid such cognitive pitfalls. In a number of experiments, Kahneman and Tversky showed that statisticians were liable to make the same cognitive errors as non-statisticians.

Two examples from his book demonstrate this. Try to answer these questions instinctively, without thinking too much about them, before reading on.

1. Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure and a passion for detail. Is Steve more likely to be a librarian or a farmer?

2. Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice. Rank these three statements in descending order based on the likelihood that they are true:
A: Linda is active in the feminist movement
B: Linda is a bank teller
C: Linda is a bank teller and is active in the feminist movement

Figure 2. A screengrab of Google Trends data tracking internet search volumes for the job titles ‘statistician’, ‘data scientist’ and ‘data analyst’ over the past 10 years

If you said that Steve was more likely to be a librarian you fell for the representative fallacy. Steve is clearly more representative of the stereotypical librarian, but unless you know the relative frequencies of librarians to farmers, the correct answer is “I don’t know”. In fact, according to Kahneman, farmers outnumber librarians by 20 to 1, so Steve is more likely to be a farmer.

Statisticians generally performed better than non-statisticians on the Steve problem, but when it came to the Linda problem, 89% of non-statistical students and 85% of statistical students got it wrong. The correct answer is A-B-C, but most people said A-C-B. Yet it is mathematically impossible for the probability of C to be greater than the probability of A or B separately, since C is merely the intersection of outcomes A and B and thus a subset of each. Kahneman was shocked that the students who took this test still got it wrong even when the error was pointed out to them. It seemed that the representative fallacy, which is characteristic of System 1 thinking, was drowning out their ability to engage System 2.

Acknowledging “cognition” as a component of statistical thinking is important as a way of differentiating the statistical mindset – one that prizes logical reasoning over instinctive response. Statistical analysis is often used to support data-driven decision-making, but to achieve that we need to make users of data aware of the cognitive fallacies that they might succumb to, and help them avoid such traps.

This brings us neatly on to “visualisation”, which is an important support for decision-making. Of course, statisticians are well versed in data visualisations – we are taught histograms, box plots, scatter plots, etc.
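As an aside on the Steve and Linda problems above, both answers can be checked with a few lines of arithmetic. The sketch below is illustrative only. The 20-to-1 base rate is the figure quoted from Kahneman; the likelihood ratio of 4 for Steve and all three probabilities for Linda are made-up assumptions, chosen simply to show how the reasoning works.

```python
from fractions import Fraction


def posterior_probability(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Steve: farmers outnumber librarians 20 to 1 (figure quoted in the article),
# so the prior odds of "librarian" against "farmer" are 1 to 20.
PRIOR_ODDS_LIBRARIAN = Fraction(1, 20)

# ASSUMPTION (not from the article): the description of Steve is four times
# more likely to fit a librarian than a farmer.
ASSUMED_LIKELIHOOD_RATIO = 4

p_librarian = posterior_probability(PRIOR_ODDS_LIBRARIAN, ASSUMED_LIKELIHOOD_RATIO)
print(f"P(Steve is a librarian | description) = {float(p_librarian):.3f}")

# Linda: ASSUMED, purely illustrative probabilities.
P_FEMINIST = Fraction(8, 10)               # A: active in the feminist movement
P_TELLER = Fraction(1, 20)                 # B: bank teller
P_FEMINIST_GIVEN_TELLER = Fraction(9, 10)  # P(A | B)

# C is the intersection of A and B: P(A and B) = P(B) * P(A | B).
p_both = P_TELLER * P_FEMINIST_GIVEN_TELLER

# The conjunction can never be more probable than either of its parts.
assert p_both <= P_FEMINIST and p_both <= P_TELLER

probabilities = {"A": P_FEMINIST, "B": P_TELLER, "C": p_both}
ranking = sorted(probabilities, key=probabilities.get, reverse=True)
print("Ranking, most to least likely:", "-".join(ranking))
```

Even with a description assumed to favour the librarian four to one, the base rate keeps the probability that Steve is a librarian at about one in six. For Linda, whatever values are assumed, C is the product of P(B) and a number no greater than 1, so it cannot be ranked above B.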
But learning how to present data correctly does not automatically confer on us the skills required to convey a message in visual form.
In my job as a statistical consultant, I was once chided by a client who said that whilst I might be an expert in charts, he needed a graphic, and “that is what graphic designers provide”. The field of infographics is therefore the fusion of charts and design, and it is changing the way people view data. By adding “visualisation” to our definition of statistical thinking we are again sending a message: this time, that statistical thinkers are as good at communicating data as they are at analysing and interpreting it.

Statisticians need to keep up with changing times, and we should not be afraid of asking for help if help is needed to achieve this – whether that support comes from graphic designers, neuroscientists or other data professionals. The expertise we need in the twenty-first century is not the sole preserve of statisticians, and we should recognise this. But if we succeed in adding “data”, “cognition” and “visualisation” to our understanding of what statistical thinking is and should be, we will boast a powerful and unique combination of skills.

References

1. Wells, H. G. (1903) Mankind in the Making. London: Chapman & Hall.
2. Kahneman, D. (2012) Thinking, Fast and Slow. London: Penguin.

Nigel Marriott is the founder of Marriott Statistical Consulting Ltd. This article is based on a talk given at the 2014 Royal Statistical Society International Conference in Sheffield, UK …