Living by the evidence

Professor Harvey Goldstein, School of Education, University of Bristol

In public discourse it has become common to claim that a programme or policy is “evidence-informed”. Indeed, it is often felt sufficient merely to state that a particular decision is “evidence-informed”, rather than describing the nature and quality of the underlying evidence.

The move to base public policy decisions on the best scientific evidence is certainly welcome. It has been inspired and developed by initiatives such as the Cochrane Collaboration in medicine and the Campbell Collaboration in the social sciences, both of which rely upon systematic reviews of research. In this brief article I would like to explore the contemporary scene and how evidence is used or misused.

I will start by looking at what counts as evidence, followed by a discussion of how evidence should be presented, and examples of attempts to moderate ways in which evidence is used in public debate. Finally, I will look at how evidence about school performance has been used, and what lessons we might take from this.

But before considering the nature of evidence, it is worth saying that public policy decisions are, and should be, influenced by considerations beyond the research evidence, such as priorities, feasibility, acceptability, ethics and so forth: all of these will involve judgements, typically subjective ones.

Types, quality and uses of evidence

Evidence that can reasonably be termed “objective” can usefully be divided into two kinds. First, and most importantly, are inferences about causal or predictive relationships, usually across time, relating actions or social and other circumstances to later outcomes.

A second kind of evidence is useful, although secondary, and is concerned with the provenance of evidence: who has provided it, who has sponsored it, and what vested interests might be involved. Thus, there is legitimate concern about the funding of climate change research by oil companies, and for some time medical researchers studying the effects of smoking have refused to accept funding from the tobacco industry.

In addition to provenance, the general quality of research is typically underpinned by peer review. The best journals will obtain at least two independent judgements on any submitted paper and, while not foolproof, this is perhaps the most satisfactory method currently available for weeding out unsatisfactory work.

When evaluating evidence quality, uncertainty should be taken into account and well communicated. A quantitative analysis of any dataset is always subject to uncertainty, arising from sampling variability, choice of technique and so forth. Those presenting evidence should do so in such a way as to allow an informed debate about how it can be used. Publicly accountable bodies in particular should be required to be transparent about the uncertainties and alternative explanations that might be involved.
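By way of illustration – and purely as a sketch, using invented data and the Python numpy library, not results from any real study – the following shows one simple way to report sampling uncertainty alongside a point estimate: a percentile bootstrap confidence interval.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented outcome scores for a hypothetical programme group
# (illustrative data only -- not from any real study).
scores = rng.normal(loc=52.0, scale=10.0, size=200)

def bootstrap_ci(data, stat=np.mean, n_boot=5000, alpha=0.05):
    """Percentile bootstrap interval for a statistic of the data."""
    boot_stats = np.array([
        stat(rng.choice(data, size=data.size, replace=True))
        for _ in range(n_boot)
    ])
    return np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])

lo, hi = bootstrap_ci(scores)
# Report the interval alongside the point estimate, never instead of it.
print(f"mean = {scores.mean():.1f}, 95% interval = ({lo:.1f}, {hi:.1f})")
```

Presenting the interval rather than the bare mean is a small change, but it makes the room for alternative readings of the data visible to anyone using the figure.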

Finally, those people further disseminating evidence – such as journalists and policymakers – need to resist the temptation to “cherry pick” the results they like, and this leads on to the issue of how to ensure that those using evidence do so in a responsible fashion.

Moderating the debate

Sites such as Full Fact do a good job with limited resources in holding evidence providers and the media to account, but they typically do not delve into issues in depth. One consequence is that journalists – with their own limited resources – may simply quote the assessments provided by fact-checking sites, rather than treating those assessments as a first step to be followed up in more detail.

Journalists may look elsewhere, of course, including to the UK Statistics Authority (UKSA), the statutory body overseeing UK national statistics, which broadly does a good job of highlighting the misuse of statistics in public debate. But the UKSA’s resources are also limited and, like Full Fact and others, it does not generally explore issues in depth.

Another organisation that comments, criticises and advises on the use of evidence in public life is the Royal Statistical Society. It too has limited resources and must rely largely on voluntary input from its members, yet it does a great deal of insightful work, with the real strength that this work is informed by expert opinion.

Impact and access

One concern relates to current UK government policy on the evaluation of university research – the Research Excellence Framework – which in effect encourages researchers to ignore best practice when describing their research “impact”, in favour of promoting their own research as a major driver of policy change, or even of change in a distal outcome (such as poverty alleviation). In general, such attempts to infer “causality” are unjustifiable, and they could ultimately lead to a severe distortion of research and the ethics surrounding it, as well as forcing a concentration on short-term rather than long-term objectives.

A second concern is to do with the evolving economics of scientific publishing, where commercial publishers of books and journals have always played an important role. The most recent development is so-called “open access” publishing, whereby the cost of accessing a research paper – which has traditionally fallen on the reader, through an academic library or otherwise – is shifted to the writer of the paper, who might be expected to pay up to £2,000 to a journal so that the work can be freely downloaded by anybody. I do not have space to go into all the details, but it should be fairly clear that, under this model of publishing, those with the financial resources to pay these “article processing charges” are more likely to be those whose research gets read. The social, cultural and scientific implications of this are likely to be extensive.

We have also recently seen the steady growth of “middleperson” organisations that will publicise scientific work to the public for free – while charging the researcher up to £2,000 per paper – or, alternatively, will distribute a “popular” version, provided by the researcher, to paying subscribers.

Both of these examples are likely to change the balance of evidence that gets used, yet neither has been discussed in open debate.

How to use evidence sensibly

So, we have evidence. Now, how do we use it? I have spent much of my career arguing about league tables, especially school ones, so I’ll end on this topic. Over the last 30 years, some of us have had some success in conveying notions of statistical uncertainty (interval estimates for ranks) and the need to adjust for differences in pupil intake between schools. These considerations have influenced policymakers to the extent that they are reflected in the tables governments provide. But they have done little to moderate the enthusiasm of the media, who are generally unwilling to forsake the idea that what matters – or, perhaps more cynically, what sells newspapers and website subscriptions – is a simple ranking of “best” to “worst” schools, without any concern for uncertainty or even the need for statistical adjustment for intake.
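To make the point about interval estimates for ranks concrete, here is a hedged sketch in Python (numpy), using entirely invented data: twenty hypothetical schools whose true differences are small relative to pupil-level variation. It sets aside intake adjustment altogether and looks only at sampling uncertainty, by resampling pupils within each school and recomputing the ranking.

```python
import numpy as np

rng = np.random.default_rng(0)

n_schools, n_pupils = 20, 50
# Invented "true" school effects: small genuine differences...
true_effects = rng.normal(0.0, 0.5, size=n_schools)
# ...swamped by much larger pupil-level variation in scores.
scores = true_effects[:, None] + rng.normal(0.0, 3.0, size=(n_schools, n_pupils))

def ranks(means):
    """Rank 1 = highest mean score."""
    return (-means).argsort().argsort() + 1

observed_rank = ranks(scores.mean(axis=1))

# Resample pupils within each school to get interval estimates for ranks.
n_boot = 2000
boot_ranks = np.empty((n_boot, n_schools), dtype=int)
for b in range(n_boot):
    boot_means = np.array([
        rng.choice(scores[s], size=n_pupils, replace=True).mean()
        for s in range(n_schools)
    ])
    boot_ranks[b] = ranks(boot_means)

lo, hi = np.percentile(boot_ranks, [2.5, 97.5], axis=0)
for s in np.argsort(observed_rank):
    print(f"school {s:2d}: rank {observed_rank[s]:2d}, "
          f"95% rank interval ({lo[s]:.0f}, {hi[s]:.0f})")
```

With pupil-level noise of this size dwarfing the between-school differences, most of the rank intervals span a large part of the table, so neighbouring positions in a published ranking are not statistically distinguishable – which is the core of the statistical objection to naïve league tables.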

The problem of school league tables illustrates several important points. It shows how certain kinds of evidence can be harmful if collected and then displayed in public. It shows, as in the case of university research impact assessments, that individual actors can game the system, so changing what it is intended to measure. It shows how a government can claim to be providing useful information, without any real attempt to stimulate a public debate about its usefulness. And it shows how mass media will embrace the most simplistic interpretations of the data without any encouragement to dig deeper.

To be clear: I am not advocating that we drop the idea of publicly accountable systems, rather that we move away from naïve and misleading presentations of evidence, and towards a more rational approach. In other words, league tables – for schools or other institutions – should function as one piece of evidence: as a screening device that may be able to point to concerns that could be followed up. But any ranking, in itself, is not suitable for use as a diagnostic tool to pass definitive judgement. (See “Rethinking school accountability” for a suggestion on how we might progress in education.)

Final thoughts

So where does this leave us? I have little doubt that, ultimately, real evidence can win out if the issue is serious enough. Take climate change: the evidence is ignored for as long as possible by vested interests and by the policymakers who rely upon them, until its implications really can no longer be ignored. Hopefully this will not come too late for useful action.

The important thing for researchers is not to give up. The research and the publicising of the implications of that research, along with public critiques of evidence abuse or suppression, need to continue. All of this is difficult, but I think there is an ethical imperative to try to do it.

And I hope to be involved in doing just that.

 

This is a shortened version of a longer paper that can be accessed at:

https://harveygoldstein.co.uk/2019/12/15/the-role-of-evidencein-public-policy-making/

 
