The Science of Evaluation


Dr. Adam Fletcher agrees with Ray Pawson that the principles of ‘realist’ social science should guide how we do evaluation but argues that we must recognise the importance of ‘realist RCTs’ in doing this.

The Science of Evaluation is the latest book by Ray Pawson, Professor of Social Research Methodology at Leeds University. His previous work has been influential in how we think about evaluation in terms of not just identifying ‘what works’ but also asking: what works, for whom and in what circumstances?

This new book focuses on what evaluators need to do so that their collective efforts add up to a more useful body of knowledge about ‘what works, for whom and in what circumstances’. This is important and should guide how we do research to inform social policy and intervention, although developing this new ‘evaluation science’ is certainly not straightforward.

What sort of science is evaluation science?

Social processes are obviously not like the processes observed in the natural sciences. People can’t be studied like atoms or germs. There are no absolute social truths. Complexity is everywhere we turn. But does ‘social’ equal ‘too messy for science’? Not necessarily. ‘Realism’ (or ‘critical realism’ as it’s often termed) is a philosophy of social science that helps us to deal with this complexity, without ignoring it, to understand what works best and when.

In basic terms, the ‘realist’ view of the social world is that it can be studied scientifically but we need to recognise our role in constructing and interpreting this science. The natural and social sciences therefore share some common principles, such as theory building and empirical studies to identify causal explanations, even though we can’t directly observe social phenomena like gender or poverty or aspirations or wellbeing. We have to construct these.

This ‘realist’ philosophy of social science does not restrict our methods. Different research designs (such as ethnographic studies, cohort studies and trials) are all important, and they help us answer different questions. However, Pawson’s ‘realist manifesto’ rules out the use of trials. This would actually hinder rather than help evaluation science.

Realist randomised controlled trials

By comparing two or more similar groups of people over time, trials increase our confidence that any change (or lack of change) observed is due to the effectiveness (or lack of effectiveness) of what is being evaluated. If you randomise, these groups should be very similar. Social science trials are not naively trying to replicate clinical trials and evidence-based medicine but at the same time, doctors haven’t copyrighted trials. They’re a social science research design too.

Realist randomised controlled trials (RCTs) can contribute, and already have contributed, to our knowledge about what works for whom, and under what circumstances. For example: does using peer opinion-leaders to prevent smoking in schools work, and who benefits? Does reducing the price of fruit and vegetables in Dutch supermarkets increase consumption or not? These questions were answered using ‘realist RCTs’, and the answers have made valuable contributions to knowledge of ‘what works’, for whom and how effects occur.

Reviewing multiple trials also helps us explore whether effects vary across contexts. For example: does reducing the price of fruit and vegetables still work in Manhattan?

Straw man arguments against trials

Pawson’s latest book wheels out lots of the usual fallacies about trials. It’s hard to deal with them all here but I’ll address some of the main ones. The first fallacy is that RCTs work “on the notion that one can evaluate a programme by isolating and dismantling its effects from those of the rest of the world”. But doesn’t all research involve ‘isolating’ something which is smaller than the whole world? And no-one is trying to run away from complexity. On the contrary, by using a comparison group, trial designs take proper account of, rather than ‘dismantle’ and bracket out, the complexity of social causation.

Another fallacy is that RCTs can only evaluate ‘standardised’ interventions. That’s not true: they can be used to evaluate more or less flexible policies and programmes (and help us understand how effects vary). Pawson also argues against trials on the grounds that “policy problems are often pressing, experiments are sluggish”. But if speed of data collection determines research design, we end up relying on cross-sectional snapshots of programmes. That’s hardly a solid basis for scientific evaluation. Policy problems are best solved by robust evidence about causation that stands the test of time (e.g. trials showing the benefits of pre-school programmes are still very influential several decades later).

Perhaps what’s most frustrating about this new book is that the social problems drawn on as examples, such as the high proportion of British youth not in education, employment or training (NEET), highlight why we need more trials. No-one doubts that whether someone ends up NEET or not will be determined by a complex array of individual, family, social, cultural and economic factors. But this only makes understanding complex causal processes more important. Problems persist partly because we lack a decent body of knowledge about what helps to engage young people in education and training more effectively.

Let’s take the example of the Education Maintenance Allowance (EMA), controversially scrapped in 2011. This is mentioned several times during the book, including in the context of “infinite” complexity. Yes, true, things are complex, but we also need to be practical and try to understand whether or not a targeted cash transfer policy to teenagers helps retain them in education and training, if that is the aim. Like many targeted interventions, it was pulled when cuts were made. It would have been much harder for this government to abolish the EMA if the last government had evaluated it to establish a positive impact in poor communities.

Many such policy trials have been carried out by DECIPHer. For example, the national exercise referral scheme in Wales was successfully evaluated using a trial design. The Treasury’s Magenta Book now advocates this approach but policy trials remain rare. As Helen Roberts has discussed in What Works in Reducing Inequalities in Child Health, trials allow you to roll out new ideas initially ‘only in research’ until you’re more confident of positive effects. And for those who argue it’s not ethical to exclude people (in the comparison group) – well, no one gets EMA now, so no one benefits. What’s harder to justify ethically is spending vast amounts of public money on ‘common sense’ solutions without any decent evaluation.

Reasons for optimism?

Pawson concludes The Science of Evaluation by suggesting “there are reasons to be optimistic about the scientific status of evaluation”. Yes, there are. The government’s new ‘What Works’ evidence centres are one such positive development to support more rigorous policy evaluation. The fact that the Department for Education commissioned Ben Goldacre to write a report on ‘Building Evidence into Education’ is further grounds for such optimism.

However, while Ben Goldacre, Mark Henderson, Geoff Mulgan and many others work hard to make the case for more experimental evaluation to help government, social science academics are often the biggest enemy of evaluation science. An RCT design is completely compatible with ‘realist’ social science and helps us test whether policies and programmes deliver the benefits they are intended to. Without trials, we might end up doing more harm than good.

The best social science schools in Britain, such as Cardiff and the LSE, are training students to think critically and ask difficult questions about the complex causes of things and how to address these in practical, scientific ways. However, what’s worrying is the anti-science ‘myths and legends’ approach to trials still perpetuated elsewhere. Of course, we shouldn’t only focus on trials. Research questions are what should drive ‘realist’ evaluation, but we need trials to answer some important questions, and they are still under-used at present.

Dr. Adam Fletcher (@DrAdamFletcher) is Senior Lecturer in Social Science and Health at Cardiff University, and a member of the management team of DECIPHer.
