Friday, 8 September 2017

Leading evidence-informed teaching: the rhetoric and the reality

A recent report sponsored by the Department for Education suggests that there is a significant difference between the rhetoric and the reality of evidence-informed teaching within schools: a number of schools appear to adopt the rhetoric of evidence-informed teaching without embedding research and evidence into their day-to-day practice (Coldwell et al., 2017). Of the twenty-three schools involved in the report, only six could be described as having a whole-school approach to research and evidence. In another seven, the head and senior leadership were proactive in building a culture that supports the use of research and evidence, and the remaining ten had an unengaged research evidence culture. This finding is particularly surprising given the attention paid to creating a sample of schools balanced by how engaged they are with research evidence.

The difference between unengaged and highly engaged research evidence cultures is illustrated in the following table from Coldwell et al. (2017).


So before I begin a series of posts examining what can be done to close the rhetoric-reality gap, it would be worth undertaking a brief self-audit of which column best fits your school’s use of research evidence. If you locate your school within the school leadership evidence culture or the whole-school evidence culture, try to think of three pieces of evidence which you can use to support your judgement.

Next week we will look at what school leaders can actively do to make sure a rhetoric-reality gap does not emerge.


COLDWELL, M., GREANY, T., HIGGINS, S., BROWN, C., MAXWELL, B., B, S., STOLL, L., WILLIS, B. & BURNS, H. 2017. Evidence-informed teaching: an evaluation of progress in England. Research report. London: Department for Education.

Friday, 1 September 2017

Judging the trustworthiness of research findings

In last week's post, I drew attention to differences in expert opinion over the usefulness of statistical significance testing, particularly as regards randomised controlled trials (RCTs). This week we will look at what else can go wrong with RCTs and at the questions you need to ask when seeking to judge the trustworthiness of the associated research findings. But first, let's quickly describe what we mean by an RCT.

What is an RCT?

Connolly et al. (2017) describe an RCT as … ‘a trial of particular educational programme or intervention to assess whether it is effective; it is a controlled trial because it compares the progress made by those children taking the programme or intervention with a comparison or control group of children who do not and who continue as normal; and it is randomised because the children have been randomly allocated to the groups being compared’ (p4).

What can go wrong with RCTs?

Unfortunately, lots can go wrong with RCTs. Ginsburg and Smith (2016) reviewed 27 RCTs that met the minimum standards of the US-based What Works Clearinghouse and found that 26 of the 27 RCTs had serious threats to their usefulness. These threats are listed below:
  • Developer associated. In 12 of the 27 RCT studies (44 percent), the authors had an association with the curriculum’s developer. 
  • Curriculum intervention not well-implemented. In 23 of 27 studies (85 percent), implementation fidelity was threatened because the RCT occurred in the first year of curriculum implementation. The NRC study warns that it may take up to three years to implement a substantially different curricular change.
  • Unknown comparison curricula. In 15 of 27 studies (56 percent), the comparison curricula are either never identified or outcomes are reported for a combined two or more comparison curricula. Without understanding the comparison’s characteristics, we cannot interpret the intervention’s effectiveness. 
  • Instructional time greater for treatment than for control group. In eight of nine studies for which the total time of the intervention was available, the treatment time differed substantially from that for the comparison group. In these studies, we cannot separate the effects of the intervention curriculum from the effects of the differences in the time spent by the treatment and control groups. 
  • Limited grade coverage. In 19 of 20 studies, a curriculum covering two or more grades does not have a longitudinal cohort and cannot measure cumulative effects across grades. 
  • Assessment favors content of the treatment. In 5 of 27 studies (19 percent), the assessment was designed by the curricula developer and likely is aligned in favor of the treatment.
  • Outdated curricula. In 19 of 27 studies (70 percent), the RCTs were carried out on outdated curricula. (Ginsburg & Smith, 2016, p. ii)
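
As a quick sanity check on the tallies above, the quoted percentages can be recomputed from the counts. This is only a minimal Python sketch: the dictionary labels are illustrative shorthand rather than terms from the report, and only the threats reported out of all 27 studies are included.

```python
# Counts quoted from Ginsburg & Smith (2016); the short labels are
# illustrative shorthand, not wording taken from the report.
TOTAL_STUDIES = 27
threat_counts = {
    "developer associated": 12,
    "first-year implementation": 23,
    "unknown comparison curricula": 15,
    "assessment favours treatment": 5,
    "outdated curricula": 19,
}


def as_percent(count, total):
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


# Recompute each percentage from its raw count.
percentages = {
    label: as_percent(count, TOTAL_STUDIES)
    for label, count in threat_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {threat_counts[label]} of {TOTAL_STUDIES} ({pct}%)")
```

Recomputing figures like this is a cheap habit when reading any report that quotes both counts and percentages.
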
So what are you to do?

Gorard, See and Siddiqui (2017), in their recently published book The Trials of Evidence-Based Education, suggest the following:

First, check whether there is a clear presentation of research findings - are they presented simply and clearly, with all the relevant data provided?

Second, check whether the research is using effect sizes as the way of presenting the scale of the findings. If significance testing is being used, and p values are being quoted, you may wish to pause for a moment. Though remember that effect sizes have their own problems.
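
To make the idea of an effect size concrete, here is a minimal sketch of one common version, the standardised mean difference (often called Cohen's d). Gorard et al. are not quoted here as recommending this particular formula, so treat the choice as an assumption; the scores are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardised mean difference using the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


# Hypothetical test scores, invented for this example only.
intervention_scores = [54, 61, 58, 65, 70, 62]
control_scores = [50, 57, 55, 60, 66, 58]

d = cohens_d(intervention_scores, control_scores)
print(f"Effect size (Cohen's d): {d:.2f}")
```

The result expresses the gap between the two groups in units of their spread, which is why it says more about the scale of a finding than a p value does.
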

Third, check where the research design sits on the research design hierarchy for causal questions. At the top of the hierarchy are studies where participants are randomly allocated between groups; below that are studies where participants are matched between groups; below that, studies using naturally occurring groups; below that, studies of a single group using before-and-after data; and finally, at the bottom of the hierarchy, case studies.

Fourth, check the scale of the study - for example, are at least 100 pupils involved in the study?

Fifth, look out for missing information - how many subjects/participants dropped out of the study? As a rule of thumb, the higher the percentage level of completion of the research, the more trustworthy the findings. As Gorard et al. note, a study with 200 participants and a 100% completion rate is likely to be more trustworthy than a study with 300 participants and a 67% completion rate.
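
The arithmetic behind that comparison is worth spelling out. The sketch below simply works out how many participants finished each study; the function name and the study labels are my own, chosen for illustration.

```python
def completers(recruited, completion_rate):
    """Number of participants who finished, for a completion rate between 0 and 1."""
    return round(recruited * completion_rate)


# The two studies from the comparison above: (participants recruited, completion rate).
studies = {
    "Study A": (200, 1.00),
    "Study B": (300, 0.67),
}

for name, (recruited, rate) in studies.items():
    finished = completers(recruited, rate)
    missing = recruited - finished
    print(f"{name}: {finished} completed, {missing} missing")
```

Both studies end up with roughly 200 completers, so the larger study gains nothing in usable data. What it adds is about a hundred participants whose outcomes are unknown, and if those who dropped out differ from those who stayed, the findings may be biased.
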

Sixth, check the data quality. Standardised tests provide higher quality data than, say, questionnaire data, with impressionistic data providing the weakest evidence for causal questions. Make sure the outcomes being studied are specified in advance. Are there likely to be any errors in the data caused by inaccuracy or missing data?

And finally

It's quite easy to be intimidated by quantitative research studies, but keep it simple. Are effect sizes being used? Are subjects randomly allocated between the control and the intervention group? Is the data largely complete? Are standardised measures of assessment used? And are the evaluators clearly separate from the implementers? If the answer to all these questions is yes, then you can have a reasonable expectation that the research findings are trustworthy.
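
For anyone who likes to keep such questions as a reusable checklist, here is a rough sketch of how they might be recorded for a single study. The class and field names are hypothetical, invented for this example, and a list of yes/no answers is a prompt for judgement rather than a scoring system.

```python
from dataclasses import dataclass, fields


@dataclass
class TrustChecklist:
    """Yes/no answers to the summary questions for one study (names are illustrative)."""
    uses_effect_sizes: bool
    random_allocation: bool
    little_missing_data: bool
    standardised_measures: bool
    independent_evaluators: bool


def unmet_checks(checklist):
    """Return the names of the questions answered 'no'."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


# A hypothetical study that passes every check except the one on missing data.
example = TrustChecklist(
    uses_effect_sizes=True,
    random_allocation=True,
    little_missing_data=False,
    standardised_measures=True,
    independent_evaluators=True,
)

print(unmet_checks(example))
```

An empty list means the answer to every question was yes; anything else tells you where to look more closely before relying on the findings.
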

Further reading

Connolly, P., Biggart, A., Miller, S., O'Hare, L., & Thurston, A. (2017). Using Randomised Controlled Trials in Education. London: SAGE.

Ginsburg, A., & Smith, M. S. (2016). Do Randomized Controlled Trials Meet the “Gold Standard”? American Enterprise Institute. Retrieved 18 March 2016.

Gorard, S., See, B., & Siddiqui, N. (2017). The trials of evidence-based education. London: Routledge.

Saturday, 26 August 2017

When experts disagree?

As an evidence-based school leader, one of the main challenges that you will face is knowing when and whether you can trust expert advice. As the beginning of the academic year comes closer, you need to prepare yourself for the tidal wave of expert-informed INSET/CPD which is about to swamp schools and colleges, so knowing when to trust an expert becomes particularly important. However, not all experts agree, so what are you to do? In the rest of this post we will look at an example of where experts disagree, and at the strategies you can adopt when faced with such a situation.

Dylan Wiliam, Stephen Gorard and Significance Testing – Where experts disagree

Dylan Wiliam, in his 2016 book Leadership for Teacher Learning: Creating a culture where all teachers improve so that all students succeed (Wiliam, 2016), writes extensively about how teachers and school leaders can learn from research. In a section on systematic reviews of research – which includes a review of the challenges associated with the use of randomised controlled trials in education – Wiliam states: ‘Now there is no doubt that when RCTs produce statistically significant results, they produce strong evidence for a causal relationship’ (p75).

In this single sentence there is so much to unpack and understand:
  • What are RCTs?
  • What do we mean by statistical significance?
  • What is strong evidence?
  • What do we mean by causal relationships? 

Now in this short blog post, it’s not possible to explore all the issues associated with each of these terms. Instead, I’m going to concentrate on just one issue – statistical significance – where there would appear to be disagreement amongst the experts. And to help do this, I’m going to draw upon the work of Greenland et al. (2016) and Gorard et al. (2017).

The very first problem that we face when seeking to understand the terms significance testing and p values is that, as Greenland et al. (2016) state: ‘... there are no interpretations of these concepts, which are at once simple, intuitive, correct, and foolproof’ (p337).

This causes a real challenge for both the novice and the expert researcher when trying to understand and apply the concept of statistical significance, for, as Greenland et al. go on to state, it is often misinterpreted to mean that ‘... statistical significance indicates a scientifically or substantively important relationship had been detected’ (p341). Furthermore, they state that statistical significance only suggests that the data is unusual; the result could still be of no real interest.
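
A small worked example may help show how a result can be statistically significant and still of little practical interest. The sketch below is my own illustration, not taken from Greenland et al.: it assumes a simple large-sample z-test with a known standard deviation, and all the numbers are invented.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(mean_difference, sd, n_per_group):
    """Two-sided p value for a difference in means (large-sample z-test, known SD)."""
    # Standard error of the difference between two equally sized groups.
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Invented scenario: a tiny difference of 0.02 standard deviations,
# measured in a very large trial with 100,000 pupils in each group.
effect_size = 0.02
p_small_effect = two_sided_p_value(
    mean_difference=effect_size, sd=1.0, n_per_group=100_000
)

print(f"effect size = {effect_size}, p = {p_small_effect:.6f}")
```

With a sample this large, the p value falls far below the conventional 0.05 threshold even though a difference of 0.02 standard deviations would rarely matter in a classroom. The p value reflects how unusual the data would be if there were no difference at all; it says nothing about whether the difference is big enough to care about.
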

The challenge of interpreting the statistical significance of the outcomes of randomised controlled trials is highlighted further by Gorard et al. (2017), who state: ‘Statistical significance just does not work – even when used as their advocates intended, with fully randomised cases. They cannot be used to decide whether a finding is worthy of further investigation or whether it should be acted upon in practice’ (p28).

What does this disagreement mean for you as an evidence-based school leader?

First, the debate over the use and value of statistical significance testing is a live and controversial issue, and is something you need to be aware of.

Second, even if you are not numerically minded, it’s probably worth spending a bit of time trying to understand the issues associated with statistical significance. So have a look at some general articles on the topic.

Third, whatever you may have learnt about significance testing as an undergraduate or postgraduate may well be wrong.

Fourth, don’t assume that, just because there are problems with statistical significance, randomised controlled trials have little or no value – what matters is whether the research design is appropriate for the research question (Gorard et al., 2017).

Fifth, unequivocal trust in any one expert, or group of experts, is not an option.

And finally,

If you have any aspiration at all of being an evidence-based school leader, it requires a commitment of the time and effort necessary to continually develop your research literacy – this is not an event but a career-long process.

Next week

We will look at a range of questions that you can ask which will help you judge the trustworthiness of quantitative educational research.


GORARD, S., SEE, B. & SIDDIQUI, N. 2017. The trials of evidence-based education. London: Routledge.
GREENLAND, S., SENN, S., ROTHMAN, K., CARLIN, J., POOLE, C., GOODMAN, S. & ALTMAN, D. 2016. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31, 337-350.
WILIAM, D. 2016. Leadership for teacher learning. West Palm Beach: Learning Sciences International.