by Hannah Gurr
It was the first week of June, and I’d just finished what I considered to be the least enjoyable part of my job: marking a pile of end-of-course writing under exam conditions.
While I really get a buzz from classroom techniques of formative assessment (also known as ‘assessment for learning’), I find these hours of summative assessment (‘assessment of learning’) quite gruelling, although of course I accept that they’re essential. End-of-course assessment provides the purpose for most of what I do in and out of the classroom, and at CELFS we take pains to ensure that intended learning outcomes, lesson activities, success criteria and assessment are constructively aligned.
Perhaps it’s the lack of human contact: the exam scripts are anonymised precisely to prevent one’s feelings about a student from influencing decisions about whether to give a ‘good’ or ‘satisfactory’ grade on a particular criterion. My ‘teacherly’ instincts are frustrated, as I can’t use a student’s errors as a springboard for further development. Then there is the worry over standardisation – not only is there the concern that I am not in line with my colleagues; with such a lot of marking, I wonder whether I’m even applying my own judgements consistently!
Having just come to the end of the ‘Language Testing and Assessment’ unit of my MSc TESOL course, I have a more realistic view of what it means to assess students’ academic speaking and writing skills on our courses, and I can see what can be done to compensate for the fact that even expert human judgement is fallible.*
Firstly, I realise that it would be bizarre if the diverse group of teachers who make up a particular year’s pre-sessional tutors all gave the same piece of writing the same grade. I used to worry that if my evaluation were far from the ‘official’ grade, my job could be at risk. Now I understand that this variation is perfectly normal. Secondly, I realise that the purpose of the standardisation meeting is not to defend the grade I gave, but to re-calibrate my own scale, so that there is less variation between tutors. That’s why I feel that standardisation should take place before teaching starts and again before marking begins.
Although we are a diverse bunch, we are all expert assessors of writing. This means there will be broad overlap between the aspects we judge to be better and those we deem worse. However, I might be measuring a student against a Platonic ideal – a distinction-grade, native-proficiency performance – and find the work wanting, whereas my neighbour is thinking, ‘wow, if I had to write something in my second language, I’d be proud if it were as good as this!’ We need to leave our egos at the door and accept that, for the next few weeks, this is what we are going to label ‘good’, ‘very good’ or ‘satisfactory’.
It’s true that double marking creates more work, but in the past weeks I’ve seen again how important it is. It’s great when my colleagues and I agree, or are just one point apart, and it’s reassuring to re-examine the script when we find we are a whole band apart. I’ve been persuaded by my colleagues, but I have also stuck to my guns on certain judgements and convinced them in turn. Details I’ve missed are spotted by my colleagues, and vice versa. Excesses in both directions – soaring into the 80s band just because the handwriting is neat, or being overly hawkish because a student has included one of my bugbears – are tempered by marking with a peer.
Finally, after haggling over all those points, there is the spectre of input error. You may have spent five minutes debating whether the SAQ [short answer question] merited a 65 or a 62, only to misread or mistype so that the student ends up with their neighbour’s 55. Again, input error is greatly reduced if you work with a colleague.
I’d be very interested to hear your thoughts on marking final assessments, whether you share(d) my concerns and/or have any ideas about how to overcome the variation in judgement that is a normal part of being human.
* An example mentioned on QI and cited in Daniel Kahneman’s book Thinking, Fast and Slow (2011) is a study published in the Proceedings of the National Academy of Sciences, which found that judges were more likely to grant parole in cases heard immediately after a meal break.
Assessment standards should be set at the beginning of a course. These standards should be discussed, adjusted to the objectives of each course and agreed on by all staff. If they are followed consistently throughout the course, there won’t be any embarrassing differences between markers in the end-of-year examinations. As for the final assessment, which determines the students’ progression, when there is only a narrow difference between markers, it’s far better to use the average mark than to waste valuable time defending either score.
Very useful comments, thanks! Yes, we do usually split the difference when we are only one or two points out. But sometimes, borderline cases take longer to decide 🙂
Resonates. Nice interpretation, Hannah. Useful insights from the MA module 😃
It’s really frustrating that they don’t get the benefit of our language analysis after the pains we take to identify their bullet points. I know a lot of them aren’t interested in knowing that their cohesive devices may be mechanical in places – all they want to know is whether they have got onto their course – but we could use this information to really help them develop if it were timed better. I think we could benefit them more with a closely marked essay in the middle of the course and perhaps a more lightly assessed one at the end. Plus… oh dear, perhaps an opportunity to submit a draft somewhere, rather than an outline six days before the final submission. Anyway, that’s my two penn’rth.
I can sympathise with what you’ve written here, Hannah! Where I work (teaching EAP at a uni in Germany), we have a grading matrix, but no double-marking policies, and blind marking is also not implemented across the board. Since I’ve been the team leader, we’ve updated the grading matrix and we also sit down together each term to discuss sample essays that we’ve all marked against the matrix, to check that we are basically on the same page. This has definitely made me (and hopefully my team!) feel more confident in our own judgments of students’ work! And I think that we all try hard to show students how the end-of-course feedback is not only intended to justify their grade, but to give them ideas of their own strengths and weaknesses to take with them into their next course. I think trying to use the end-of-course feedback to still give feedback FOR learning is important. But yes, if it’s anonymised this is a bit harder!! Do your students get any feedback, or just a grade? What I have done previously, where the papers were anonymised, is send a general feedback email to the whole class – of course I don’t know if the students take it on board as well as individualised feedback, but it definitely gives me a stronger sense of useful purpose when I’m marking their end-of-course work!!