How do you write good exams?

My first order of business this week was writing a draft of the first exam for my upcoming Biometry course this semester here at UMN-St. Paul. When I developed an interest in science pedagogy several years ago, I realized that the field was a mile wide–if I wanted to, I could second-guess everything I thought I knew about the “proper art” of teaching. However, to borrow the mantra of a good colleague and friend of mine at UW-Madison, the essence of becoming a better teacher is in focusing on “manageable change.” Teaching is hard work! It’s hard to give up old ways, try something totally new, ditch materials you’ve spent years polishing, etc. Plus, it’s a little like throwing the baby out with the bathwater to do things that way. So, instead, I decided I would pick just one major aspect of teaching to really focus my creative energies on–I chose assessment.

If you think about it, after the delivery of content itself, assessment is probably the most important aspect of teaching. It is the process by which you measure the progress of the learner–what can they do, now that they’ve (hopefully) learned? After all, if a learner cannot demonstrate increased skill with the material during and/or after the learning process, has meaningful learning occurred? Should we even have bothered, if not? These questions emphasize the duality of assessment–it represents not only exercises meant to measure performance but also a means to facilitate practice of skills and knowledge along the learning process. It is both a measure of performance and a means of ensuring progress (this division is formalized by referring to assessments as either summative or formative when they are meant as “definitive” versus as “practice”). Actually, come to think of it, assessment is even sometimes used prior to the learning process to gauge the appropriate depth and focus of the learning process itself (such as when a pre-test is given and the curriculum designed to suit the skill levels of the incoming learners). So, I guess in this way, assessment is the alpha and the omega. Ideally, it comes before, during, and after the learning has occurred, and the learning is designed so as to benefit from, as well as to benefit, the assessment process.

So…it’s pretty important! I think that’s what I’m trying to say. So…how do we do it well, then? You’d think everything that needs to be said on this subject has already been said. For example, surely there are definitive resources out there on how to craft the “perfect” multiple-choice question, or prescriptions for the maximum length of the stem of a question, or high-profile studies on how the inclusion of double-negatives in questions affects the performance of non-native English learners. Well, I’m not saying there aren’t resources out there to help guide practitioners on proper exam design, but there certainly aren’t anywhere near as many accessible resources out there as I’d have expected! In fact, about a year ago, I tried to do a literature search to find papers looking at any aspect of “proper exam question design,” and I shockingly came up empty! I may have been using the wrong search terms (I mean, I hope I was, in a way!), but the fact that such studies and resources are even the least bit difficult to find suggests to me that this is an area that is woefully under-explored. How many college classes did you take that didn’t have exams? I can count the number on one hand, myself. I’ve never even seriously considered not giving exams in the classes I myself have taught (I’m just not brave enough, I guess). Exams are a mainstay of modern assessment, for better or for worse, and yet we seem to know so little about what makes an exam “good.” Or, at least, I don’t. Not definitively, anyway. However, I do at least have my principles and my experiences to offer, so that’s what I’m going to record here (in no particular order) for those who are interested. So, don’t expect me to offer much in the way of empirical evidence for anything I have to say here! I wish I had some for you. Still, I figure if we all at least share our experiences, we can eventually build an informal body of theory that might then motivate more formal explorations.
Plus, maybe something I have to say will resonate with you! So, here goes nothing.

These are things “I’m pretty sure I’m sure about” when it comes to assessment (and exams specifically)

  1. A course should be designed around assessment, not around content. This idea can be simplified by saying “backwards design for the win!” Backwards design is where you start designing a course around what you want your learners to be able to do/demonstrate at the end (you start at the end and “work backwards”). I’ve even heard it described this way: You start by writing your final exam, and then you decide what your midterm exams will look like (since they’re “practice” for the final), and then you decide what your homeworks will look like (because they’re “practice” for the midterms). Then, you decide how you will teach (what format will enable maximal preparation for and performance on the homeworks, midterms, and final?) before, lastly, deciding what you should teach (what is the minimum content coverage needed to prepare students for the assessments and to engage them in the learning process?). The thinking is–if you aren’t interested in what students can do when they’re done, or you aren’t preparing them for those exercises, what are you doing with all that class time? In practice, backwards design or not, it seems like assessment should be “a point,” even if it’s not the “whole point.” It can’t be an afterthought. It must be something you give considerable attention to during the course design phase, and it’s got to be in the back of your mind the whole way through a course.
  2. Assessments should be “mapped.” This is a corollary to point 1. By mapped, I mean there should be a firm and explicit connection between what students learn and how they learn it, what you ask them to do on assessments, how you assess their performance, and what your learning goals are (the term comes from exercises in which you “map” assessments, in flow chart form, to learning goals and lesson content). As an example, imagine that a learning goal of yours is that students will be able to articulate how some misconceptions of evolution are incorrect, and you have an essay question on your final that asks the students to engage with and counter-argue an incorrect perception of evolution. Now, imagine you never give students a question like this one on a homework or mid-term, you never have an in-depth conversation on this topic in class, and you only explain in class how common misconceptions are wrong but not why they exist in the first place (or bring student views into this conversation for engagement). Are students ready for the exercise you have in store for them? Should you be surprised if they fail to live up to your standards? This leads to my next point:
  3. Practice makes perfect. There needs to be regular practice of the corresponding format and content you plan to use on your ultimate assessment. Otherwise, it’s like you’re training your students for a swim race but handing them a bike to train with! For example, imagine that your lessons feature lots of quick-to-answer multiple-choice questions, and the homeworks are similar. Then, on an exam, many of the questions are short-answer instead. The “cognitive mode” of these two question styles is very different! MC questions reward being able to discriminate between less correct and more correct options, and the “field of thought” is limited to the options given–the best answer is here somewhere; they don’t have to generate it. Also, making educated guesses is easier. So, you should create spaces where students can try out the exact types of questions you want them to eventually excel at answering. Some may call this “teaching to the test,” to which I would respond “If you’re not teaching to the test, why is that the test you’re giving?” That doesn’t mean you have to cover the same exact content, but it’s good to cover similar content in the same way.
  4. Feedback is best when it’s immediate and personalized. I don’t make this point lightly. Yes, I am essentially saying you should grade homeworks, quizzes, and other assessments as soon as possible. I try to turn them around within 48 hours! Why? Simple–students need to be told they’re wrong (and why) while they still remember what they thought and did the first time. Getting feedback 2 weeks later will usually mean “the moment” has passed–I can barely remember anything I did in my life 2 weeks ago. How can I remember what I was thinking when I answered this specific question and why my thinking may have been off? That’s not even to consider that I may have (imperfectly) learned 2 more weeks of content since then that I may have to filter out to make sense of your feedback. If the goal is to correct their knowledge and/or process, the more directly into that process the feedback can be inserted, the better! This is one of the reasons I do not knock question styles that can be autograded–these provide instantaneous feedback, which can be a godsend for formative purposes. It’s also really valuable if the feedback can be personalized–“this is where your understanding needs improving.” This does not mean, however, that the feedback needs to be long or that you need to spend 15 minutes per student crafting lengthy responses. What I do is have a Word document open in which I dump one-word to one-sentence comments to each question and then copy-paste these comments over to the student’s work as I grade. After all, I frequently put the same thing on many students’ work. This way, I don’t have to type something new each time, but students are still getting feedback tailored to their specific performance and needs. I then supplement with a fully worked key that students can peruse if they want even more guidance. One other thing I’ll mention–it’s easy to write discouraging feedback but really hard to write encouraging feedback.
The simplest thing to do is, before you tell someone they’re wrong, to acknowledge that being wrong is ok and understandable. Confirm for them that the struggle is real, but you’re here to help! For example, consider starting feedback “I see exactly why you’d say this, but have you considered…”
  5. Assessments should be challenging but they shouldn’t be tricky. If one does not need to practice or try to succeed, then one has to question the value of the exercise, no? Students should need to work to excel; the average question on an exam should be a bit of a cognitive workout! This is especially true since the brain is a lazy organ by design. If it doesn’t have to work hard, it won’t–if it doesn’t need to exert itself to perform well on an assessment, it won’t put much energy into maintaining the requisite knowledge or skills, and the learning won’t be durable. The learner will simply forget the material promptly after the assessment, if they ever really remembered it in the first place! And durability is probably the goal, right? Besides, wouldn’t it be cool if students could learn from an assessment? This is hard to achieve if the assessment never forces them to break new cognitive ground. This all said, exams should not be tricky. What I mean is that the questions themselves (i.e. the words they are constructed out of) should not be the impediment to success. I should not have to parse a double negative or correctly interpret which meaning of a common word like “set” you mean to successfully answer a question unless this is a grammar class and the parsing/interpreting IS the point. Remember your learning goals–does the question test the student’s understanding of a key disciplinary concept, or is it really more of an English comprehension question? The latter most likely belongs on an English comprehension exam and few other places. The way I think about this issue is this: The intent of the question (“what I am supposed to do”) should be clear, but the right answer should not be unless I understand the underlying concept. A clueless student should still be able to accurately explain to me what a question is asking them to do–they just may not be able to deliver!
Obviously, including vocabulary in a question, such that understanding the vocab is necessary to parsing the question, has its place! But make sure that that vocab is central to your learning goals.
  6. Assessments should be fair for all learners. I sincerely wish I knew exactly what this meant in practice, but I probably don’t. Here’s where I feel like more research is especially needed! But here are some things I think it might mean:
    1. Questions are not overly long. If they are, they make the assessment more about reading comprehension and reading speed than about the content. There is probably an association between content mastery and strong reading skills, but rewarding that association seems contrary to the point to me.
    2. Questions should be simply worded. If a complex or multi-syllabic word is not the only option, maybe don’t use it. Avoid double negatives and idiomatic expressions (you know, things like “cut the mustard”). Keep in mind that not all learners will be native English speakers, and even native English speakers may not have equal access to idioms (English is dialectical, you know!). If you have to use a complex word, but knowing the word isn’t really the point, include a casual definition of it as an escape hatch. I suppose one counter-example here would be when the learning goal is being able to identify and exclude extraneous information, such as in story problems, but I think that’s the exception that proves the rule–no unnecessary complexity, only necessary complexity.
    3. Think about whether reducing a question to a format with a smaller “cognitive space” might work. True/False- and Fill-in-the-Blank-style questions can be challenging while also focusing a learner’s thinking to a smaller cognitive space than an essay question would. It might also be easier to give meaningful, personalized feedback quickly that way.
    4. Carefully consider how much time should be a factor during an assessment. I admit to having a particularly hard time with this issue. On the one hand, there is a time and place for asking learners to demonstrate an “easy, fast” skill at something. For example, during a job interview, you should be reasonably expected to give a competent answer to a question more or less immediately. However, if we are challenging students and asking them to extend their thinking in complex ways, perhaps we need to give them the appropriate time to do so successfully. After all, it’s rare that true brilliance emerges off the cuff. Plus, introverted students [a disadvantaged group in American classrooms that doesn’t get nearly enough attention] especially benefit from having extra time to deeply consider their answers, and if we expect deep answers, we should be giving all students the time they need to produce them. There is a trade-off here, though, which leads into my next point.
    5. Summative assessments should not provide “bailouts.” My rule (or maybe it’s a “limit”) is that students who are more prepared should perform better than students who are less prepared. If, by giving students 6 hours instead of 2, or a take-home test instead of an in-person one, we allow unprepared students to study the minimum and still perform as well as prepared students, then something has (probably) gone wrong [although I would argue an exam should be challenging enough that unprepared students can’t successfully do this!]. So, I think the key thing here is a commensurate balance between challenge and opportunity to meet that challenge, which is probably a balance that takes time to get right.
    6. Carefully consider the extent to which memorization is a learning goal of yours (and where exactly it applies). Going along with this, consider whether organization and effective resource usage should be learning goals. What I’m getting at is this: Consider allowing students to have access to resources during an assessment. For example, what statistician doesn’t have access to a reference text when they are trying to use a formula? What chemist doesn’t occasionally glance at a periodic table? Is memorizing things like this a part of our learning goals? Should they be, realistically, given that it’s the Google age? Or is it just as important that learners understand how to use the resources they have access to? Keep in mind that a less prepared student may still not be able to use a resource effectively during an exam just because they have it (they may not know what pages of the book to go to or which of several similar formulas to use), so this does not have to represent a bailout.
    7. Exam language (and examples!) should be inclusive. Mix up he, she, and they pronouns. Have examples that span racial, ethnic, and geographic bounds, where appropriate. Students will be able to engage more with an assessment that they “feel a part of.” It’s not pandering–it’s an easy way to tell people that their identities and interests are valid. On the flip side, avoid examples that are income- or privilege-gated. The one that comes to mind for me right now is golf–golf is an expensive sport to play! Framing a question that assumes a basic understanding of the workings of golf is probably disadvantageous to students from lower socio-economic classes. If you don’t need to do that, why do it? Beyond that, not everyone is engaged by golf, even if they may understand it! Which leads to my next point.
    8. Variety is good. Subject matter, question style, question complexity–whatever you can switch up, do it! You want to spread the wealth. If I think clams are boring, and half the exam is based on clam examples, I have to make my lazy brain work just a bit harder than a student who thinks clams are dope, right? That’s a source of bias our exams can probably do without. Plus, for some people, True/False questions are almost unbearably easy while, for others, they are fraught–“How false is false enough?” There are often many avenues to reach the same place–see how many you can incorporate into your assessments. Doing so increases the likelihood learners will learn something new from the assessment itself also! The last point I’ll make here is this: Make sure you spread the wealth with content on the exam as well. I try to make sure every reading, lecture, homework, and quiz is represented by at least one question. That way, if some lectures were more interesting or effective than others for some learners, there’s no chance that the exam will end up being a punishment for some and a boon for others just because I chose to emphasize one batch of content over another!
  7. Remember–assessments are not (usually) punitive. It seems to me like some teachers frame assessments as “gotchas!” Maybe I do that sometimes (I am definitely interested in catching students who have not made the effort to learn!). Certainly, students tend to view assessments as punishment, perhaps because we craft them in a way that reinforces that mentality. This, I’m sure, is a large factor in the test anxiety that seems to be rampant in schools. I’m lucky–my personal philosophy about exams has always been a positive one. “Oh good, a chance to demonstrate how much I know!” “Oh good, a meaty challenge to sink my brain into!” Wouldn’t it be nice if most students felt this way? Can’t you imagine their performance would be higher if they viewed an assessment in a positive light, as though it were an opportunity? And shouldn’t we want performance to be high [I’ll get back to this idea later]? Some of this has to do with how we frame the thing–keep a light atmosphere in the exam room beforehand. Put silly (but legit) examples on the exam. Be really accessible and helpful for questions during the exam (Make it feel like it’s you + them vs. the test, not you + the test vs. them!). Allow them to have snacks and “security blanket” items with them (like their own calculators). Try to be in the same room you have class in so that the room is familiar. Maybe don’t lecture them for 5 minutes about the steep consequences of cheating before they start. There are so many little things we do that make exams more adversarial-feeling, and I just don’t know that we do ourselves any favors by doing that.
  8. Assessments should be reasonably frequent. It takes me roughly 4 hours to write a single lecture. Those lectures are usually about 20 very full slides long, plus they are usually based on an additional 10-20 pages of textbook reading. I want you to pause and consider just how much content that single lecture represents. Now, I want you to imagine you’re a novice, such that it’s very difficult still for you to differentiate between critical and non-critical content, so filtering out unneeded content is challenging. Now, I want you to multiply that amount of content by 10-15 (a third to a half of a semester). Oy–that’s a lot. We need to ask ourselves–how much volume of content is appropriate for a student to have mastered by a given point, especially if they’re only getting irregular feedback? I’m reminded of a story I heard about a professor who ditched his exams in favor of weekly quizzes, to great success, because students were able to more regularly try to master less content at a time. At what point do we simply overload learners? Especially when you consider that even a well-written 50-question exam doesn’t come close to covering 100% of the content covered in a 10-lecture period? At that point, the assessment risks being a “roulette wheel,” one in which learners just have to get lucky that material they understood or especially liked happens to be the content that ends up on the exam.
  9. We have to acknowledge that assessments are fallible. Sometimes, questions just don’t “work.” Maybe the choice of a specific word enabled students to (fairly) read the question a different way. Maybe students collectively missed that idea because that was the lecture after the big game, or you stumbled in your explanation. Maybe the book contradicted you on that question! When 10% of the class gets a question right, it’s not (usually) because the students were dumb or lazy. Sometimes, you write a bad, broken, or ambiguous question, or acts of God undermine you. Have the humility to take those instances in stride! Actually, have the humility to go looking for those questions. I analyze every question on every exam I give to see whether it worked or not. If it didn’t, I don’t get mad–I do something about it. Throw out the question, make it extra credit, offer more partial credit, be lenient–being willing to change the rules a bit after the fact isn’t being weak, it’s being realistic. Another tactic I use sometimes–I throw out a question but tell students they will see the exact same question again later, and I will expect different results. I just try to ensure my exam is a little longer than it otherwise would be so that I have room to expel the most unruly couple of questions without it being a big deal to do so. One last thing: Sometimes, a 10% correct answer percentage isn’t a failure of the 90% but a triumph of the 10%–when you can make students feel special for nailing something particularly hard (by congratulating them and giving them the question as extra credit, e.g.), that can be something that sticks with them for a long time!
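Since the course in question is Biometry, it feels fitting to sketch what that per-question analysis can look like in code. This is a minimal illustration, assuming you can export a simple 0/1 right/wrong matrix (rows = students, columns = questions); the function names and flag thresholds here are my own inventions, not a standard:

```python
# A minimal sketch of a per-question "did it work?" check, assuming
# a 0/1 score matrix (rows = students, columns = questions).
# The flag thresholds are illustrative, not prescriptive.

def question_difficulty(scores):
    """Fraction of students answering each question correctly."""
    n_students = len(scores)
    n_questions = len(scores[0])
    return [sum(row[q] for row in scores) / n_students
            for q in range(n_questions)]

def flag_questions(scores, low=0.3, high=0.95):
    """Flag questions that almost nobody (or almost everybody) got right."""
    flags = {}
    for q, p in enumerate(question_difficulty(scores)):
        if p < low:
            flags[q] = f"only {p:.0%} correct -- broken question, or a triumph of the few?"
        elif p > high:
            flags[q] = f"{p:.0%} correct -- a free-for-all, or just well taught?"
    return flags
```

Flagged questions aren’t automatically bad–they’re just the ones worth a second look before deciding whether to rescore, offer extra credit, or congratulate the 10%.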
  10. Summative assessments need to be edited and play-tested. Exams are often a huge chunk of a student’s grade, and some classes we teach have long-reaching ramifications for a student’s future and career. Plus, we fully expect students to produce well-edited products that were started well in advance. And yet many of our exams are written at 2AM the day before the exam, with nary a thought to editing! For shame. A hastily written assessment is unlikely to be flawless. We make mistakes when we’re rushed (a good thing to remember with respect to point 6-4 above!). We need to write exams with enough time to spare (at least a few days, but a week or two is better!) that we can let the exam sit before coming back to edit it so that we can view it with “fresh eyes.” We also should take our exams ourselves–how long does it take us [a good rule of thumb–it should take you 20% less time than you expect your best-prepared student will need]? How difficult is it to write a key, or explain why one answer is better than another? Can you have someone else (preferably a former student!) examine the test to see where problem areas might lie? An exam is a complex craft, but one we often try to assemble as quickly as possible. Allotting some time for QA/QC seems appropriate. Also, writing assessments early reinforces the ideas of backwards design–you should know more or less what you want to ask on the exam before you design the corresponding lessons because then you already know what those lessons should cover and how! Oh, I also have a trick of sorts to share here: Try writing one exam question after each lesson based on the moment you felt worked the “best” from that lesson. If, at your most engaging moments, you are not succeeding at getting your students to your learning goals, that tells you a lot! Besides, then you will have much less work to do by the time exam-drafting day rolls around.
  11. You should have a working definition of what it means for an exam to be “good.” This is another place where a body of theory doesn’t seem to exist but really needs to. Consider this question very carefully: What does it mean for an exam (or an exam question) to be good? Good by what measure? For whom? Think about it this way: Is an average score of an 80% “good?” Can an average score of 40% ever be “good?” What about 100%? If we do not know what we are striving for, we are engaging in a random walk! How can you improve your exam craft if you have no standard by which to measure improvement? Without a theory in place, we need to each individually answer this question for ourselves, I suppose. For me, for example, a 100% average on an exam would be something I’d permit, but I’d probably consider it a failure on my part (but also a triumph on my students’ part!). One of my measures of “good” is that a good exam is able to detect differences in preparation and learning (provided they exist, of course!). If every student is getting basically the same score, then I have likely failed to craft an exam that can discern less-prepared students from more-prepared ones. I’ve written a “free-for-all” test, basically! All the extra preparation my better-prepared students did was effectively “punishment.” This would probably mean I’ve also missed a chance to challenge students more, since they were clearly up for the challenge. [Now, some of you may read this and think: “But if you were satisfied with the challenge of the exam and they all aced it, what’s so wrong about that?” To which I’d probably respond “nothing at all!” And then I’d pat myself on the back for being the best teacher of all time, if I can get a whole class to ace an exam I’d deem challenging!] So, this ability to discriminate between different levels of learning is something I assess when I review an exam post-hoc.
Whatever your standards are, spend the time after the fact seeing if your exam successfully met them (and, you know, have standards!).
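For those who want a number to attach to “discrimination,” one classic approach is the upper–lower discrimination index: compare how the top-scoring and bottom-scoring students did on each question. The sketch below assumes the same kind of 0/1 score matrix as before; the 27% group split is a common convention, but everything else is my own illustrative choice:

```python
# A rough sketch of the classic upper-lower discrimination index,
# assuming a 0/1 score matrix (rows = students, columns = questions).

def discrimination_index(scores, frac=0.27):
    """For each question: (fraction correct in the top group) minus
    (fraction correct in the bottom group), grouping students by
    total exam score. Values near 0 mean the question doesn't
    separate stronger from weaker performers; negative values
    (weaker students outperforming stronger ones) are a red flag."""
    ranked = sorted(scores, key=sum, reverse=True)
    k = max(1, int(len(ranked) * frac))  # size of each comparison group
    top, bottom = ranked[:k], ranked[-k:]
    n_questions = len(scores[0])
    return [
        sum(row[q] for row in top) / k - sum(row[q] for row in bottom) / k
        for q in range(n_questions)
    ]
```

A “free-for-all” exam shows up as a whole row of near-zero indices–everyone scored about the same on everything, prepared or not.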
  12. A good exam tells you (and, ideally, your learners) what misconceptions your learners hold. I told you not to use golf analogies earlier, but here comes a golf analogy. When you hit a golf ball and it goes straight, you know you hit the ball correctly. When you hit one and it hooks violently to the left and into the woods, you know you hit the ball incorrectly. We tend to craft exams that work like golf balls–they tell us when we’re right and wrong, but that’s about it. Wouldn’t it be nice (in a way!) if the golf ball could tell us “you hit me too far to the left” when we mis-hit the ball? Or there was a pro standing there all the time to tell us that we took our eyes off the ball at the last minute? It’s so much easier to correct and learn from our mistakes when we know exactly what those mistakes are and what we could work on to correct them. A really good exam doesn’t just identify that a student is wrong but how they are wrong. Have you ever considered the possibility that the wrong answers we put on our exams could be the most important parts of the exam? If not, then your exams have a hidden superpower waiting to be unlocked! Consider an MC question with 5 responses–one that is right and four that are simply nonsense. If a student gets this question wrong, we can conclude they do not understand the concept (they cannot separate truth from nonsense), but not what misconception they hold. Now, imagine that the four wrong choices each play into one of four popular misconceptions. By choosing any given wrong answer, a student is telegraphing what it is, exactly, that they don’t quite understand yet. Imagine how much easier it would be to help this student! And for this student to help themselves! It’s difficult sometimes, but I try to have 0 “throw-away” options on an exam. Every answer should tell me something actionable about what the student does or does not know.
And if there isn’t a misconception or plausible wrong avenue to go down, then it’s probably not a question I need to ask. At the class level, this gives me feedback on what aspects I need to stress and which I don’t next time (that misconception is prevalent but this one isn’t, e.g.), which is a nice bonus. But you might say “But what if I can’t think of 4 logical ways to get a concept wrong?” Then don’t! It has taken me a long time to get to this point, but every MC question needn’t have exactly 4 options. Blasphemy, I know! Give them as many choices (or as few) as they need to have a meaty challenge and don’t muddy the waters any further than that.
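If you do write distractors that map onto known misconceptions, tallying them after the exam is trivial. Here is a tiny sketch, assuming you have each student’s chosen option for a question and a hand-made mapping from distractor to misconception (the option letters and misconception labels below are invented placeholders):

```python
from collections import Counter

def misconception_report(responses, key, distractor_map):
    """Tally which misconception each wrong answer reflects.
    responses: the option each student chose (e.g. 'A'..'E') for one question.
    key: the correct option.
    distractor_map: wrong option -> misconception label."""
    wrong = [r for r in responses if r != key]
    return Counter(distractor_map.get(r, "unmapped") for r in wrong)

# Hypothetical example for an evolution question:
report = misconception_report(
    responses=["A", "B", "B", "C", "A"],
    key="A",
    distractor_map={
        "B": "confuses genetic drift with selection",
        "C": "thinks evolution is goal-directed",
    },
)
# report.most_common() ranks the misconceptions the class holds
```

The ranked tally is the class-level bonus mentioned above, quantified: it tells you which misconception deserves class time next and which you can leave alone.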
  13. Assessments should be relevant, realistic, and authentic. Perhaps I was just redundant–I’m not sure. What I mean is that an assessment should “feel like the real work a learner is training to do.” When an assessment feels like an abstract exercise, I can empathize with students who treat it like a punishment. I mean, life is rarely like taking a paper exam at the best of times (which is maybe why traditional exams should be less a part of our assessment repertoire!). Still, that doesn’t mean we can’t frame our exams in a more lifelike vein. For my Biology exams, I like to frame short-answer questions like “debate topics” that professional biologists might reasonably debate. For my Environmental Science exams, I like to provide snippets of news articles and have them engage with the scenarios in a “what would I recommend?” way, much like an environmental consultant might. For my Biometry exams, my mentality is going to be “many of my students will need to use statistics to write their theses, so let’s frame questions as though they are active research projects.” If we are not training learners to be able to do tasks they need to be able to do to achieve real-world success, then we are probably failing them! And our assessments need to be a big part of that training.

Oof, another long post! It’s good to get these things out of my head and onto “paper” though. What did I miss? What resonates with you? What resources do you have to share? Please express yourself in the comments!