Eye on Education

Computers Can Score Student Essays As Well As Humans, Study Finds

Stan Honda / AFP/Getty Images

Chess enthusiasts watch World Chess champion Garry Kasparov on a television monitor as he holds his head in his hands at the start of the sixth and final match in May 1997 against IBM's Deep Blue computer in New York.

Sure, Deep Blue beat Kasparov, but can a computer score an essay as well as a human?

A new study shaped by the two groups that are leading the development of new online standardized tests suggests the answer is yes:

The results demonstrated that overall, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items with equal performance for both source-based and traditional writing genre.

That matters because in 2014, most states will move from their current, mostly paper-and-pencil standardized tests to new online tests. Those tests will largely replace states’ current K-12 standardized tests used for state and federal accountability purposes. The idea is that switching from human to computer graders could make administering these tests cheaper for states nationwide.

Ohio state schools chief Stan Heffner has said Ohio could save as much as 40 percent on state testing costs each year by using software instead of humans to score tests.

The study was authored by University of Akron researcher Mark Shermis and Ben Hammer, from a company that provides a platform for statistical and analytics competitions. The researchers fed student essays from six states into nine different essay scoring software programs and compared the programs’ scores with those produced by human graders.

Eight of the nine programs were commercial; the other program was a free, open-source software package developed at Carnegie Mellon University. Together they represent nearly all of the available automated essay-scoring options, according to the study.

The study is scheduled to be presented Monday at the annual conference of the National Council on Measurement in Education.

But the study notes that just because the computer programs agree with the humans doesn’t mean they’re right:

Agreement with human ratings is not necessarily the best or only measure of students’ writing proficiency (or the evidence of proficiency in an essay)… The limitation of human scoring as a yardstick for automated scoring is underscored by the human ratings used for some of the tasks in this study, which displayed strange statistical properties and in some cases were in conflict with documented adjudication procedures.

And one of the study’s authors notes that computer software hasn’t yet caught up with humans when it comes to identifying creativity:

But while fostering original, nuanced expression is a good goal for a creative writing instructor, many instructors might settle for an easier way to make sure their students know how to write direct, effective sentences and paragraphs. “If you go to a business school or an engineering school, they’re not looking for creative writers,” Shermis says. “They’re looking for people who can communicate ideas. And that’s what the technology is best at” evaluating.

Contrasting State-of-the-Art Automated Scoring of Essays: Analysis


  • Josh

    I find this horrifying on so many levels. It is something out of a dystopian novel. As an English teacher, I imagine that a computer can assess the sophistication of diction and syntax in a piece of student writing, but can it really assess the depth of a student’s thinking? How can a computer assess whether a piece of evidence really supports the claim that it purports to? Can it assess whether the writer uses precise diction and has control of tone? Can it assess whether the writer sufficiently connects the topic to his or her prior knowledge?
    On a more profound level, this is frightening in what it says about technology and its influence on our thoughts. Isn’t the purpose of writing to communicate from one human being to another? What are we really doing when we start asking students to write for a computer? I hope that someone is asking these questions before jumping on board with computer-scored essays.

  • bdleaf

    The problem is that it can be gamed. Computers can’t assess complexity of ideas. Use the right content-specific keywords and seemingly grammatical (but semantic nonsense) structures, and you’ll probably get a good score. Assessment is ultimately a subjective practice. Back in the day, some people thought a test could determine your future occupation. Who’s measuring?

    While a set of human graders may be inconsistent, what they communicate to students through feedback and grades will probably be much more valuable. Fair treatment is not equal treatment–it’s tending to a student’s needs, and every student is different.
