Imagine a school where every child gets instant, personalized writing help for a fraction of the cost of hiring a human teacher.
Computer essay graders are on the rise in Ohio and other states. Advocates say they save money and grade better than some humans. Sure, robots may be cheaper and more efficient. But better graders?
“Of the 12 errors noted in one essay, 11 were incorrect. There were a few places where I intentionally put in some comma errors and it didn’t notice them. In other words, it doesn’t work very well,” he says.
Perelman says any student who can read can be taught to score very high on a machine-graded test.
—Les Perelman, Director of Writing Across the Curriculum at MIT
That’s because software developers build the computer programs by feeding in thousands of student essays that have already been graded by humans. The programs discern which elements of essays human graders seem to like, then look for the same things. So if human graders give essays with complex sentences high marks, the programs will tend to do so too. If human graders reward big words, the programs then will also, say, manifest a tantamount predilection for meretricious vocabulary.
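The training process described above can be sketched in a few lines of code. This is a toy illustration only, not any vendor's actual system: it reduces each essay to two surface features (word count and average word length) and scores a new essay by copying the human score of the most feature-similar training essay. The essays and scores below are invented for the example.

```python
# Toy sketch of training an essay scorer on human-graded examples.
# Assumption: real systems use far richer features and models; this
# 1-nearest-neighbour version only shows the overall shape of the idea.

def features(essay):
    """Surface features a grader might reward: length and word size."""
    words = essay.split()
    return [len(words), sum(len(w) for w in words) / len(words)]

# Tiny "training set": (essay, human score on a 1-6 scale).
# Both essays and scores are invented for this illustration.
training = [
    ("Short plain answer.", 2),
    ("A longer answer with several reasonably developed sentences and ideas.", 4),
    ("An extensively elaborated response employing varied, sophisticated "
     "vocabulary across multiple well-structured sentences and arguments.", 6),
]

def predict(essay):
    """Score a new essay by finding the human-graded essay whose
    features are closest, and reusing that human's score."""
    f = features(essay)
    def dist(item):
        g = features(item[0])
        return sum((a - b) ** 2 for a, b in zip(f, g))
    return min(training, key=dist)[1]

print(predict("One tiny answer."))  # prints 2: lands near the short, low-scoring example
```

Because the model only imitates what human graders rewarded in the training set, anything those graders systematically favored, such as sheer length or fancy vocabulary, becomes something the machine favors too, which is exactly the gaming opportunity critics point to.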
Perelman and other critics say that makes the computer systems too easy to game. Of course, if you know the elements of an A essay and can combine them, you're probably already a pretty good writer.
And the idea that it’s easier to prepare to ace a computer-graded essay exam than a human-graded one isn’t necessarily true, says Elijah Mayfield, a computer science Ph.D. student at Carnegie Mellon’s Language Technologies Institute who helped develop an open-source essay grading program.
“I think that’s something that could be said of a lot of standardized curricula,” he says.
Identifying the Next Great American Novelist
Mark Shermis heads up the University of Akron College of Education. Earlier this year, he co-authored a study of nine different essay-grading computer programs. Shermis says that on shorter writing assignments the computer programs matched grades from real live humans up to 85 percent of the time. But on longer, more complicated responses, the technology didn’t do quite as well.
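A figure like that 85 percent comes from comparing machine scores against human scores essay by essay. Here is a rough sketch of the simplest version of that comparison, an exact-agreement rate; the scores below are invented for the example, not data from the Shermis study, and real studies typically use more forgiving measures, such as counting adjacent scores as agreement.

```python
# Invented scores for ten essays, human vs. machine, on a 1-6 scale.
human   = [3, 4, 2, 5, 4, 3, 1, 4, 5, 2]
machine = [3, 4, 2, 4, 4, 3, 1, 4, 5, 3]

# Exact-agreement rate: fraction of essays with identical scores.
matches = sum(h == m for h, m in zip(human, machine))
rate = matches / len(human)
print(f"{rate:.0%}")  # prints 80%
```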
Shermis says the technology will not identify the next great American novelist. “But if what you’re trying to do is communicate thoughts and ideas in a very straightforward manner, then the technology is actually a wonderful tool,” he says.
But it’s not the perfect tool for every situation, he says.
Shermis ran the Gettysburg Address through an earlier computer grading program, one usually used to evaluate the writing abilities of college freshmen.
Sad to say, Abe did not ace that test.
“On a scale of one to six, one of the greatest presidents of the United States was only getting twos and threes,” Shermis says. “We were actually very shocked. So shocked that we ran over to the history department, and one of the history profs said, ‘Look, it was an OK speech, but it was really more famous for where it was given and why it was given than for the words themselves. Oh, and by the way, no one says four score and seven years anymore.’”
Still, the promise of scoring thousands of student essays in a second or two without hiring human graders has caught the attention of school officials trying to cut expenses. It’s also caught the attention of the people designing the next generation of standardized tests that will roll out in most states next year.
By 2014, Ohio and most other states will move all their state tests online as part of the transition to a new shared curriculum called the Common Core. The Common Core requires students to do more information analysis than regurgitation. The groups developing the new Common Core tests want them to assess students’ writing and reasoning abilities, not just their ability to color in the circles on a multiple-choice answer sheet.
Some advocacy and research groups say partially replacing human graders with computer grading programs could help make those tests cheaper, and get results back to students and teachers more quickly.
Mayfield, the developer of the open-source essay-grading program, says that’s understandable.
“The idea of standardized testing in general hits a nerve: the idea that an essay is given a score from 1 to 4 and that decides how good it is,” he says. “The fact that a machine learning algorithm can do it is great, but it really shouldn’t be the end of the discussion.”
He says machine essay graders might be best used now as a “first pass.”
“It’s going to allow more time for the people that actively can intervene to help where it’s needed without using up all of their time,” he says.
Those people are often called teachers.
Meanwhile, Back in the Classroom
Jeff Pence is already sold on the idea. Pence teaches writing to seventh graders in a Georgia middle school.
In Pence’s class, he starts by giving students an essay assignment. The students write the first draft of their essays by hand and then, in class, type them into a program sold by the education company Pearson. After they hit submit, the computer tells them how they can improve their writing.
Then students rewrite their essays and resubmit. They can rewrite and resubmit as many times as Pence allows.
With computer graders, the students get feedback on every draft instantly. Pence says there’s no way he and his red pen could do that.
Quicker responses lead to more writing, and that’s a good thing for teaching students how to write, Pence says, because “the quantity drives the quality up. It’s kind of like that old bicycle thing, the best way to learn how to ride a bicycle is to ride a bicycle and the best way to get better at writing is to write and receive consistent timely feedback.”
—Jeff Pence, Teacher, Cherokee County Schools
In fact, students often focus on improving and resubmitting their essays, says Mariane Doyle, a career-tech program administrator in a California school district that uses a computer essay-grading program.
“They get into the idea of competing with the computer to get that next highest score,” she says. “It has that videogame appeal.”
Pence, the Georgia teacher, says it would be great to have a couple dozen real, live human teachers reading every student draft. It would also be nice if his district found the money to hire those extra teachers. Until then, he’ll keep his computers.
Pence says he knows the computer programs can’t inspire students to, say, expound upon soul-baring revelations found in The Catcher in the Rye. That, he says, is his job as a teacher.
But Les Perelman, the MIT instructor, says the real problem isn’t replacing human teachers with computers. It’s that the human graders for most standardized writing tests take about two minutes per essay. That’s not much better than having a computer grade them. What you want, he says, is real, live human teachers taking their time with each student’s work.
Pence has heard that argument before. And he doesn’t necessarily disagree.
“My response is, ‘Okay, I hear you.’ The ideal situation would be to have human graders come in to take my essays every day and give personalized feedback to every student. As soon as those people can come in and do that for thirty students at a time in about five seconds I’m on board.”
Until then, he’ll stick with his computer graders.