A computer program will grade student essays on the writing portion of the standardized test set to replace the FCAT, according to bid documents released by the Florida Department of Education.
The essays will be scored by a human and a computer, but the computer score will matter only if it differs significantly from the human reviewer's score. If that happens, the documents indicate, the essay will be scored by a second human reviewer.
Florida writing tests are currently graded by two human scorers, and the state has never used computerized grading on the exam.
The Florida Department of Education announced Monday it chose the nonprofit American Institutes for Research to produce new tests tied to Florida's Common Core-based math and language arts standards. Spokesmen for the agency and AIR said they had yet to sign a contract and were still working out the details, and declined to comment on the specifics of the new test.
“It’s speculative at this point to think about what is on the assessments,” said Joe Follick, communications director for the Florida Department of Education.
But the bid documents show using computers to grade the state writing test will save $30.5 million over the course of the six-year, $220 million contract with AIR. The change was part of a list of cuts that trimmed more than $100 million from AIR's initial proposal.
The documents also indicate Florida will license its test items from Utah in 2015, the first year the new Florida test will be given. AIR will create Florida-specific questions by the time the test is administered in 2016, saving $20.4 million in licensing fees.
Florida would also save another $14.5 million by limiting the number of pencil and paper tests in favor of online exams. The documents call for just 2 percent of tests to be delivered by pencil and paper the first two years, and 1 percent in future years.
That would put more pressure on school districts to ensure they have the bandwidth and computers necessary to administer the new test.
And Florida will eliminate all paper reporting of test results, saving $14 million.
The use of computer-graded essays may become a necessity, said University of Akron researcher Mark Shermis, because Common Core-tied exams will expand the number of students taking writing exams each year.
Currently, Florida students in grades four, eight and ten take the FCAT writing exam. Under Common Core, students take a writing exam every year.
Florida and 44 other states have fully adopted the Common Core. The standards outline what students should know at the end of each grade.
“Even if you had the money,” Shermis said, “you wouldn’t have the people to do the vast amount of grading required under the Common Core State Standards.”
Shermis found computer programs — including AIR's AutoScore — performed at least as well as human graders in two of the three trials that have been conducted. His research concluded computers were reliable enough to be used as a second reviewer for high-stakes tests.
But while the technology is improving, Shermis said districts need to study whether computer-graded essays put any class of students at a disadvantage.
Other researchers are less bullish on the technology.
“Of the 12 errors noted in one essay, 11 were incorrect,” Les Perelman of the Massachusetts Institute of Technology told our colleagues at StateImpact Ohio in 2012. “There were a few places where I intentionally put in some comma errors and it didn’t notice them. In other words, it doesn’t work very well.”
Many states and the two multi-state consortia developing Common Core-tied tests say they are watching computerized essay grading.
Utah has used computer essay grading since 2010, said Utah Department of Education spokesman Mark Peterson. The state trusts the technology enough that computers provide the primary scoring for the state’s writing exams. Peterson said state reviews have found fewer biases in computer grading than human grading.
Utah uses Measurement Incorporated technology to grade essays and will switch to AIR when the current contract runs out, Peterson said.
Smarter Balanced spokesman Jackie King said the test would use only human grading on the writing portion, but that the technology is promising. Officials with the Partnership for Assessment of Readiness for College and Careers, or PARCC, said they have not yet made a decision about the use of computerized grading.