Eye on Education

A Video Guide to Ohio’s New Way of Evaluating Teachers

How can you tell if a teacher is doing a good job? Ohio has begun answering that question using a statistical measure called value-added.

It’s based on student test scores.

How does it work? Check out our video below. And if you want to learn more about value-added, including ratings of 4,200 Ohio teachers, check out the StateImpact Ohio/Cleveland Plain Dealer series, “Grading the Teachers.”


  • joey_of_cleveland

    This video is a good start. However, it drastically oversimplifies the concerns that Ohio’s educational community has about this evaluation system. Perhaps a follow-up video with concerns and questions about the system, as well as interviews with educators familiar with value-added measurement, would provide a more balanced account. Below are just a few of the unanswered, and largely ignored, questions educators have been asking since this mandate was first included in the 2011 budget.

    How does the value-added system account for validity and reliability issues from one year’s test to the next? For instance, the content of a 5th grade test is different from the content of a 6th grade test. Scientifically speaking, comparing those two tests for value-added measurement presents a litany of validity problems.

    Furthermore, how does the state intend to deal with inequity of evaluation within the profession? Specialty teachers who have students for only one quarter of the year, and special educators, play just as important a role in student education, but they cannot reliably be evaluated by value-added measures.

    Does the state intend to compensate districts for the additional costs of implementing these evaluations?

    It’s clear that principal-only observation is not perfect, but perhaps the profession “settled” on the current system because there wasn’t anything better. Does anyone have a good account of how the current system came to be?

    • JMACK

      I’m not a blind advocate of value added, but it is a useful indicator, and a helpful measure. Of course it isn’t perfect, but more information and evidence can only strengthen evaluation systems and make them as fair as possible for teachers and the students they teach.

      Your first question – about comparing 5th grade to 6th grade tests, for example – misses the point. 5th grade and 6th grade test scores cannot and will not be compared directly; what will be compared is growth between the two measures. What these tests do is take a measurement of the class at one point in time, take another measurement at a later point, and look at the change between them. The two measurements use different instruments, but they are comparable across teachers because the measures are the same for everyone – if a test is “easy” or “hard,” it is easy or hard for everyone, and doesn’t penalize anyone in particular.
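      As an illustration only – Ohio’s actual value-added model (SAS’s EVAAS) is proprietary, and the scores, standardization approach, and function names below are all assumptions – a growth comparison of the kind described here might be sketched like this:

```python
# Illustrative sketch only: Ohio's actual value-added model (SAS's EVAAS)
# is proprietary, and all scores below are hypothetical. The point is the
# one made above: raw scores on two different tests are never compared
# directly; each score is standardized against everyone who took that
# test, and "growth" is the change in a student's relative standing.
from statistics import mean, stdev

def z_scores(scores):
    """Put scores on a common scale, so an 'easy' or 'hard' test
    is easy or hard for everyone and penalizes no one in particular."""
    mu, sigma = mean(scores), stdev(scores)
    return [(s - mu) / sigma for s in scores]

# Hypothetical scale scores for the same five students on two tests
# built by different vendors, with different difficulty and ranges.
grade5 = [380, 400, 420, 440, 460]   # last year's (easier) test
grade6 = [300, 350, 400, 500, 450]   # this year's (harder) test

growth = [b - a for a, b in zip(z_scores(grade5), z_scores(grade6))]
print([round(g, 2) for g in growth])
```

      Under these made-up numbers the first three students keep their relative standing (growth near zero) even though their raw scores dropped, while the last two swap positions, one up and one down – which is the sense in which an “easy” or “hard” test, by itself, penalizes no one.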

      Principal observations are not as cheap as you think. I haven’t seen anything published, but a back-of-the-envelope calculation lands at around $150 per observation, or $6 per child. Assume the average principal in an urban district makes $120K on a 12-month calendar, so their hourly rate is about $60 ($120K / 2,000 hours). If the average classroom observation cycle takes a principal 2 hours – pre-conference, observation, post-conference, and logging the scores – that’s $120. Add the recurring costs of training teachers on the rubrics, training principals to score reliably, and software and proprietary tools, and I suspect you are around $150. Multiply that by every observation in a year and the costs mount.
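      The arithmetic behind that estimate, spelled out – every figure is the commenter’s assumption rather than published data, and the class size is an additional assumption needed to reach the per-child figure:

```python
# Back-of-the-envelope check of the observation-cost estimate above.
# Every figure here is an assumption from the comment, not published data.
salary = 120_000          # assumed urban principal salary, 12-month calendar
hours_per_year = 2_000    # ~50 weeks x 40 hours
hourly_rate = salary / hours_per_year            # $60/hour
cycle_hours = 2           # pre-conference, observation, post-conference, logging
labor = hourly_rate * cycle_hours                # $120 per observation
overhead = 30             # assumed per-observation share of training/software
per_observation = labor + overhead               # ~$150
class_size = 25           # assumed; yields the "$6 per child" figure
print(per_observation, per_observation / class_size)   # 150.0 6.0
```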

      We need multiple measures to help us understand performance – not just observations, not just value-added; there are many other measures we can use. And remember, the majority of teachers and classes do not take standardized tests and so cannot be included in this system. We need to open up and think outside the box – we need to do what is best for the students in our schools, and hold each other, all of us, accountable for improving outcomes for the kids we work with every day.

      • duckmonkeyman

        How can anyone endorse a model like value-add when no one other than SAS has access to the implementation? Trust but verify.

        It makes no sense to base value-added on two entirely different measures – for example, one year focuses on Algebra 1 skills and the following year on geometry skills. What little the ODE documents reveal shows major assumptions made to account for these disparate measures, as well as classroom turnover, population, and other factors. Too many assumptions, and the model becomes too diluted; too few, and it moves further from reality. Also, google “Campbell’s Law.”

        You do not sound like a teacher being evaluated, and more like a plant. Or you do not see the major flaws in the approach. Your implicit condemnation that teachers NOW do not hold themselves personally accountable, or cannot “think outside the box,” or do not strive for what is best for students is grossly inaccurate and unfair to everyone who works in the classroom.

  • Brooke

    As a 7th grade teacher in Ohio, I can tell you that value-added doesn’t solve the problem of grading teachers fairly. The 6th and 7th grade tests, for example, are made by two entirely different companies. The 6th grade test is extremely easy, and generally speaking, students perform very well on it. The 7th grade test is much more difficult, and students often score poorly on it compared to 6th grade. How can you grade my performance on that?

    In addition to the disparity in test difficulty is the issue of the students themselves. I had a student who had already failed 7th grade and was in for another round. He didn’t care about anything and didn’t try in school, no matter how much we called home or tried to help him succeed. He fell asleep during the test. I got scores back today and he failed miserably. I can tell you, based on the times that he did participate in class, that this student shouldn’t have scored that low – he’s not gifted, but he’s not a low-ability student. That’s just one student. There are many others with similar stories.

    I teach in an urban district where parent involvement is minimal amongst many of our students. I have a master’s degree in my subject area, and I work very hard every year to better myself as an educator and hone my craft. Student test scores often do not reflect the hard work that I put into my job.

    • duckmonkeyman

      That same 6th-to-7th grade leap in skills is found in the Common Core. The elementary standards are slow and plodding, emphasizing basic skills. The 7th grade expectations are much higher and will be a shock to many students and parents – and it comes at a difficult time developmentally for students as well.

  • Tracy

    So, teachers in Ohio are only female?

  • Laura H. Chapman

    This system of grading teachers reflects the ignorance of the economists, statisticians, and “workforce managers” who have marketed this system of teacher evaluation to federal and state policy makers. In Ohio, the statistical calculations for so-called value-added measures are proprietary, in a black box, owned by SAS. The assumptions in value-added calculations are a fiction when applied to schools and teaching and learning. The formulas assume that students are randomly assigned to classrooms and teachers; that scores on tests from one grade to the next form an interval scale (like a thermometer or measuring cup); and that differences in grade-level content and skills do not matter – as if math scores on tests in the 4th, 5th, and 6th grade all have the same properties and meaning.

    Reputable statisticians know that these calculations produce different ratings for teachers depending on the statistical assumptions (which are not transparent in Ohio). Scores are influenced by the tests themselves (e.g., fill-in-the-bubble versus solve-a-problem). Scores differ by the subject taught (math versus English), by grade level, by the makeup of a teacher’s classes (size, number of students learning English, number of students in special education, and so on), and by a large number of other factors, including resources for teaching, student mobility, stability of faculty and administrator leadership, and the general SES of the community.

    The statistical measures assume that past performances on tests predict future performance in a linear trend line. No corporate report would dare say that “past performance predicts future performance.” No teacher would dare treat students as if their past performance predicted their future performance, because that stance could become a case of the self-fulfilling prophecy. Treat kids who do poorly on tests as failures, and you can ensure their failure.

    Ohio, like Florida, is ripe for massive class-action lawsuits from teachers who will be terminated or labeled ineffective by this legislated farce. This system of evaluation has been properly labeled “junk science,” a case of “mathematical intimidation,” and an example of the reification of scores on standardized tests, as if these were perfected measures and the gold standard for judging the education our students are receiving. Farce. Big and tragic farce.

  • ccc

    This testing protocol is out of hand. I work as an engineer and have a strong math background. I would never compare two tests that are made by different companies. In addition, they cover different topics, or expanded topics, considering that a whole year of education has passed. That makes no sense, and it probably gives completely random results unless you go to the extremes: the good students probably always show equal or improved, and the bad students probably hover around bad. The students in the middle are probably wild cards depending on a lot of inputs – their home situation could be bad, they broke up with their girlfriend that day, they got into a fight, whatever. Kids are not robots. They need safe environments to learn. They need to be taught to understand the world and their environment, how to cope and solve problems. They need to learn how to be good, well-educated citizens in addition to learning math, science, English, art, etc. My company isn’t going to hire you because you can take a test well. We need people who can think, reason, and solve vague problems with limited and sometimes risky options. Let’s focus on that.

    As for teacher evaluations, why should it be different for teachers? I get evaluations for promotions and raises where I work. They throw as many statistics on my evaluation as they can find, but in the end, it doesn’t make a difference. My evaluation is completely based on my evaluators’ experiences with me – their perception of what I did. Maybe I was late on everything I did, and that would be a measurable statistic. But if what I accomplished was immense (not measurable in my field), the evaluators recognize the impact to the company and the project, and I am rewarded for it.

    Education is a difficult thing to measure. Teachers and students shouldn’t be reduced to a test. I would rather not spend all this money on tests. I didn’t have any of that garbage when I went through school, but I had a lot of teachers who worked to make sure that I was a good problem solver, that I could come up with creative solutions, and that I challenged the information I was given. It has served me well. Thank you, teachers.

    • slammed

      Reality Check: I searched this blog to try to find information about the 7th grade test and value-added. I must say that your conclusion above is flawed. I work in an affluent district. Our seventh grade scores are the highest in the county, overall and for levels of advanced students; our scores are the second lowest in the county for limited readers. Our score is an F. I will write that again in case you missed the point: our score is an F. 98% of our students are advanced readers. 84% of my students personally are advanced or accelerated. I had one student out of 110 not pass the 2014 OAA test, and only seventeen score merely proficient. As a result of the value-added formula, I have also most likely received an ineffective rating.

      An urban 7th grade middle school in the same county has received a grade of C: 66% above proficient; only 8.7% advanced; only 16.7% accelerated; 13.9% limited. Now, I am not implying that the teachers in the urban setting are not effective. I have worked in an urban setting, and I know how hard teachers work there and everywhere else. Urban settings deal with an entirely different set of variables than affluent suburban schools. The point I am making is that value-added is not about high-performing students, teachers, or schools. It is about something else entirely, but as so many other educators have pointed out, the formula is protected: no one has access to it, no one has validated it, no one can question its results (one way or another). Furthermore, I would like to find the definition of cognitive growth. I have yet to find anyone who can tangibly show me a year’s worth of growth.

      One last point to add to the invalidity of this process: according to an ODE meeting I attended, each grade level is compared each year to a completely different base score that was established in 2010. The sixth graders are compared to the 2010 6th graders, the seventh graders to the 2010 7th graders, and the eighth graders to the 2010 8th graders. In order to show growth, the students must continue to score at the same level each year; in order to show advanced growth, a student must score above the level of the previous year. Now, what if the 6th grade in 2010 scored low and the 7th grade in 2010 scored really high? In 6th grade you might be bigger than the average apple from 2010, but in 7th grade you might be smaller than the average apple from 2010. I am not a statistician, so I would like some clarification of how being compared to a different base group each year, on a test produced by a different company each year and not designed to measure growth, can conclusively determine a teacher’s performance. Again, I am not a statistician, so some of my thinking is probably off base, but I am open to criticism so that I can better understand my current situation.
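      The base-cohort mechanics described here – as the commenter understood them from an ODE meeting; the real formula is proprietary, and every number below is invented – can be sketched to show the “different average apple each year” problem:

```python
# Sketch of the base-cohort comparison described above. The real Ohio
# formula is proprietary; the 2010 baseline scores here are invented
# purely to illustrate comparing each grade to a different 2010 cohort.
BASELINE_2010 = {6: 410, 7: 445}    # hypothetical 2010 mean scores by grade

def standing(grade, score):
    """How far a score sits above (+) or below (-) that grade's 2010 base."""
    return score - BASELINE_2010[grade]

# The same student scores 430 two years in a row, yet appears to decline,
# because each year is judged against a different 2010 cohort:
print(standing(6, 430))   # 20  (above the low 2010 6th-grade base)
print(standing(7, 430))   # -15 (below the high 2010 7th-grade base)
```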

  • duckmonkeyman

    Pretty one-sided video. But StateImpact has become a propaganda channel for the “reform” movement. Several issues are not discussed or fairly presented:

    Teachers do not object to being evaluated on student performance. The questions are whether a once-a-year, two-hour test that is inherently flawed and biased is accurate, and whether a 50% weighting is reasonable. A good analogy is judging the best doctors by measuring their patients’ waistlines.

    Value-added is based on a multilevel statistical model that is kept secret. StateImpact falsely implies that it understands how the model works when no one has access to peer-review or dissect it. And every statistical model is only as good as its assumptions and “adjustments.”

    Teachers also object to the overemphasis on testing and the “one size fits all” approach. Teach to the test kills love of learning. Innovation is squelched. The best teachers become frustrated and leave.

    Ohio needs to stop being a lemming and realize that the “reform” movement has little to do with learning and much to do with corporate interests chasing the public dollars spent on education.

About StateImpact

StateImpact seeks to inform and engage local communities with broadcast and online news focused on how state government decisions affect your lives.
Learn More »