Putting Education Reform To The Test

Inside the Mathematical Equation for Teacher Merit Pay

Miami Herald reporter Laura Isensee contributed to this report. Read her story on Florida’s merit pay formula here. 

School has always been about grading students. But now 24 states are starting to grade teachers.

Florida is using a mathematical formula to calculate how well teachers are doing their jobs. The grade it spits out will help determine how much a teacher gets paid and whether that teacher can keep his or her job.

But the formula is so complex even an advanced calculus teacher and former college math major can’t understand how it works.


High school advanced calculus teacher Orlando Sarduy writes out the formula that will grade and help determine the pay of Florida teachers. Even for a college math major like him, the formula is too confusing to understand. He calls it a "mathematical experiment."

Coral Reef High School teacher Orlando Sarduy says just reading the formula is difficult for him.

StateImpact Florida and the Miami Herald partnered up to deconstruct the equation and try to figure out what’s going on here. We asked statisticians and policymakers how the formula works. The answer we got: No lay person, teacher or reporter can understand it. So just trust us.

“I would really challenge any sort of decision maker to look at [the formula] and explain it,” Sarduy said. “I understand just the basics, but this is really the technical nitty-gritty of what’s going on, and to me it looks the same as it would to a lay person, like ‘what’s going on here?’”

How The Formula Works 

The formula is designed to predict how students will score on the state’s standardized exam—the FCAT. And then it adjusts teachers’ pay depending on how well their students measure up against that predicted score.

The formula takes into account school and student characteristics that Florida officials say predict how well a student is going to do in school.

Those officials decided there are only 10 factors that matter. They chose things like the number of students in a classroom, whether English is a student’s first language, attendance rates and disability status.

The statisticians created a formula that gives each of the 10 factors a certain weight.

That’s how the statisticians and policymakers who created the formula explained it to Miami Herald education writer Laura Isensee.

For example: If a student misses 5 days of school, the statisticians determine what effect missing those 5 days will have on that student’s standardized test score.

The statisticians do this for each factor and every student. In the end, the formula predicts what a student’s test score should be given all these factors.

“And that prediction will be the grading stick for the teacher,” Isensee said. “If the student gets higher than the predicted score, the state thinks they must have a good teacher. If a student scores below the predicted score, then the teacher could be in trouble.”
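As a rough illustration of the idea described above, here is a minimal Python sketch of a weighted prediction and the gap between predicted and actual scores. It is not Florida's formula: the factor names, the weights, the baseline and the averaging step are all hypothetical, chosen only to show the mechanics.

```python
from statistics import mean

# Hypothetical weights: illustrative numbers only, not Florida's actual values.
WEIGHTS = {
    "prior_score": 0.9,    # last year's test score
    "days_absent": -0.5,   # points per day of school missed
    "class_size": -0.1,    # points per student in the classroom
}
BASELINE = 40.0  # hypothetical starting value before any factor is applied


def predict_score(student):
    """Predicted score: baseline plus each factor multiplied by its weight."""
    return BASELINE + sum(WEIGHTS[name] * student[name] for name in WEIGHTS)


def teacher_score(students):
    """Average gap between actual and predicted scores across one class.

    A positive value means the class beat its predictions on average.
    """
    return mean(s["actual_score"] - predict_score(s) for s in students)


# Two made-up students in the same class.
classroom = [
    {"prior_score": 300, "days_absent": 5, "class_size": 25, "actual_score": 310},
    {"prior_score": 280, "days_absent": 0, "class_size": 25, "actual_score": 285},
]

for s in classroom:
    print(round(predict_score(s), 1), s["actual_score"])
print(round(teacher_score(classroom), 2))
```

In this sketch the first student beats the prediction and the second falls short, so the teacher's average lands close to zero. The real model is far more involved, but the "grading stick" logic is the same: predict, then compare.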

The state has a list of all the different weights for all the factors.

But Isensee said, “The weights are all over the place, even for kids who seem to be in the same situation.”

Here’s an example. Warning, difficult math ahead.

The weight for an English language learner in reading class is -7.3 if you’re in sixth grade, but it is +12.9 if you’re in tenth grade.
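To make that sign flip concrete, here is a small Python sketch using only the two weights quoted above. It assumes, purely for illustration, that a weight is added as points to the predicted reading score of any student flagged as an English language learner; the function name and that additive treatment are hypothetical, not taken from the state's documentation.

```python
# Weights quoted in the article for the English language learner factor in reading.
ELL_READING_WEIGHT = {6: -7.3, 10: 12.9}


def ell_adjustment(grade, is_ell):
    """Points this one factor adds to a predicted reading score (assumed additive)."""
    return ELL_READING_WEIGHT[grade] if is_ell else 0.0


for grade in (6, 10):
    print(grade, ell_adjustment(grade, is_ell=True))
```

Under that assumption, the same flag lowers a sixth grader's predicted score and raises a tenth grader's, which is the kind of difference Isensee asked the statisticians to explain.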

Isensee says the formula requires a lot of trust.

“I tried to understand why the impact is so different for everyone and the statisticians basically told me, ‘Don’t worry about it, that’s the formula’s job. The formula knows how much weight to give everything.’

“The teachers have to trust that the policy makers chose the right factors, and the policymakers have to trust that the statisticians came up with an accurate formula.”

Kathy Hebda is with the Florida Department of Education. She says the formula Florida created is a state of the art model.

“We have contracted with leading national experts,” Hebda said. “We have a statewide committee that is steering the process and making recommendations to the commissioner about the model, that’s made up primarily of teachers. That kind of input and guidance is extremely important to make sure that the model works properly.

“We’re very confident in the process and the approach we’ve taken,” Hebda said.

Teachers like Sarduy are skeptical. He questions why Florida only chose 10 factors to begin with. He says there could be hundreds of factors that impact how well a student does in school.

“[The formula is] only as good as the variables that you’re actually looking out for as well as the test that you’re using to measure,” Sarduy said. “At the backbone of this is still an exam that’s made up. If the exam is invalid, the whole equation is invalid.”

The most influential factor in the equation is the score a student gets on the FCAT exam. It’s the factor with the most predictive value.

What Happens If You Don’t Teach an FCAT Subject?

In Florida about 60% of teachers do not teach a subject that is tested by the FCAT, like physical education teachers, health and history teachers, or chemistry and advanced calculus teachers.

Isensee says until the state comes up with a test for every subject in every grade, teachers who don’t teach an FCAT subject are going to be graded on the whole school’s reading score.

“So health teachers, advanced calculus teachers, their pay will be based on how well kids read,” said Isensee.

That has teachers like Sarduy pretty frustrated.

“It’s infuriating,” Sarduy said. “I have nothing to do with whatever that end result is.”

But the state does give teachers another chance to show off their teaching skills. Hebda says the grade the formula gives teachers is only half of the whole teacher evaluation process.

“The law is very clear that that’s 50% of an evaluation. The other piece of that is equally important, which is instructional practice,” Hebda said.

“What is the teacher actively doing in the class? What are the students actively doing in the class? And then what are the student outcomes? Those are the things that all go into making that final evaluation result.”

So principals and other educators will still do their own evaluation of teachers by sitting in their classroom and reviewing lesson plans.

But some teachers still feel like too much of the end result is out of their control.

The State Says Poverty Doesn’t Matter

In study after study, we hear that poverty is the number one indicator of how well students do in school. But Florida policymakers made it against the law to include any socioeconomic status as a factor in the formula.

There’s no factor for poverty, homelessness, immigration status, race or ethnicity.

The state says the formula does not need to include a student’s socioeconomic status as its own factor because it’s already baked into the equation. Teachers would only be graded on how well they help poor students improve from the year before when they were also poor.

The rationale is that all kids, regardless of whether they’re homeless or poor, can improve at the same rate as kids from wealthy areas, if they have a good teacher.

“I like the fact that it’s attempting to isolate just the teacher’s [role],” Sarduy said. “But you have to realize, that’s me on the line. I’m now part of a statewide experiment.”

UPDATE: Here’s the formula in its entirety.



  • Scubus

    At the heart of the formula are assumptions, and then the formula is built on top of those assumptions.

    You know how hurricane models vary in their predictions, and are not often in agreement or 100% accurate?  That is a similar mathematical model.  They are only as good as the underlying assumptions.

    In addition, study after study shows that children in poverty do not learn at the same rate as more affluent students (which has NOTHING to do with intelligence, just opportunity) no matter how much state officials want that to be true.  In addition, the model makes no allowances for attendance (I have a number of students every year who miss one quarter or more of my classes.)

    Finally, a single standardized test alone tells us very, very little about an individual student.  Students may have a great day or a poor day.  On a larger scale these variances may be balanced, but individually they cannot be adjusted for.  It is absolutely a misuse of a test.  Testing experts are in agreement on this point but legislators choose to ignore this for political reasons, not to improve schools or because they care about students.

    I do not, nor do many teachers, disagree with the goal – just the model.  If there were an accurate means to grade a teacher most would be onboard.  This isn’t it, and such a system has never existed.  And globally, school systems that are models of success know this and do not use testing in this manner for a reason.

    • Anonymous

       Thanks for the response, Scubus. I’m going to highlight these points in a separate post.

    • JP Finan

      I really want to embrace formulas like this. I hate the idea that 5% of teachers can muddle along doing a crap job for kids. I don’t have any trouble understanding the formula – sure, it’s convoluted, but it’s not rocket science. I’m a big believer in at least trying to reward good performance, too. I really do not care about hurting the feelings of a bad teacher, regardless of how many years they’ve been bad at teaching. I don’t worry much about year to year random differences as long as students are randomly assigned. Yes, sometimes you get a bad crop but other times the opposite is true. That said, there really seem to be a lot of problems to work out – poverty matters, for one, and testing instruments seem like they may need a lot of work (though my sample size is just 1!).
      As Scubus said, there are good arguments that poverty affects the rate at which a kid can learn, not just their absolute improvement. A lack of early language development (correlated with parent-child interaction, which is correlated with income) and a lack of practice at socialization (of the kind in pre-K education like sharing, getting along with different people, sitting still once in a  while, etc) reduces the rate at which a child is able to learn. The children of poor parents doing a perfect job in every way they possibly can are simply exposed to demonstrably more stress at home by the nature of poverty. More stress equals less improvement in percentage terms.
      Still, a model that looks at differences in performance over a school year could still be useful if it were used to compare teachers of students with similar backgrounds – say, from the same school or pooled with similar schools. The data need to be open to researchers for critique to find these issues.
      The quality of the testing instrument remains another real concern. I just reviewed Wisconsin’s tenth grade reading test (posted online for review based on 2005 questions) and I was surprised at how poor it was. Of the 18 posted questions, 2 were unacceptably ambiguous and 2 were just incorrect. In one case, the supposedly correct answer was actually the worst answer offered. I got a 36 on the reading portion of the ACT, btw, and a 98th percentile GRE score, both on my first try. I read well, and I wasn’t looking for problems. I was just curious. 14 out of 18 is just not going to cut it for an assessment that is used to make such important decisions! The best students absolutely got worse scores on Wisconsin’s 2005 10th grade reading test than did poorer readers. That really sucks.
      One direction I’d love to see explored is looking at longer term assessments of teacher quality, perhaps for determining a sort of tenure or longer term bonus system. If you could somehow survey students farther down the line using a more accurate measure of success than poorly written tests, that could be golden. Maybe we could ask graduating students or college grads or 25 year olds about their overall life satisfaction, educational attainment, financial security, and whatnot. Then you could do a relatively simple factor analysis to see which teachers were associated with people who did better in school and on things that matter – success in life, not guessing the right answer on a test. It would take a while, but it seems like getting a handle on the performance of urban teachers may simply be a long and difficult process. A quality teacher in a school where most eighth graders are practically illiterate may not be the one who can teach the kids to fill in the right dots. Maybe the best teacher is one who serves as a strong male role model or who teaches teamwork and forgiveness or who shows the kids strategies for calming down and dealing with their emotions – you can’t see those results in a single year or with a test, but they may be measurable, regardless.

  • How do they factor in the death of a family pet? Or a hard year because a child’s parents separated and are getting divorced? 
    Don’t you think a formula for bankers and CEOs would be a little more straightforward? Why don’t they start there.

  • Public educator in AZ

    1) Sarduy names one of the most important factors: the formula is based upon an assessment. Has as much attention been given to the testing instrument?

    2) Why not begin the formula as a pilot program, so that the kinks can be worked out? Tying such an assessment tool AND a complicated formula to the pay scale of an educator without deep working knowledge and application of the model is dangerous.

    3) Teachers in the classroom are becoming the SINGLE success factor. (I will not state that parents are the other, as this condemns poor children to poverty.) The teaching profession in this country must be elevated, as it is in Finland, Singapore, Japan, etc. Better teachers will improve student learning. This formula is at the VERY end of the chain.

    4) Finally, it is imperative to compare teachers’ and students’ performance, year to year. What will be publicized is what is already known, widely in academic circles. Value-added models demonstrate that teachers’ scores are not the same, year to year. This points to the many, many variables which impact a classroom.

    Public educator in AZ

  • Frazier

    I believe that the evaluative skills of administrators must also be a factor. A skilled evaluator can see how students respond to the teacher’s instruction as well as monitoring the actions of the teacher.

  • Hiawatha

    This is absolutely insane.   You cannot hold a teacher completely responsible for a student’s performance.   I would hate to have my pay and my job tied to the performance of teenagers.

    And you wonder why over 50% of new teachers leave teaching after 3 years….

  • sm

    There’s a predictable amount of derision and, more reasonably, doubt that a single formula can measure teacher performance. Nevertheless, I’d seriously pursue it as a part of an overhaul of the teaching class. As a major part of society, American education (esp. math, science, reading/writing) is no longer credible.

    Teachers complain they teach to a test thus undercutting, at least from an attitudinal standpoint, real education in that students do little more than parrot back memorized facts. Again there is some merit to this. But when located in the context in which all this began, it’s little more than a side show.

    Standardized testing started in the 1980s when the “Nation at Risk” national report was released. It was scandalous. Towards the end of the 1970s and into the early 80s, American educational outcomes were at an all time low. In other words with standardized testing teachers complain and education lacks. Without standardized testing teachers complain and education lacks.

    Since that time America has been looking for ways to enforce minimum levels of objective outcomes, have greater consequences for failing schools, and attach greater levels of reward for those schools that succeed. NCLB was supposed to help here. Standardized testing and serious teacher evaluations must also be part of that solution. I’m rated numerically and in writing twice a year. The higher I perform the more difficult it is to advance. I can get fired — people have been fired — for substandard performance. Why should teachers, unionized or not, be different? Teachers and principals are the single most important factor in schools.

    Some things I’d like to see are:
    - Reduce the influence of unions which protect bad teachers
    - Make teacher job security fair by attaching it to outcomes
    - Greater pay for principals, science, English, mathematics teachers
    - Stop kids from matriculating from primary to middle, from middle to H.S. and from graduating from H.S. regardless of age unless minimum proficiency is demonstrated.
    - Have public universities set for H.S. the examinations students must pass before they can graduate H.S. Have H.S. set for middle school the examinations students must pass before they graduate middle school, etc. Thus teachers cannot bake in their own success rate and will be forced to treat the next institution like a customer and produce an outcome that it, not themselves, values. No more remedial education!
    - Charter schools: public schools need to get the message: given the choice between protecting teachers and children, children win.

    • ChapmanLH

      I think you are uninformed about the history of standardized testing. It began much earlier in the 20th century and was accelerated by the need to screen soldiers for mental abilities in WWII. In the years following that war, those “experts” in testing helped to plant into public schools the now popular view that training kids to score high on tests is the same as becoming well-educated. The Florida formula is designed to perform a triage on all teachers, irrespective of the subject taught. If you are hired to teach Physics, French, or Band (or heaven forbid Art), you will be graded and paid on how well students in your school read English. That is insane. You are also mistaken if you think more math and science is the source of all great things for our economy. Best example is Steve Jobs, who credited Apple’s success to his studies of calligraphy (a form of art) and his use of that knowledge a decade later in the first personal computer from Apple. At about the same time, Charles Csuri, a professor of fine art at Ohio State, pioneered the whole field of computer animation. The international standard for excellence in K-12 education is a balanced program of studies in the arts, sciences, and humanities–good enough for the sons and daughters of most of the politicians who have been sold a false bill of goods that this exotic number-crunching will improve outcomes of education. Wishful thinking again.

  • Cuppy

    For all the scientific validity, they might as well call psychic friends network.

  • Tosca

    Why don’t we come up with a merit pay formula for parents? If you do a half way adequate job of parenting your child, i.e., get him/her to school each day; make sure they have glasses (the parents of 4 dyslexic students in our school haven’t bothered to get their vision impaired children glasses); take care of their basic needs and don’t abuse them–hey, you get a little merit pay. If you actually help your child do homework, you get even more! I bet if they were paid, you’d see some better parenting out of some of the worthless parents out there. And guess what? You might even have better test scores.

  • Desert Penguin

    Last year, my son, a first grade teacher, had 8 students with learning disabilities & 5 students with at least one parent who didn’t speak English. This year, he only has one LD student. A merit system that doesn’t take that into account would be inherently unfair.

    Second point; another teacher at his school pointed out that, instead of the current practice of sharing ideas and suggestions with other faculty members, at least some of the proposed merit systems would turn fellow teachers into competitors.

  • Anonymous

    You’ve got to be kidding me… “Teachers would only be graded on how well they help poor students improve from the year before when they were also poor.” Has every poor child always been poor? If this is the thinking behind this equation, we need lots more free adult education night classes in Tallahassee.

    Parents die or get divorced or lose a job, often in the middle of a school year. Students move, break a leg, or get mono, often in the middle of the school year. Students and their situations do not remain static from year to year. Does each “factor” assume that the student’s current situation has been unchanged since he/she started school?

    Just ten factors? A kid who wakes at 5:00AM to take the school bus probably sleeps an hour less each night than a kid who gets a ride in mom’s Benz. Is transportation a factor? Is computer ownership/internet access one of the “factors”??? How about prescription drugs? How about date of last menstrual period? Presence of siblings in the home? Pet ownership? Parent’s educational attainment? Parent’s rap sheet? Recency of hurricane strike? % uptime of school A/C? Single parent household? All these things influence performance.

    This is not a math problem. This is a faulty assumption problem. The assumption is that the man behind the curtain can actually know the ten factors that affect FCAT scores, that those factors are the same from student to student and year to year, and that those factors can be accurately determined for each public school student. Life just ain’t that simple.

    How many weeks between collection of the ten factors and the taking of the test? Does nothing change for any student during this time? The architects of this equation deserve an “F” grade. Shoddy logic and excessive hubris.

    • ChapmanLH

      I agree that this is junk statistics. Are you aware that the pioneer of this nonsense has made a fortune by taking his formulas developed for genetic engineering in agriculture into number crunching  fed by the standardized test score of kids in one county in Tennessee? 

      The assumptions made in value-added assessments stretch the limits of credibility. The data must come from reliable and comprehensive tests. Tests are crude proxies for achievement. The tests must measure the same underlying concepts at the beginning of a school year and at the end of the school year, and year-to-year tests must be essentially the same in reliability and validity. Large numbers of students must be randomly assigned to a large number of teachers. If not, then by other means, statisticians will figure out how to make teachers look like they are working with the same sort of students from one year to another. The statistical methods in use assume that students are only learning from one teacher at a time with no residual effects from other teachers, and more generally that education in school occurs under steady-state

      Value-added methods slice, dice, and repackage test scores of students, even students with incomplete histories of taking tests, into predictions of the scores students should receive over time, based on their past performance. These predictions are estimates of an average year-to-year gain in scores, a percent cumulative norm gain, that effective teachers exceed, and that ineffective teachers do not.  No other country in the world is so preoccupied with test scores and damaging teachers with ill-founded claims about their competence.

    • “Poor” as in “poor student”. It means poor skills. Not poor finances. Your reading teacher probably hates this formula, too.

  • Jason Hur

    Oh, hey. It’s Mr. Sarduy, my old math teacher. Gotta love that guy.

  • Paul Cheenis

    I believe Orlando has it all wrong. The variables may define the function, but I believe that at best, the function is defined by the variables. For example, take the function f(x,y,z)=x+y+z. This function is indeed defined by the variables that are defined by the user, but the variables themselves do not single-handedly define the function. In this equation for determining the salary of a teacher, the clearly obsolete variables that are included, mostly dispositional variables, partially define the function, even overburdening the variables that seem even relevant. If you’re a teacher, well, Hmph, I guess it sucks to be you!

  • Not too surprising he’s critical of a formula he admits he doesn’t understand. How good of a math teacher can he be?
