One student wrote, "Martin Luther King Jr. was a good leader." With artfulness far beyond the student's age, the essay delved into King's history with the civil rights movement, pointing out the key moments that had shown his leadership.
There was just one problem: It didn't fit the rubric. The rubric liked a longer essay, with multiple sentences lauding key qualities of leadership such as "honesty" and "inspires people." This essay was incredibly concise, but got its point across. Nevertheless, the rubric said it was a 2. Puthoff knew it was a 2.
He hesitated the way he had been specifically trained not to. Then he hit "3."
It didn't take long before a supervisor was in his face. He leaned down with a printout of the King essay.
"This really isn't a 3-style paper," the supervisor said.
Puthoff pointed out the smart use of examples and the exceptional prose. The supervisor just shook his head and pointed out how short the paragraphs were.
"You know, it's more of a 2," the supervisor repeated. "Not enough elaboration."
Puthoff quickly learned these were not arguments he could win. But as time went on, he found himself having more and more of them.
There were the students who wrote extremely well but whose responses were too short—in his mind he saw them, bored with the essay topic, hurrying to finish. Or the essays where the handwriting got rushed and jumbled at the end, then cut off abruptly—he imagined the proctor telling the frantic student to lay down his pencil on a well-written but incomplete response.
And there were the kids who just did what they wanted. Like the boy from Arkansas who, instead of writing about the most fun thing to do in his town, wrote a hilarious essay on why his town is terrible and how he wanted to burn it down and pee on the ashes.
"I wanted the kid to get the score they deserved," Puthoff says of his time in the business. "But they want to put them in boxes."
In defiance, Puthoff started giving creatively written essays an illicit score bump. His agreement numbers noticeably suffered.
The industry calls this "scorer drift," a well-documented tendency to begin deviating from the rubric over time. One case of scorer drift actually resulted in some 4,100 teachers failing the essay portion of their certification exams. The teachers successfully sued for $11.1 million.
What was different about Puthoff's scorer drift was that he was doing it on purpose.
"I'll bring them up, don't worry," he'd say of his agreement rate, then go back the next day and do the exact same thing.
"I know this kid is good," he'd tell himself. "I know this kid's a good writer."
TODD FARLEY TREATED his supervising position at a scoring company like a joke.
"At the time, testing wasn't that big," he says. "I never had to feel like I'm actually deciding someone's future. It was just silly."
Farley had started at the bottom rung of the testing industry in Iowa City. A part-time graduate student with bills to pay, he was more interested in partying and trying to become a writer than he was in getting a real job. So he took one scoring job after another for NCS.
"It was always a temporary gig," he remembers. "It was a lovely, slacker-y life."
Farley had no official training in teaching, education, or test writing, but the longer he remained at NCS, the more responsibilities he was handed. He took the offer to become a team leader because it paid a little extra money and got him out of scoring.
Teaching his first group of scorers, Farley walked them through the rubric the same way he'd been shown. He fielded the inevitable bombardment of confused questions as best he could, in particular from one man: Harry the laid-off refrigerator plant worker.
Even though Harry eventually passed his qualifying exams, he was a disaster. As Farley monitored Harry's scoring, he found himself walking back over to Harry repeatedly.
"Look," Farley would say. "You're giving this essay a 2 even though it's perfectly formatted."
Harry would nod. But a short time later, another ridiculous lowball from Harry would land on Farley's desk. Before long, Harry began to drag down the all-important agreement level.
Farley now understood why, when he'd been a scorer, his team leaders would tell the room they wanted to start seeing more 3s or 4s or whatever. Supervisors were expected to turn the test scores into a nice bell curve. If his room did not agree at least 80 percent of the time, the tests would be taken back and re-graded, wasting time and money. The supervisor would be put on probation or demoted.
When Farley complained to a fellow supervisor about his problem, she smiled wryly and held up a pencil.
"I've got this eraser, see," she told him. "I help them out."
So Farley simply began changing Harry's scores to agree with his peers'. The practice soon spread well beyond Harry.
"I'd just change a bunch of answers to make it look like my group was doing a great job," Farley says. "I wanted the stupid item to be done, and so did my bosses."