Davidson, A Short History of Standardised Tests

Cathy Davidson discusses the early history of standardised testing.

So where did standardized testing come from anyway? That’s not just a rhetorical question. There is a “father” of the multiple-choice test, someone who actually sat down and wrote the first one. His name was Frederick J. Kelly, and he devised it in 1914. It’s pretty shocking that, if someone gave it to you today, the first multiple-choice test would seem quite familiar, at least in form. It has changed so little in the last eight or nine decades that you might not even notice the test was an antique until you realized that, in content, it addressed virtually nothing about the world since the invention of the radio.

Born in 1880 in the small farming town of Wymore, Nebraska, Kelly lived until 1959. A lifelong educator, he had seen, by the time of his death, the multiple-choice test adapted to every imaginable use, although it had not yet been elevated into national educational policy as the sole metric for assessing what kids were learning in school, how well teachers were teaching them, and whether schools were or were not failing.

Kelly began his career at Emporia State University (formerly Kansas State Teachers College). In 1914, he finished his doctoral dissertation at Teachers College, entitled Teachers’ Marks, Their Variability and Standardization. His thesis argued two main points. First, he was concerned about the significant degree of subjective judgment in how teachers marked papers. Second, he thought marking took too much of a teacher’s time. He proposed solving the first problem, “variability,” through standardization, which would also solve the second problem by allowing for a fast, efficient method of marking. …

To make the tests both objective as measures and efficient administratively, Kelly insisted that questions had to be devised that admitted no ambiguity whatsoever. There had to be wholly right or wholly wrong answers, with no variable interpretations. The format will be familiar to any reader: “Below are given the names of four animals. Draw a line around the name of each animal that is useful on the farm: cow tiger rat wolf.”

The instructions continue: “The exercise tells us to draw a line around the word cow. No other answer is right. Even if a line is drawn under the word cow, the exercise is wrong, and nothing counts. ... Stop at once when time is called. Do not open the papers until told, so that all may begin at the same time.”

Here are the roots of today’s standards-based education reform, solidly preparing youth for the machine age. No one could deny the test’s efficiency, and efficiency was important in the first decades of the twentieth century, when public schools exploded demographically, increasing from about five hundred in 1880 to ten thousand by 1910, and when the number of students in secondary education increased more than tenfold. Even so, many educators objected that Kelly’s test was so focused on lower-order thinking that it missed all other forms of complex, rational, logical thinking entirely. They protested that essays, by then a long-established form of examination, were an exalted form of knowledge, while multiple-choice tests were a debased one. While essay tests focused on relationships, connections, structures, organization, and logic, multiple-choice exams rewarded memorization rather than logic, facts without context, and details disconnected from analysis. While essays allowed for creativity, rhetorical flourishes, and other examples of individual style, Kelly’s Kansas Silent Reading Test insisted on timed uniformity, rewarding the most correct answers within a fixed time. While essays stressed coherent thinking, the Silent Reading Test demanded right answers and divided knowledge into discrete bits of information. While essays prized individuality and even idiosyncrasy, the bywords of the Silent Reading exam were uniformity and impersonality. …

In 1904, two French psychologists, Alfred Binet and Théodore Simon, were commissioned by the Ministry of Public Education to develop tests to identify and diagnose children who were having difficulties mastering the French academic curriculum. They were using the word intelligence in the older sense of “understanding” and were interested in charting a child’s progress over time, rather than positing biological, inherited, or natural mental characteristics.

Like Kelly in Kansas, who began his testing research around the same time, the French psychologists were of their historical moment in seeking efficient and standardized forms of assessment: “How will it be possible to keep a record of the intelligence of the pupils who are treated and instructed in a school ... if the terms applied to them—feeble minded, retarded, imbecile, idiot—vary in meaning according to the doctor who examined them?” Their further rationale for standardized testing was that neurologists alone had proved incapable of telling which “sixteen out of twenty” students were the top or the bottom students without receiving feedback from the children’s teachers.

Note the odd assumption here that you need a test to confirm what the teacher already knows. The tests were deemed important because the neurologists, plying their trade with the scientific methods of the day, kept getting it wrong. Historian Mark Garrison asks, “If one knows who the top and bottom students are, who cares if the neurologists can tell?” He wonders whether the point of the tests was to confirm the intelligence of the students or to assert the validity of institutional judgment, educational assessment, and the scientific practice of the day.

Binet would have been appalled and disgusted by the misuse of his test that began in 1917. In a story that has been told many times, the president of the American Psychological Association, Robert Yerkes, convinced the military to give the new Stanford-Binet IQ tests to more than a million recruits to determine who had sufficient brain power to serve as officers, who was fit to serve overseas, and who simply was not intelligent enough to fight in World War I at all. This massive sampling was then later used to “prove” the mental inferiority of Jews, Italians, eastern Europeans, the Irish, and just about any newly arrived immigrant group, as well as African Americans. Native-born, English-speaking Anglo-Saxons turned out to have the highest IQ scores. They were innately the most intelligent people. Familiarity with English or with the content was irrelevant. There were Army Alpha tests for those who could read and Army Beta tests for those who couldn’t, and they were confusing enough that no one did well on them, raising the eugenic alarm that the “decline” in American intelligence was due to racial and immigrant mixing. Almost immediately, in other words, the tests were used as the scientific basis for determining not just individual but group intelligence. The tests were also adopted by U.S. immigration officials, with profound impact on the Immigration Restriction Act of 1924 and other exclusionary policies. They were also used against those of African descent, including native-born African Americans, to justify decades of legal segregation, and against Asians and Native Americans as the basis for inequitable citizenship laws. Of course, the argument had to run in the opposite direction when applied to women, who from the beginning showed no statistical difference from men in IQ testing. Magically, the same tests that “proved” that different races had unequal innate intellectual capacities nevertheless were not held to “prove” that different genders were equal.

Davidson, Cathy. 2011. Now You See It: How the Brain Science of Attention Will Transform the Way We Live, Work, and Learn. New York: Viking. pp. 113-122.