Popham on the Types of Test

Educational measurement specialist W. James Popham discusses norm-referenced and criterion-referenced tests.

There are two rather distinctive, yet widely used, assessment strategies available to educators these days: norm-referenced measurement and criterion-referenced measurement. The most fundamental difference between these two approaches to educational assessment is the nature of the interpretation that’s used to make sense out of students’ test performances.

With norm-referenced measurement, educators interpret a student’s performance in relation to the performances of students who have previously taken the same examination. This previous group of test takers is referred to as the norm group. Thus, when educators try to make sense out of a student’s test score by “referencing” the score back to the norm group’s performances, it is apparent why these sorts of interpretations are characterized as norm referenced.

To illustrate, when a teacher asserts a student “scored at the 90th percentile on a scholastic aptitude test,” the teacher means the student’s test performance has exceeded the performance of 90% of students in the test’s norm group. In education, norm-referenced interpretations are most frequently encountered when reporting students’ results on academic aptitude tests, such as the SAT, or on widely used standardized achievement tests, such as the Iowa Tests of Basic Skills or the California Achievement Tests. In short, norm-referenced interpretations are relative interpretations of students’ performances because such interpretations focus on how a given student’s performance stacks up in relation to the previous performances of other students.
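To make the percentile idea concrete, here is a minimal sketch that computes a percentile rank against a norm group. It assumes the simple “percentage of norm-group scores strictly below the student’s score” definition; published tests may use other conventions (for example, counting half of any tied scores). The function name `percentile_rank` and the norm-group scores are hypothetical, not real test data.

```python
from bisect import bisect_left


def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percentage of norm-group scores strictly below `score`.

    Assumes the simple 'strictly below' convention; real tests may
    handle tied scores differently.
    """
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of norm scores < score
    return 100.0 * below / len(ordered)


# Hypothetical norm group of 20 raw scores (illustrative only).
norm_group = [12, 15, 18, 20, 21, 22, 24, 25, 27, 28,
              29, 30, 31, 33, 34, 35, 36, 38, 39, 42]

print(percentile_rank(39, norm_group))  # 90.0
```

The result is purely relative: the same raw score of 39 would earn a different percentile rank against a different norm group.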

In contrast, a criterion-referenced interpretation is an absolute interpretation because it hinges on the extent to which the criterion (that is, the curricular aim) represented by the test is actually mastered by the student. Once the nature of an assessed curricular aim is properly described, the student’s test performance can be interpreted according to the degree to which the curricular aim has been mastered. For instance, instead of a norm-referenced interpretation such as the student “scored better than 85% of the students in the norm group,” a criterion-referenced interpretation might be the student “mastered 85% of the test’s content and can be inferred, therefore, to have mastered 85% of the curricular aim’s skills and/or knowledge represented by the test.” Note that a criterion-referenced interpretation doesn’t depend at all on how other students performed on the test. The focus is on the curricular aim represented by the test.
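A minimal sketch of the criterion-referenced reading, assuming that each test item samples the curricular aim and that “percent mastered” is simply the share of items answered correctly. The function `percent_mastered` and the 20-item result are hypothetical. Note that the function receives only this student’s results; no other student’s performance enters the calculation.

```python
def percent_mastered(item_results: list[bool]) -> float:
    """Percentage of test items the student answered correctly.

    Read as an estimate of how much of the curricular aim is mastered,
    assuming the items adequately represent that aim.
    """
    return 100.0 * sum(item_results) / len(item_results)


# Hypothetical 20-item test: the student answers 17 items correctly.
student = [True] * 17 + [False] * 3
print(percent_mastered(student))  # 85.0
```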

As you can see, the meaningfulness of criterion-referenced interpretations is directly linked to the clarity with which the assessed curricular aim is delineated. Clearly described curricular aims can yield crisp, understandable criterion-referenced interpretations. Ambiguously defined curricular aims are certain to yield fuzzy criterion-referenced interpretations of little utility.

Although loads of educators refer to “criterion-referenced tests” and “norm-referenced tests,” there are, technically, no such creatures. Rather, there are criterion- and norm-referenced interpretations of students’ test performances. For example, educators in a school district might have built a test to yield criterion-referenced interpretations, used the test for several years and, in the process, gathered substantial data regarding the performances of district students. As a consequence, the district’s educators could build normative tables for the test, so that a test originally born to provide criterion-referenced inferences could also support meaningful norm-referenced interpretations.
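The district scenario can be sketched as turning accumulated scores into a normative lookup table. This is a minimal illustration under stated assumptions: the 10-item test, the historical scores, and the helper `build_norm_table` are hypothetical, and percentile rank again uses the simple “percentage of historical scores strictly below” convention. The raw score itself still supports the original criterion-referenced reading (7 of 10 items correct), while the table adds a norm-referenced one.

```python
from bisect import bisect_left


def build_norm_table(historical_scores: list[int], max_score: int) -> dict[int, float]:
    """Map each possible raw score (0..max_score) to a percentile rank.

    Percentile rank = percentage of historical scores strictly below
    the raw score (one simple convention among several).
    """
    ordered = sorted(historical_scores)
    n = len(ordered)
    return {raw: 100.0 * bisect_left(ordered, raw) / n
            for raw in range(max_score + 1)}


# Hypothetical scores gathered over several years on a 10-item district test.
district_scores = [3, 5, 5, 6, 6, 7, 7, 7, 8, 9]
table = build_norm_table(district_scores, max_score=10)

print(table[7])  # 50.0: half of the historical scores fall below 7
```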

Because most of the assessment-influenced decisions classroom teachers face benefit from the teacher’s understanding of what students can and cannot do, not merely their standing relative to one another, classroom teachers will generally want to arrive at criterion-referenced rather than norm-referenced interpretations. A teacher’s instructional choices, for example, are better served by evidence about particular students’ skills and knowledge than by evidence about how those students compare with one another.

About the only instance in which classroom teachers will need to employ norm-referenced interpretations occurs when there are fixed-quota settings—that is, when there are insufficient slots for all of the teacher’s students to be assigned to a given educational experience. For example, suppose there were five students to be chosen for a special outside-of-class enrichment program in mathematics, and the teacher had been directed by the school principal to choose the “best” mathematics students for the program. In such a situation, because best is, by definition, a relative descriptor, norm-referenced interpretations would be warranted. A teacher might build a test to spread students out according to their mathematics skills so the top students could be identified. Barring those rare situations, however, most classroom teachers would be better off using tests yielding criterion-referenced interpretations. Criterion-referenced interpretations provide a far more lucid idea of what it is students can and can’t do. Teachers need such clarity if they’re going to make solid instructional decisions.

Popham, W. James. 2011. Classroom Assessment: What Teachers Need to Know (Sixth Edition). Boston: Pearson Education, Inc. pp. 46-48.