Clay on Observation Surveys


Marie Clay discusses the importance of observation in early literacy development.

If we attend to individual children as they work, and if we focus on the progressions in learning that occur over time, our detailed observations can provide feedback to our instruction. Carefully recorded observations can lead us to modify our instruction to meet the learning needs of particular children in the formative stages of new learning, like beginning reading, beginning writing and beginning mathematics.

Planned observations can capture evidence of early progress. All science is based on systematic observation of phenomena under known conditions. Physicists or chemists in laboratories, botanists and zoologists in the field, and behavioural scientists in psychology, sociology, linguistics and cultural anthropology all use observation to gather research data, but in each of those subjects the observation takes place under strictly controlled conditions. In the past it was not easy to convince teachers that observing individual children at work was a legitimate part of literacy teaching and assessment. Today, despite some lingering mistrust, direct observation in research about young learners is not only acceptable but has a complementary role to play alongside other research and assessment approaches. It is particularly useful up to eight or nine years of age (Genishi, 1982).

Educators have done a great deal of systematic testing and relatively little systematic observation of learning. One could argue that educators need to give most of their attention to the systematic observation of learners who are on the way to those final scores on tests. Systematic observations have four characteristics in common with good measurement instruments. They provide:

a standard task

a standard way of administering the task

ways of knowing when we can rely on our observations and make valid comparisons

a task that is like a real world task as a guarantee that the observations will relate to what the child is likely to do in the real world (for this establishes the validity of the observation).

Together, a standard task with standard administration and with standard scoring procedures provide sound measurement conditions. Otherwise we would be evaluating with a piece of stretchy measuring tape instead of using an instrument that behaves in the same way on every occasion. Two measurements with a stretchy tape cannot be compared; and comparability is often important not only at the national, state and district level but also at the individual level. Watching the progress of children we often want reliable ways to compare a student on two of his own performances. A standard task, which is administered and scored in a standard way, gives one kind of guarantee of reliability when we make such comparisons.

Not all of our observations have to be on standard tasks but those used to demonstrate change over time should be. The problem with observations is that they can have some sources of error not found in standardised tests. One of these sources of ‘error’ is that what the observer ‘knows’ about reading and writing will determine what that observer is likely to observe in children’s literacy development. You bring to the observation what you already believe. Observers must be aware of this and try to correct for this. All teachers using An Observation Survey of Early Literacy Achievement should be trained in how to administer, score and interpret their results in reliable ways.

When important decisions are to be made we should increase the range of observations we make in order to decrease the risk that we will make errors in our interpretations. We need to design procedures that limit the possibilities of being in error or being misled by what we observe. That is why, in the rest of this book, space is given to the precautions that observers must take if they wish to gather valid records. It is also why a wide range of measures or observations should be made. No one technique is sufficiently reliable on its own.

An unreliable test score means that if you took other measures, at around the same time or at another time, you might get very different results. We have to be concerned with whether our assessments are reliable because we do not want to alter our teaching, or decide on a child’s placement, on the basis of a flawed judgement. We need to be able to rely on the data from which we make our judgements. …

It is important that we use tasks that are ‘authentic’. The word authentic has arisen among educators because many tests of reading and writing and spelling are being challenged as not valid measures of real world literacy activities. One of the criticisms of the multiple choice type of test items is that they are a special type of task not found in real life; they are a test device with no real world reference. It will be better if we can find sound assessment procedures which reflect what the learner is mastering or struggling to master.


Clay, Marie M. 2000. An Observation Survey: Of Early Literacy Achievement. Portsmouth, NH: Heinemann. pp. 4, 12-3.