From the ERIC database
Sound Performance Assessments in the Guidance Context. ERIC Digest.
Not since the development of the objective paper-and-pencil test early in the century has an assessment method hit the American educational scene with such force as has performance assessment methodology in the 1990s. Performance assessment relies on teacher observation and professional judgment to draw inferences about student achievement. The reasons for the intense interest in this assessment methodology can be summarized as follows:
During the 1980s important new curriculum research and development efforts at school district, state, national and university levels began to provide new insights into the complexity of some of our most valued achievement targets. We came to understand the multidimensionality of what it means to be a proficient reader, writer, and math or science problem solver, for example. With these and other enhanced visions of the complex nature of the meaning of academic success came a sense of the insufficiency of the traditional multiple choice test. Educators began to embrace the reality that some targets, like complex reasoning, skill demonstration and product development, "require"--don't merely permit--the use of subjective, judgmental means of assessment. One simply cannot assess the ability to write well, communicate effectively in a second language, work cooperatively on a team, and complete science laboratory work in a quality manner using the traditional selected response modes of assessment.
As a result, we have witnessed a virtual stampede of teachers, administrators and educational policy makers to embrace performance assessment. In short, educators have become as obsessed with performance assessment in the 1990s as they were with multiple-choice tests for the preceding 60 years. Warnings from the assessment community (Dunbar, Koretz, and Hoover, 1991) about the potential dangers of invalidity and unreliability in carelessly developed subjective assessments not only have often gone unheeded, but by and large they have gone unheard.
Now that we are a decade into the performance assessment movement, however, some of those quality control lessons have begun to take hold. Assessment specialists have begun to articulate in terms that practitioners can understand the rules of evidence for the development and use of high quality performance assessments (e.g. Messick, 1994). As a result, we are well into a national program of research and development that builds upon an ever clearer vision of the critical elements of sound assessments to produce ever better assessments (Wiggins, 1993).
The purpose of this digest is to summarize those attributes of sound assessments and the rules of evidence for using them well. The various ways the reader might take advantage of this information are also outlined.
THE BASIC METHODOLOGY
First, in defining the performance to be evaluated, assessment developers must decide where or how evidence of academic proficiency will manifest itself. Is the examinee to demonstrate the ability to reason effectively, carry out other skills proficiently, or create a tangible product? Next, the developer must analyze skills or products to identify performance criteria upon which to judge achievement. This requires the identification of the critical elements of performance that come together to make it sound or effective. In addition, performance assessors must define each criterion and articulate the range of achievement that any particular examinee's work might reflect, from outstanding to very poor performance. And finally, users can contribute immensely to student academic development by finding examples of student achievement that illustrate those different levels of proficiency.
Once performance is defined, strategies must be devised for sampling student work so skills or products can be observed and evaluated. Examinees might be presented with structured exercises to which they must respond. Or the examiner might unobtrusively or opportunistically watch performers during naturally occurring classroom work in order to derive evidence of proficiency. When structured exercises are used to elicit performance, they must spell out a clear and complete set of performance responsibilities for examinees. In addition, the examiner must include in the assessment enough exercises to sample the array of performance possibilities representatively, with a sample large enough to support confident generalizations about examinee proficiency.
And finally, once the desired performance is described and exercises have been devised, procedures must be spelled out for making and recording judgments. These scoring schemes, sometimes called rubrics, help the evaluator translate judgments of proficiency into ratings. The assessment developer must select the level of detail to be reflected in records, the method of recording results, and who will be the observer and rater of performance.
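The scoring-scheme decisions just described can be made concrete with a brief sketch. The following Python fragment models a hypothetical analytical writing rubric and a routine that translates a rater's judgments into a record; the three criteria, the 1-4 scale, and the level descriptors are illustrative assumptions, not drawn from this digest.

```python
# Hypothetical analytical writing rubric: each criterion is judged
# separately on a defined 1-4 scale, with a descriptor anchoring each level.
RUBRIC = {
    "ideas":        {1: "unclear",  2: "emerging",    3: "developed",    4: "rich"},
    "organization": {1: "random",   2: "loose",       3: "logical",      4: "seamless"},
    "conventions":  {1: "impeding", 2: "distracting", 3: "minor errors", 4: "clean"},
}

def record_scores(ratings, rubric=RUBRIC):
    """Translate a rater's judgments (criterion -> level) into an analytical
    record, plus a simple holistic summary (here, the mean across criteria)."""
    record = {}
    for criterion, level in ratings.items():
        descriptors = rubric[criterion]   # a KeyError here flags an unknown criterion
        record[criterion] = (level, descriptors[level])
    holistic = sum(level for level, _ in record.values()) / len(record)
    return record, holistic

# One examinee's work, judged on all three criteria.
record, holistic = record_scores({"ideas": 3, "organization": 4, "conventions": 3})
```

An analytical report would present `record` criterion by criterion for diagnostic purposes, while a holistic report would present only the single summary score.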
SOUND PERFORMANCE CRITERIA
The attributes of quality performance that form the basis of judgment criteria should be grounded in the best current thinking about the keys to academic success, as defined in the professional literature of the discipline in question.
SOUND PERFORMANCE EXERCISES
EFFECTIVE SCORING AND RECORDING
One test of the quality of ratings is interrater agreement. A high degree of agreement is indicative of the objectivity of ratings. Another test of quality is consistency in a particular rater's judgments over time. Ratings should not drift but rather should remain anchored to carefully defined points on the scoring scale. A third index of performance rating quality is consistency in ratings across exercises intended to reflect the same performance--an index of internal consistency. When these standards are met, it becomes possible to take advantage of the immense power of this kind of assessment to muster concrete evidence of improvement in student performance over time.
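The first of these indices, interrater agreement, is straightforward to compute. As an illustration (the scores below and the choice of Cohen's kappa as a chance-corrected agreement index are assumptions for this sketch, not part of the digest), two raters' scores on the same set of performances might be checked like this:

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of performances on which two raters assign the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected chance agreement from each rater's marginal score distribution.
    p_e = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two raters scoring ten performances on a 1-4 rubric scale.
a = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
b = [4, 3, 2, 2, 4, 1, 2, 3, 3, 2]
print(percent_agreement(a, b))   # 0.8
print(cohens_kappa(a, b))        # about 0.72
```

Raw percent agreement can be inflated when a few score levels dominate, which is why a chance-corrected index such as kappa is often reported alongside it.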
There are three design decisions to be made by the performance assessment developer with respect to scoring schemes: the level of specificity of scoring, the selection of the record keeping method, and the identification of the rater. Scores can be holistic or analytical, considering criteria together as a whole or separately. The choice is a function of the assessment purpose. Purposes that demand fine-grained detail, such as diagnosing weaknesses in student performance, require analytical scoring.
Recording system alternatives include checklists of attributes present or absent in performance, rating scales reflecting a range in performance quality, anecdotal records that describe performance, or mental record keeping. Each offers advantages and disadvantages depending on the specific assessment context.
Raters of performance can include the teacher, another expert, students as evaluators of each other's performance or students as evaluators of their own performance. Again, the rater of choice is a function of context. However, it has become clear that performance assessment represents a powerful teaching tool when students play roles in devising criteria, learning to apply those criteria, devising exercises, and using assessment results to plan for the improvement of their own performance--all under the leadership of their teacher.
PERFORMANCE ASSESSMENT IN THE GUIDANCE CONTEXT
Counselors might also be invited to serve as raters of student performance in specific academic disciplines. If and when such opportunities arise, thorough training is essential for all who are to serve in this capacity. If the teachers issuing this invitation have developed, or gleaned from their professional literature, refined visions of the meaning of academic success, have transformed them into quality criteria, and provide quality training for all who are to observe and evaluate student performance, this can be a very rewarding professional experience. If these standards are not met, it is wise to urge (and perhaps help with) a redevelopment of the assessment. A final point of contact for counselors is as evaluators of students within the context of the guidance function, observing and judging academic or affective student characteristics. In this case, the counselor is both the developer and user of the assessment and must know how to adhere to the standards of assessment quality outlined above.
For all of these reasons, it is advisable for school guidance and counseling personnel to understand when this methodology is likely to be useful, when it is not, and how to design and develop sound performance assessments.
Dunbar, S.B., Koretz, D.M., & Hoover, H.D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4(4), 289-304.
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.
Quellmalz, E.S. (1991). Developing criteria for performance assessments: The missing link. Applied Measurement in Education, 4(4), 319-332.
Stiggins, R.J. (1994). Student-centered classroom assessment. Columbus, OH: Macmillan.
Wiggins, G.P. (1993). Assessing student performance. San Francisco, CA: Jossey-Bass.
Rick Stiggins is Director of the Assessment Training Institute in Portland, Oregon.
ERIC Digests are in the public domain and may be freely reproduced and disseminated. This publication was funded by the U.S. Department of Education, Office of Educational Research and Improvement, Contract No. RR93002004. Opinions expressed in this report do not necessarily reflect the positions of the U.S. Department of Education, OERI, or ERIC/CASS.