Permission is granted to distribute this article for nonprofit, educational purposes if it is copied in its entirety and the journal is credited. Please notify the editor if an article is to be used in a newsletter.

Ronald Hambleton and Jane H. Rodgers

When important decisions are made based on test scores, it is critical to
avoid bias, which may unfairly influence examinees' scores. Bias is the presence
of some characteristic of an item that results in differential performance for
individuals of the same ability but from different ethnic, sex, cultural, or
religious groups.

This article introduces three issues to consider when evaluating items for
bias: fairness, bias, and stereotyping. The issues are presented and sample
review questions are posed. A comprehensive item bias review form based on these principles is listed in the references and is available from ERIC/AE. This article
and the review form are intended to help both item writers and reviewers.

In any bias investigation, the first step is to identify the subgroups of
interest. Bias reviews and studies generally focus on differential performance
for sex, ethnic, cultural, and religious groups. In the discussion below, the
term designated subgroups of interest (DSI) is used to avoid repeating a list of
possible subgroups.

Fairness vs. Bias

In preparing an item bias review form, each question can be evaluated from
two perspectives: Is the item fair? Is the item biased? While the difference may
seem trivial, some researchers contend that judges cannot detect bias in an
item, but can assess an item's fairness. Perhaps the best approach is to include
both types of questions on the review form. (Box 1 offers a list of questions
addressing fairness.)

Different Kinds of Bias

Bias comes in many forms. It can be sex, cultural, ethnic, religious, or
class bias. An item may be biased if it contains content or language that is
differentially familiar to subgroups of examinees, or if the item structure or
format is differentially difficult for subgroups of examinees. An example of
content bias against girls would be one in which students are asked to compare
the weights of several objects, including a football. Since girls are less
likely to have handled a football, they might find the item more difficult than
boys, even though they have mastered the concept measured by the item (Scheuneman,
1982a).

An item may be language biased if it uses terms that are not commonly used
statewide or if it uses terms that have different connotations in different
parts of the state. An example of language bias against blacks is found in an
item in which students were asked to identify an object that began with the same
sound as "hand." While the correct answer was "heart," black
students more often chose "car" because, in black slang, a car is
referred to as a "hog." The black students had mastered the concept
but were selecting the wrong answer because of language differences (Scheuneman,
1982b).

Questions that might be asked to detect content, language, and item structure and
format bias are listed in Box 2.

[Box 2: Content Bias; Language Bias; Item Structure and Format Bias]

Stereotyping and Inadequate Representation of Minorities

Stereotyping and inadequate or unfavorable representation of DSI are
undesirable properties of tests to which judges should be sensitized. Tests
should be free of material that may be offensive, demeaning, or emotionally
charged. While the presence of such material may not make the item more
difficult for the candidate, it may cause him or her to become "turned
off," and result in lowered performance. An example of emotionally charged
material would be an item dealing with the high suicide rate among Native
Americans. An example of offensive material would be an item that implied the
inferiority of a certain group, which would be offensive to that group. Terms
that are generally unacceptable in test items include lower class, housewife,
Chinaman, colored people, and red man. Additional terms to avoid include job designations that end in
"man." For example, use police officer instead of policeman;
firefighter instead of fireman.

Other recommendations to eliminate stereotyping:

Recommended Reading

This article is based on Hambleton, R.K., & Rogers, H.J. (1996). Developing an Item Bias Review Form, which is available through ERIC/AE.

Berk, R.A. (Ed.). (1982). Handbook of methods for detecting test bias. Baltimore, MD: The Johns Hopkins University Press.

Chipman, S.F. (1988, April). Word problems: Where test bias creeps in. Paper presented at the meeting of AERA, New Orleans.

Hambleton, R.K., & Jones, R.W. (in press). Comparisons of empirical and judgemental methods for detecting differential item functioning. Educational Research Quarterly.

Lawrence, I.M., Curley, W.E., & McHale, F.J. (1988, April). Differential item functioning of SAT-verbal reading subscore items for male and female examinees. Paper presented at the meeting of AERA, New Orleans.

Mellenbergh, G.J. (1984, December). Finding the biasing trait(s). Paper presented at the Advanced Study Institute Human Assessment: Advances in Measuring Cognition and Motivation, Athens, Greece.

Mellenbergh, G.J. (1985, April). Item bias: Dutch research on its definition, detection, and explanation. Paper presented at the meeting of AERA, Chicago.

Scheuneman, J.D. (1982a). A new look at bias in aptitude tests. In P. Merrifield (Ed.), New directions for testing and measurement: Measuring human abilities, No. 12. San Francisco: Jossey-Bass.

Scheuneman, J.D. (1982b). A posteriori analyses of biased items. In R.A. Berk (Ed.), Handbook of methods for detecting test bias. Baltimore, MD: The Johns Hopkins University Press.

Scheuneman, J.D. (1984). A theoretical framework for the exploration of causes and effects of bias in testing. Educational Psychologist, 19(4), 219-225.

Schmitt, A.P., Curley, W.E., Blaustein, C.A., & Dorans, N.J. (1988, April). Experimental evaluation of language and interest factors related to differential item functioning for Hispanic examinees on the SAT-verbal. Paper presented at the meeting of AERA, New Orleans.

Tittle, C.K. (1982). Use of judgmental methods in item bias studies. In R.A. Berk (Ed.), Handbook of methods for detecting test bias. Baltimore, MD: The Johns Hopkins University Press.
Descriptors: Cultural Differences; *Culture Fair Tests; Ethnicity; *Evaluation Methods; *Item Bias; Religious Cultural Groups; Sex Differences; *Stereotypes; Test Construction; Test Format; *Test Items