ERIC®/AE Digest Series EDO-TM-98-05 August 1998
The Catholic University of America Department of Education
Various school districts use standardized tests as a way to measure scholastic achievement. These districts usually need to revise their tests fairly often to avoid administering the same test year after year. Unfortunately, creating new tests can be a very time-consuming endeavor. Not only must test writers compose the test items; they must also determine each item's difficulty to ensure that the test will be neither too hard nor too easy.
Item banks allow test makers to avoid much of this process. Item banks are files of suitable test items that are "coded by subject area, instructional level, instructional objective measured, and various pertinent item characteristics (e.g., item difficulty and discriminating power)" (Gronlund, 1998, p. 130). The purpose of this digest is to discuss the advantages and disadvantages of using item banks and to provide useful information to those who are considering implementing an item banking project in their school district.
Advantages of Item Banking
The primary advantage of item banking is in test development. Using an item response theory (IRT) method, such as the Rasch model, items from multiple tests are placed on a common scale, one scale per subject matter. The scale indicates the relative difficulty of the items. Items can be placed on the scale (i.e., into the item bank) without extensive testing. New subtests and tests with predictable characteristics can then be developed by drawing items from the bank. For example, suppose you are interested in developing a new subtest to cover fractions in seventh grade. You can go to the item bank, identify items related to your objectives, and then predict the characteristics of a subtest composed of those items. The effect of including or excluding particular items can also be predicted.
Another advantage of an item bank is that it permits you to "deposit" additional items to be withdrawn as needed. Depending on the size of the testing program, there are two practical approaches for making deposits. You can make "large deposits" by merging your item bank with one from another district, or "small deposits" by adding a few locally developed items at a time. The large deposit option involves purchasing or trading items with another district and then equating their scale to yours. The small deposit option involves piloting a smaller number of items with examinees in several grade levels. This can easily be accomplished by adding a supplemental page of experimental items to the regular test booklet administered by the school system.
Item banking provides substantial savings of time and energy over conventional test development. In traditional test development, items can only be described relative to the other items within the test and to the group to whom they were given. That is, item characteristics are extremely group- and test-specific. With item banking, items are described by their relative difficulty across grade levels. To develop a new test or subtest, one does not need to go through the laborious process of developing a large set of items for piloting and evaluation. Instead, one simply draws from the bank. Furthermore, drawing from the bank allows one to make fairly accurate predictions about composite test characteristics.
One additional advantage of item banking is that it helps establish a language for discussing curriculum goals and objectives. The items describe individual tasks that students are capable or incapable of doing. The location of the items on a calibrated scale allows one to identify the relative difficulty of particular tasks. This provides a way to discuss possible learning hierarchies and ways to better structure the curriculum.
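As a minimal sketch of the kind of prediction described above, assume the standard Rasch model, in which the probability of a correct answer depends only on the difference between examinee ability (theta) and item difficulty (b), both in logits. The expected raw score on a subtest is then the sum of the item probabilities, so the effect of swapping one item for another can be computed before anyone takes the test. The item codes and difficulty values are hypothetical, not taken from any real bank.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_raw_score(theta: float, difficulties: list[float]) -> float:
    """Test characteristic curve: expected number correct at ability theta."""
    return sum(rasch_p(theta, b) for b in difficulties)

# Hypothetical calibrated difficulties (logits) for seventh-grade fraction items.
bank = {"FR01": -1.2, "FR02": -0.4, "FR03": 0.0, "FR04": 0.7, "FR05": 1.5}

subtest = ["FR01", "FR02", "FR03", "FR04"]
diffs = [bank[k] for k in subtest]
for theta in (-1.0, 0.0, 1.0):
    score = expected_raw_score(theta, diffs)
    print(f"theta={theta:+.1f}  expected score={score:.2f} / {len(diffs)}")

# Effect of swapping FR04 for the harder FR05, for an examinee at theta = 0.
harder = [bank[k] for k in ("FR01", "FR02", "FR03", "FR05")]
print(f"with FR05 instead of FR04: {expected_raw_score(0.0, harder):.2f}")
```

Because only the item difficulties enter the calculation, the same function serves for any combination of banked items on the same scale.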
Disadvantages and Limitations of Item Banking
Item banking and item response theory are not cure-alls for measurement problems. Persistence and good judgement remain vital in any test construction and test usage effort. Every possible effort must be made to include only quality items in the item bank, so the same care and effort must still go into item writing. Items purchased from external sources must be evaluated carefully both for match to your curriculum and for technical quality.
Item banking involves equating various tests and items. It is entirely possible, mathematically, to equate tests which cover entirely different subject matter. At the practical level, this means that it is also possible to equate items which assess subtly, but significantly different skills. In order to avoid this undesirable situation, the item review process must also include a careful evaluation of the skills assessed by each item and tests must be carefully formulated.
The intent of compiling a test using latent trait theory (item response theory) is to be able to predict the composite test characteristics. While the prediction is often surprisingly accurate, it must be validated; tests developed using latent trait theory should still be field tested.
While some districts have implemented very successful item banks and Rasch-calibrated testing programs without knowing anything about IRT, good practice calls for a staff that is comfortable with, and knowledgeable about, what they are doing. A district undertaking an item banking project should fully understand the practical as well as the mathematical and theoretical aspects of item banking.
An item bank really consists of multiple collections of items, each covering a fairly unidimensional content area such as mathematics computation or vocabulary. Collections of items usually span several grade levels. To develop the bank, many tests must be calibrated, linked (or equated), and organized. This requires a great deal of preparation and planning as well as computer time and expertise. Once the item bank is established, however, test development time, effort, and cost are reduced.
Planning for an Item Bank
The most crucial step in developing an item bank is planning. This involves the preparation of individuals, the identification of what you have to start an item bank, and the identification of what you hope to accomplish with an item bank.
Everyone on the staff should have enough familiarity with Rasch measurement principles and item banking to be able to knowledgeably discuss and explain the project. You can formally train your staff by using in-house personnel, bringing in a traveling workshop, or having people attend a pre-session at a research association or conference.
You should have senior-level personnel available to answer technical questions that might arise. You should also have staff with computer expertise who are capable of the following tasks: (1) modifying computer programs, (2) establishing a database system, and (3) running packaged programs.
If you intend to make any item bank exchanges or purchases, you should have someone on your staff who knows what is available. You need personnel capable of critically evaluating test items for technical quality, curriculum match, unidimensionality, and potential bias. To calibrate test items accurately and establish scales, the items need to be administered to examinees with a wide range of ability.
In order to link various forms and grade levels within a content area, common anchor items are needed. (These anchor items must be administered along with the items within a given form. The form and anchor items are calibrated together. The anchor item parameter values based on calibration with one form are compared with the anchor item parameter values based on calibration with another form. The difference in parameter values is used to link the forms.) You need to identify the content areas in which you have administered overlapping subtests, and the number of students who responded to each set of items. You may find that you need to gather additional item response data to link forms and grade levels.
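The parenthetical above says the difference in anchor parameter values is used to link the forms. One common way to do this under the Rasch model (an assumption here; the digest does not prescribe a formula) is to take the mean difference between the two sets of anchor difficulties and add it to every item calibrated with the second form. The function names, item codes, and values below are hypothetical.

```python
from statistics import mean

def linking_constant(anchor_a: dict[str, float], anchor_b: dict[str, float]) -> float:
    """Mean difference between anchor difficulties calibrated with form A and with form B.

    Adding this constant to form B difficulties expresses them on form A's scale.
    """
    common = sorted(anchor_a.keys() & anchor_b.keys())
    if not common:
        raise ValueError("forms share no anchor items")
    return mean(anchor_a[k] - anchor_b[k] for k in common)

def link_to_form_a(form_b: dict[str, float], shift: float) -> dict[str, float]:
    """Shift every form B difficulty onto form A's scale."""
    return {item: b + shift for item, b in form_b.items()}

# Hypothetical anchor difficulties from two separate calibrations (logits).
anchors_in_a = {"AN1": -0.50, "AN2": 0.20, "AN3": 0.90}
anchors_in_b = {"AN1": -0.80, "AN2": -0.10, "AN3": 0.60}
form_b_items = {"B07": 0.10, "B08": 1.20}

shift = linking_constant(anchors_in_a, anchors_in_b)  # about 0.30 logits
print(link_to_form_a(form_b_items, shift))
```

In practice, an anchor item whose difference departs sharply from the others is usually reviewed (and possibly dropped) before the constant is computed, since it may be functioning differently on the two forms.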
Your data processing staff should examine literature and programs on item banking to determine what programs must be developed and what programs can be modified.
As much as possible, you should identify your projected testing needs for the next five years. This would involve identification of which subtests you will need to revise, what additional areas you may need to assess, and how objectives might be differently stressed.
The start-up activities would mostly involve the administrative and data processing staff. Each test would have to be calibrated and equated to the parallel form and to adjacent grade levels. The data processing staff would have to adapt existing computer programs to the local system and develop a database system. They would then calibrate each test, equate the tests, and store the equated item parameters and their descriptors in the database. With a large number of tests and items, this becomes a major undertaking.
Administrative staff would have to coordinate activities to ensure that the data requirements are met. During the planning process, a chart can be developed to identify which tests and anchor items have been, and will need to be, administered to the requisite sample. Working from these charts, testing coordinators will need to organize the administration of the tests and subtests needed to calibrate and equate all the items going into the item bank. This involves compiling test booklets, making testing arrangements, collecting response sheets, and preparing data for data processing. Depending on how often students take multiple subtests from different levels and forms, this too can be a major undertaking.
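A minimal sketch of the storage step, assuming Python's built-in sqlite3 module as a stand-in for whatever database system a district actually uses. The table layout is hypothetical: its columns follow the descriptors in Gronlund's definition (subject area, instructional level, objective, item difficulty), plus the form each item was calibrated on.

```python
import sqlite3

# Hypothetical schema: one row per equated item, with the descriptors used to search the bank.
SCHEMA = """
CREATE TABLE IF NOT EXISTS item (
    item_id     TEXT PRIMARY KEY,
    content     TEXT NOT NULL,   -- content area / scale, e.g. 'math computation'
    grade       INTEGER,         -- instructional level
    objective   TEXT,            -- instructional objective measured
    difficulty  REAL NOT NULL,   -- equated Rasch difficulty (logits)
    source_form TEXT             -- form the item was calibrated on
)
"""

def open_bank(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def deposit(conn: sqlite3.Connection, rows) -> None:
    """Insert or update equated items: (item_id, content, grade, objective, difficulty, source_form)."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO item VALUES (?, ?, ?, ?, ?, ?)", rows)

bank = open_bank()
deposit(bank, [
    ("FR01", "math computation", 7, "add fractions", -1.2, "G7-A"),
    ("FR04", "math computation", 7, "divide fractions", 0.7, "G7-B"),
    ("DC02", "math computation", 6, "add decimals", -0.3, "G6-A"),
])
rows = bank.execute(
    "SELECT item_id, difficulty FROM item WHERE content = ? AND grade = ? ORDER BY difficulty",
    ("math computation", 7),
).fetchall()
print(rows)
```

Storing only equated difficulties (already on the common scale) keeps later withdrawals simple: any query result can be combined into a subtest directly.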
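The planning chart can also be checked mechanically. Assuming, for illustration, that the chart records which anchor blocks were administered with each form, two forms can be linked directly when they share an anchor block, and a form can be placed on the common scale only if a chain of such links connects it to the base form. The sketch uses a breadth-first search to list forms that still lack such a chain; the form and anchor labels are hypothetical.

```python
from collections import deque

def unlinked_forms(chart: dict[str, set[str]], base: str) -> set[str]:
    """Return forms that cannot be linked to `base` through shared anchor blocks."""
    reached = {base}
    queue = deque([base])
    while queue:
        form = queue.popleft()
        for other, anchors in chart.items():
            # A shared anchor block lets `other` be equated to `form`.
            if other not in reached and anchors & chart[form]:
                reached.add(other)
                queue.append(other)
    return set(chart) - reached

# Hypothetical planning chart: form -> anchor blocks administered with it.
chart = {
    "G6-A": {"ANC-6"},
    "G7-A": {"ANC-6", "ANC-7"},
    "G7-B": {"ANC-7"},
    "G8-A": {"ANC-8"},  # no overlap yet: more data collection is needed
}
print(unlinked_forms(chart, base="G7-A"))
```

Any form reported here points to a gap that the testing coordinators would need to fill by administering a shared anchor block.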
Running the Item Bank
The item bank allows you to withdraw items as needed to develop new or even special-purpose tests and subtests. Running an item bank involves basically two activities: making deposits and withdrawing items to develop a test.
As mentioned earlier, there are two viable options for making deposits to the item bank. The "large deposit" option involves merging an existing item bank with your own. If the existing item bank has been IRT calibrated, you only need to administer a subset of items (per content area) from the new bank along with items already in your item bank. Remember, each item bank has its own scale; the items administered in common serve as anchor items and allow you to equate the two scales. This part will involve testing with a relatively small group of students. The anchor items from the new item bank can be appended to the present group of items. Coordination would be similar to that involved in starting your own item bank.
The major task involved in using items from another item bank is a thorough, careful review of the items. All potential entries must be evaluated for technical quality, curriculum match, and potential bias. This would involve your test development experts, curriculum/instructional staff, and coordination between the two.
After an item review, items from non-calibrated banks could be treated like items developed by your staff. "Small deposits" would be made by calibrating and equating a few items at a time. One very efficient approach to collecting the requisite data is to append subtests of new items to the original groups of items. The items within the original group would serve as anchor items for the new subtest(s). In this manner, you can be constantly adding to your item bank.
Once developed and growing, your item bank is ready to provide the advantages discussed above. To develop a new subtest, you would develop a blueprint (table of specifications) outlining what you want the new subtest to be like. Curriculum specialists and test development experts would then go to the item bank and identify which items appear appropriate in terms of content and relative difficulty. If they find an insufficient number of items, they can make arrangements to add new items to the bank.
If the bank contains a sufficient number of appropriate items, the items can be grouped to form a new subtest. Without pilot testing, the characteristics of this new subtest can be predicted: with reasonable accuracy, you will know how much skill an examinee needs to obtain any given total raw score on the new subtest. The prediction should still be validated by administering the subtest to students who have received the appropriate instruction and to students who have not. This can also be accomplished by appending items to the existing forms. The validation would need a sample as large as the one used in field testing the original group.
An item bank provides a scale of the relative difficulty of tasks that covers multiple grade levels and skills within content areas. As a service to the instructional and curriculum staff, you can provide information on the relative difficulty of different tasks within and across grade levels. For example, you can identify which fraction problems seventh graders find as difficult as certain decimal problems, or which reading skills taught in fourth grade can be mastered by students in that grade. The scale could also be used to help organize special programs for gifted and remedial students.
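A simplified two-stage sketch of a "small deposit", assuming the Rasch model and anchor difficulties already stored on the bank scale: first estimate each examinee's ability by maximum likelihood from the anchor items alone, then estimate each new item's difficulty by maximum likelihood given those abilities. Because the anchor difficulties are held fixed, the new item lands directly on the bank scale. Production Rasch programs typically estimate these quantities jointly and treat extreme scores more carefully; the function names and response data here are hypothetical.

```python
import math

def p(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def newton(f, fprime, x0=0.0, tol=1e-8, max_iter=100):
    """Solve f(x) = 0 by Newton's method, clamping each step to +/-1 logit."""
    x = x0
    for _ in range(max_iter):
        step = max(-1.0, min(1.0, f(x) / fprime(x)))
        x -= step
        if abs(step) < tol:
            break
    return x

def estimate_ability(responses, anchor_b):
    """ML ability from anchor items with fixed difficulties; None for zero/perfect scores."""
    r = sum(responses)
    if r == 0 or r == len(responses):
        return None  # no finite maximum-likelihood estimate
    f = lambda t: sum(p(t, b) for b in anchor_b) - r          # expected minus observed
    fp = lambda t: sum(p(t, b) * (1 - p(t, b)) for b in anchor_b)
    return newton(f, fp)

def estimate_new_item(item_responses, thetas):
    """ML difficulty of a new item, given abilities already on the bank scale."""
    pairs = [(x, t) for x, t in zip(item_responses, thetas) if t is not None]
    s = sum(x for x, _ in pairs)
    if s == 0 or s == len(pairs):
        return None
    f = lambda b: s - sum(p(t, b) for _, t in pairs)          # increasing in b
    fp = lambda b: sum(p(t, b) * (1 - p(t, b)) for _, t in pairs)
    return newton(f, fp)

anchor_b = [-1.0, 0.0, 1.0]  # hypothetical banked anchor difficulties
anchor_resp = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
new_item = [0, 1, 0, 1, 0, 1]  # responses to one new item

thetas = [estimate_ability(r, anchor_b) for r in anchor_resp]
print(thetas)
print(estimate_new_item(new_item, thetas))
```

Examinees with zero or perfect anchor scores have no finite ability estimate and are skipped, which is one reason the anchor set should span a range of difficulties.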
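A minimal sketch of the blueprint step, assuming the blueprint can be reduced to a number of items per objective and an acceptable difficulty range in logits. The selection rule (prefer items near the middle of the range) and all names and values are illustrative assumptions; in practice the content review by curriculum specialists comes first.

```python
def select_items(bank, blueprint):
    """Pick items per objective; return chosen ids and the shortfall per objective.

    bank: list of dicts with 'id', 'objective', 'difficulty' (logits).
    blueprint: objective -> (number wanted, (min difficulty, max difficulty)).
    """
    chosen, shortfall = [], {}
    for objective, (n, (lo, hi)) in blueprint.items():
        target = (lo + hi) / 2
        pool = [it for it in bank
                if it["objective"] == objective and lo <= it["difficulty"] <= hi]
        pool.sort(key=lambda it: abs(it["difficulty"] - target))  # prefer mid-range items
        chosen += [it["id"] for it in pool[:n]]
        if len(pool) < n:
            shortfall[objective] = n - len(pool)  # items to write and deposit
    return chosen, shortfall

bank = [
    {"id": "FR01", "objective": "add fractions", "difficulty": -1.2},
    {"id": "FR02", "objective": "add fractions", "difficulty": -0.4},
    {"id": "FR03", "objective": "add fractions", "difficulty": 0.0},
    {"id": "FR04", "objective": "divide fractions", "difficulty": 0.7},
]
blueprint = {"add fractions": (2, (-1.0, 0.5)), "divide fractions": (3, (0.0, 2.0))}
chosen, shortfall = select_items(bank, blueprint)
print(chosen, shortfall)
```

The shortfall report corresponds to the case where the specialists find too few suitable items and must arrange for new ones to be added to the bank.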
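Under the Rasch model, the maximum-likelihood ability estimate for a raw score r is the ability at which the expected score on the subtest equals r, so a raw-score-to-ability table can be computed from the item difficulties alone. The sketch finds that ability by bisection; the difficulties are hypothetical, and zero and perfect scores are excluded because they have no finite estimate.

```python
import math

def expected_score(theta: float, difficulties: list[float]) -> float:
    """Expected raw score at ability theta under the Rasch model."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def ability_for_score(r: int, difficulties: list[float],
                      lo: float = -10.0, hi: float = 10.0, tol: float = 1e-6) -> float:
    """Ability at which the expected raw score equals r (bisection; valid for 0 < r < n)."""
    if not 0 < r < len(difficulties):
        raise ValueError("zero and perfect scores have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_score(mid, difficulties) < r:
            lo = mid  # expected score too low: the ability must be higher
        else:
            hi = mid
    return (lo + hi) / 2

subtest = [-1.2, -0.4, 0.0, 0.7]  # hypothetical difficulties of the new subtest
for r in range(1, len(subtest)):
    print(r, round(ability_for_score(r, subtest), 2))
```

Bisection is used because the expected score always increases with ability, which makes the search simple and robust for any set of difficulties within the search interval.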
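Because all items in a content area share one scale, comparisons such as the fractions-versus-decimals example reduce to comparing calibrated difficulties. The sketch assumes a hypothetical bank keyed by item code, holding a strand label and a difficulty in logits, and an arbitrary tolerance of 0.25 logits for "as difficult as".

```python
def comparable_items(bank, strand_a, strand_b, tolerance=0.25):
    """Pairs (a, b) from two strands whose difficulties differ by at most `tolerance` logits."""
    a_items = [(i, d) for i, (s, d) in bank.items() if s == strand_a]
    b_items = [(i, d) for i, (s, d) in bank.items() if s == strand_b]
    return [(ia, ib) for ia, da in a_items for ib, db in b_items
            if abs(da - db) <= tolerance]

# Hypothetical bank: item id -> (strand, difficulty in logits on a common scale).
bank = {
    "FR01": ("fractions", -1.2), "FR04": ("fractions", 0.7),
    "DC02": ("decimals", -0.3), "DC05": ("decimals", 0.8),
}
print(comparable_items(bank, "fractions", "decimals"))
```

The comparison is only meaningful when both strands were calibrated onto the same scale; items from separately calibrated banks would first need to be linked.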
References
Gronlund, N.E. (1998). Assessment of Student Achievement (6th ed.). Needham Heights, MA: Allyn and Bacon.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: L. Erlbaum Associates.
Mengel, B.E.; Schorr, L.L. (1992). Developing Item Bank Based Achievement Tests and Curriculum-Based Measures: Lessons Learned Enroute. (ERIC Document Reproduction Service No. ED344915).
Ward, A.W.; Murray-Ward, M. (1994). Guidelines for the development of item banks: An NCME instructional module. Educational Measurement: Issues and Practice, 13(1), 34-39.
Wright, B.D.; Stone, M.H. (1979). Best Test Design: Rasch Measurement. Chicago, IL: MESA Press.
The Catholic University of America, Washington, DC 20064 * 800 464-3742