TESTING MEMO 6: WHAT KIND OF GRADES SHOULD BE AVERAGED?

by Lawrence H. Cross
Virginia Polytechnic Institute and State University

Some instructors record letter grades for tests and assignments, and others record numerical values, often the percent correct on tests. Later, under either method, the grades are averaged, often employing a weighting process designed to make some grades count more heavily than others. Discussion of the merits of different approaches usually centers on the question of whether it is better to average letter or numerical grades, or on some feature of the weighting process. However, an important characteristic of the grades as initially recorded is seldom questioned, namely the variability of the scores on each test or assignment. Indeed, it is ironic that instructors may go to considerable trouble to weight grades according to their perceived importance when, in fact, the result may be quite different from what was intended, due to failure to account for differences in score variability from one test to another.

To see how this outcome could occur, consider a course in which the midterm examination was much more difficult than the instructor intended; scores ranged from 35% to 95% with an average of 65%. Contrary to the advice of TESTING MEMO 2, the instructor did not view this outcome as desirable and, with the intention of being fair to the students, included a large proportion of easy questions on the final examination. Their presence caused a great reduction in score variation. Final examination scores ranged only from 88% to 100% with an average of 94%. Only a small number of harder questions kept everyone from earning very high scores in a narrow range. The result was that differences from one student to another in final course averages were largely attributable to scores on the midterm. Thus, a student's achievement in the latter part of the course was effectively devalued, which was hardly fair or in keeping with the presumed intention that grades reflect achievement across the entire course.

The best approach to avoiding situations like the one just presented is to record and average standardized test scores. Calculating standardized scores requires knowing the standard deviation of the scores prior to standardization. This statistic is a measure of how "spread out" the scores are and is explained in any elementary statistics text. Though nearly any program that reports test results will include this statistic, a fair approximation for most classroom tests may be obtained by subtracting the lowest score from the highest and dividing by 4. In the example above, the standard deviations are about 15 and 3 percentage points for the midterm and final examinations, respectively.

A standard score is the number of standard deviations the number-right or percentage score lies above or below the average. Commonly called a z-score, its formula is

    z = (x - xbar) / s

in which x is the observed score, xbar is the average score, and s is the standard deviation. Scores of 80% on the midterm and 97% on the final would thus each yield a z-score of 1.0, because both are one standard deviation above average. Similarly, scores of 50% and 91% would correspond to z-scores of -1.0. Z-scores can be awkward to work with, because half of them will be negative and nearly all will lie between -3 and 3. It is therefore convenient to transform the z-scores into T-scores as follows:

    T = 50 + 10z
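To make these calculations concrete, here is a minimal Python sketch of the transformations just described. It is not part of the original memo, and the function names are illustrative only; it uses the sample standard deviation from the standard library, with the memo's rough (highest - lowest) / 4 estimate included as an alternative for hand calculation.

    from statistics import mean, stdev

    def approx_sd(scores):
        """The memo's quick approximation: (highest - lowest) / 4."""
        return (max(scores) - min(scores)) / 4

    def t_scores(scores):
        """Convert raw percentage scores to T-scores (mean 50, SD 10).

        First z = (x - xbar) / s, then T = 50 + 10z.
        """
        xbar = mean(scores)
        s = stdev(scores)  # or approx_sd(scores) if no software is at hand
        return [50 + 10 * (x - xbar) / s for x in scores]

For the midterm example above (scores from 35% to 95%, average 65%), approx_sd gives (95 - 35) / 4 = 15, matching the memo's figure, and a raw score of 80% maps to z = 1.0 and T = 60.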
T-scores will have a mean (average) of 50 and a standard deviation of 10. Thus, a T-score of 60 represents a number-right score one standard deviation above the average. If the distribution of scores approximates the shape of the normal curve, about 16% of the T-scores will be above 60 and about 10% above 63; similarly, about 16% of T-scores will be below 40 and about 10% below 37. If T-scores are computed for every test, averaging them will provide a composite score from which the influence of differences in score variability has been eliminated. (Strictly speaking, if more than two scores are to be averaged, the intercorrelations among the scores should be taken into consideration in order to control for the degree of "overlap." However, simple averaging of T-scores should produce a good approximation of the more precise result.)

T-scores are typically provided for multiple-choice tests processed by measurement services offices at universities. Moreover, T-scores can be calculated for any numerically evaluated non-test assignments you may wish to include in the course composite. Like other scores, T-scores may be weighted differentially. For example, if you wish the final exam to count twice as much as the midterm, multiply the T-scores from the final by 2, add the midterm T-scores, and divide by 3 (a short sketch following the lists below illustrates this computation).

It should be noted at this point that T-scores report only a student's relative position in the class, not an absolute measure of achievement. However, we contend that the difficulty level of nearly all academic tests is arbitrary and that, regardless of the scoring method, they provide nothing more than ranking information. (See TESTING MEMO 2 for a more complete discussion of this point.) The concern of this MEMO is that the scores be averaged in a manner consistent with the instructor's intention.

Finally, once the T-scores have been averaged, there remains the problem of assigning letter grades for the course. Until this point, we have been able to speak with conviction, deducing conclusions logically from statistical principles. However, when it is necessary to determine the dividing line between As and Bs or Ds and Fs, no such clear-cut approach is available. Of course, if student X's average is higher than student Y's, student Y's letter grade must not be higher than student X's, but beyond this recommendation our best advice is to inspect the distribution of average T-scores with the following questions in mind:

1. What is a typical letter grade distribution for a course of this type with this kind of student?

2. Are there any circumstances that might warrant altering this "typical" distribution, e.g., did the course progress especially well or poorly?

3. Where in the distribution are key students whose work you know especially well, students you believe might deserve especially good or poor grades for reasons other than test performance?

4. Where are naturally occurring "breaks" in the distribution of average T-scores? (There is no "scientific" reason for letting these points determine letter grades, but if their use is not inconsistent with other considerations, it will help to prevent hard feelings on the part of students who otherwise might miss a better grade by one T-score point.)

Two ideas to be avoided, or at least questioned, in determining letter grades are:

1. That the T-score spread should be the same for each letter grade.

2. That an equal number of As and Fs, and of Bs and Ds, should necessarily be awarded.
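As a companion to the earlier sketch, the following illustrative function (again an assumption of this presentation, not part of the memo) forms the weighted composite: each student's T-scores are multiplied by the chosen weights, summed, and divided by the sum of the weights.

    def composite(t_score_lists, weights):
        """Weighted average of T-scores.

        t_score_lists: one list of T-scores per test, all lists in the
        same student order. weights: one weight per test.
        """
        total = sum(weights)
        return [sum(w * t for w, t in zip(weights, ts)) / total
                for ts in zip(*t_score_lists)]

    # The memo's example: the final counts twice as much as the midterm,
    # i.e., (midterm + 2 * final) / 3 for each student.
    # course_averages = composite([t_midterm, t_final], [1, 2])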
Finally, it must be remembered that the assignment of letter grades across a range of average scores is essentially arbitrary and a matter of professional judgment.

For more information, contact:

Robert B. Frary, Director
Office of Measurement and Research Services
2096 Derring Hall
Virginia Polytechnic Institute and State University
Blacksburg, VA 24060
703/231-5413 (voice)
frary@vtvm1.cc.vt.edu

###