Applied Psychometrics 101 Brief #14: Demographically adjusted neuropsychological
(Heaton) norm-based scores are inappropriate for the diagnosis of MR/ID
Kevin S. McGrew, PhD.
Director
Dale G. Watson, PhD.
Berkeley, CA
Neuropsychological
assessments are sometimes part of psychological evaluations in Atkins MR/ID death penalty cases. These assessments include specialized tests,
often in addition to an age-appropriate IQ battery, that are
specifically designed to assess brain-behavior relations. The neuropsychological-specific
tests (NPST) are used to draw inferences about brain function/dysfunction
and to provide functional implications of neuropsychological test data for a
person’s real-world functioning. NPST
batteries, as well as all the individual tests included in NPST batteries, are not designed or validated to provide
a reliable and valid estimate of a person’s general intelligence (of
course, an exception is the portion of the battery that may include an
individualized measure of general intelligence; e.g., WAIS-IV; WJ III; SB5).
Demographically Adjusted Test Norm
Interpretation is Inappropriate in the Diagnosis of Atkins MR/ID in Capital
Cases
A
test interpretation feature used in some neuropsychological assessments is demographically adjusted norms. The specialized NPST of memory,
sensory-motor function, concept formation, etc. may be reported with these
special demographically adjusted norms.
Also, demographically adjusted norms are sometimes applied to the
individualized measure of general
intelligence included as part of the NPST (see Lange et al., 2006).
The most well known demographically adjusted norms are the Heaton norms.
A neuropsychologist in a recent Atkins case described Heaton norms as
“number crunching, age-corrected, you know, socioeconomic variable-corrected data,”
and as generating “a comprehensive T-score age-, education, sex-corrected,
actually race-corrected, also.” In simple terms, demographically adjusted
norms make equation-based statistical adjustments that allow certain NPST
scores for an individual to be compared against other individuals of the
same age and other demographic characteristics (e.g., gender, race,
socio-economic status and level of education). In the context of
neuropsychological assessment to determine whether an individual’s
functioning has decreased, such as after a brain injury or a stroke,
demographically adjusted norms may help with the diagnosis of brain
dysfunction and the design of interventions based on relative strengths
and weaknesses.
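As a rough illustration of what such an adjustment does, the sketch below converts one raw score to a T-score (mean 50, SD 10) against two different reference groups. The group means and standard deviations are hypothetical values chosen only for illustration; they are not taken from the Heaton norms or any published table, and actual equation-based demographic corrections are more elaborate than this.

```python
def t_score(raw, ref_mean, ref_sd):
    """Express a raw score as a T-score (mean 50, SD 10) relative to a reference group."""
    return 50 + 10 * (raw - ref_mean) / ref_sd


# Hypothetical reference groups (illustrative numbers only, not published norms).
GENERAL_POPULATION = {"mean": 50.0, "sd": 10.0}
DEMOGRAPHIC_SUBGROUP = {"mean": 40.0, "sd": 10.0}

raw = 35.0  # the same raw test performance in both comparisons

# Comparison against the general population.
general_t = t_score(raw, GENERAL_POPULATION["mean"], GENERAL_POPULATION["sd"])

# Comparison against a demographically matched subgroup with a lower mean.
adjusted_t = t_score(raw, DEMOGRAPHIC_SUBGROUP["mean"], DEMOGRAPHIC_SUBGROUP["sd"])

print(general_t, adjusted_t)
```

Under these assumed numbers the identical raw performance looks markedly better against the subgroup than against the general population, which is the effect at issue in the remainder of this brief.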
Silverberg and Millis (2009)
have outlined the clear distinction between using neuropsychological measures
to identify acquired deficits as opposed
to developmental deficiencies. They note:
If the clinician is interested in whether a patient has declined
from their premorbid status, contrasting their obtained raw scores with their
expected premorbid scores (based on age, education, gender, ethnicity, and any
other variables that add to their prediction) is most appropriate. This type of
comparison quantifies impairment—how much examinees’ scores are lowered
relative to their (estimated) preinjury/disease onset baseline. The degree of
impairment is likely most predictive of the patient’s success in returning to
(or continuing) work or other premorbidly engaged-in functional activities with
extraordinary or idiosyncratic cognitive demands. If, in contrast, the
clinician is interested in determining whether the patient’s cognitive
abilities are sufficient for the demands of universal functional tasks (e.g.,
activities of daily living, driving a car, operating a cashier, etc.),
comparing their raw [non-demographically adjusted] scores with general healthy
adult population norms, generating “absolute” scores, is most appropriate (p.
98).
When
used in the context of neuropsychological assessment, certain NPST scores are adjusted
so an individual’s performance is compared
not to the general population but only to others of the same age,
gender, race and level of education.
Norm-referenced testing is at
the heart of psychological assessment for the diagnosis of MR/ID (AAIDD, 2010). The diagnosis of MR/ID requires comparison of
a person’s scores against nationally
representative norms, not a comparison to others of
the same age, gender, race and level of education. An analogous situation is that of a
professional psychologist or lawyer whose intellectual functioning is in the
top 2% of the population as a whole, and who therefore obtains an IQ of about 130
when his/her score is compared to nationally representative norms. If his/her score is instead compared only to
those of a group of his/her peers with a similar level of education, he/she may
fall only in the top 16% of that group, and so his/her score would be much lower, perhaps 115.
Demographically
adjusted norm scores result in a sliding
reference point that no longer
represents a comparison to the general population, which is the only proper
reference point in the diagnosis of MR/ID. The use of demographically adjusted
norms is inappropriate if such adjusted scores are used to formulate and
inform an opinion regarding level of general intellectual functioning for the
diagnosis of MR/ID.
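The arithmetic behind the psychologist/lawyer analogy above can be sketched as follows, assuming IQ scores are normally distributed with a mean of 100 and a standard deviation of 15 (the conventional scaling; the function name is introduced here only for illustration).

```python
from statistics import NormalDist

# Assumption: IQ is scaled as a normal distribution with mean 100 and SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_from_top_fraction(top_fraction):
    """Return the IQ-scale score at the cut-off for the top `top_fraction` of a reference group."""
    return IQ_SCALE.inv_cdf(1 - top_fraction)


# Same person, two reference groups:
national = iq_from_top_fraction(0.02)  # top 2% of the general population
peers = iq_from_top_fraction(0.16)     # top 16% of similarly educated peers

print(round(national), round(peers))
```

Under that assumption the two standings correspond to scores of roughly 131 and 115, a gap of about one standard deviation produced solely by changing the reference group.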
The technical adequacy and appropriate use of demographically adjusted NPST
scores are not matters of settled professional consensus in the field of
neuropsychological assessment. This lack of consensus is reflected in the
significantly differing opinions articulated by Heaton et al. (1996), Lange,
Chelune, Taylor, Woodward and Heaton (2006), Russell (2005), Sattler (2001),
Sherrill-Pattison, Donders, and Thompson (2000), Strauss, Sherman and Spreen
(2006), Romero et al. (2009), and Yantz, Gavett, Lynch and McCaffrey (2006).
Particularly important is the consensus-based report of the 2008 Multicultural
Problem Solving Summit of neuropsychologists (Byrd et al., 2010), “Challenges in the
Neuropsychological Assessment of Ethnic Minorities: Summit Proceedings”
(Romero et al., 2009). This consensus report stated (emphasis added via underline):
Demographic
adjustments to normative data are not validated for the use of
predicting future academic or employment performance, and laws exist to prohibit
the use of race-based norms in employment decisions (p. 767).
There was
consensus among participants that the field would benefit from guidelines for
neuropsychological practice among ethnic and racial minorities…The
guidelines should include a specific focus on appropriate and inappropriate
uses of demographic adjustments, as well as a discussion of the risks of
overpathologizing groups or denying appropriate services, and details of
limitations to the application of various normative standards (p. 767).
Additionally,
a number of authoritative assessment texts used frequently in the graduate
training of psychologists learning to conduct intellectual or
neuropsychological assessments have highlighted the potential problems with
demographically adjusted norm scores (emphasis added via underlining or bold
font).
Pluralistic
norms[1]
are norms derived for individual groups, such as Euro Americans, African
Americans, Hispanic Americans, Asian Americans, and Native
Americans…Pluralistic norms are potentially dangerous, however, because
they (a) provide a basis for invidious comparisons among different ethnic
groups, (b) may lower expectations of culturally and linguistically
diverse children and reduce their level of aspiration to succeed, (c) may have
little relevance outside of the child’s specific geographic area, and (d)
furnish no information about the complex reasons why some ethnic groups tend to
score lower than others on intelligence tests (p. 661).
There
are two schools of thought regarding how closely matched the norms must be to
the demographic characteristics of the individual being assessed, and these views are diametrically opposed.
These are: (1) that norms should be
as
representative of the general population as possible, and (2) that norms should
approximate, as closely as possible, the unique subgroup to which the
individual belongs. (p. 47).
At
times, it will be paramount to compare the individual to all other persons of
the same age in the general population. Determining a diagnosis of mental retardation or learning
disability would be one example (p. 47).
Of
historical interest is the fact that neuropsychology’s recent shift toward
demographically corrected scores based on race/ethnicity and other variables
has occurred with surprisingly little fanfare or controversy despite
ongoing debate in other domains of psychology. For example, when race-norming was applied to
pre-employment screening in the United States to increase the number of
minorities being chosen as job applicants, the result was the Civil Rights Act
of 1991, which outlawed race-norming
for applicant selection or referral (see Sackett & Wilk, 1994; Gottfredson,
1994; and Greenlaw & Jensen, 1996, for an interesting historical review of
the ill-fated attempt at race-norming the GATB) (p. 50).
There
is also evidence that when “corrective norms” are applied, some demographic
influences remain, and overcorrection may occur, resulting in score
distortion for some subgroups and a risk of increased false negatives
(e.g., Fastenau, 1998) (p. 50).
Importantly,
with regard to the WAIS-III/WMS-III, the Psychological Corporation explicitly
states that demographically adjusted scores are not intended for use in psychoeducational assessment, determination of intellectual deficiency,
vocational assessment, or any other context where the goal is to determine
absolute functional level (IQ or memory) in comparison to the general
population. Rather, demographically adjusted scores are best used for
neurodiagnostic assessment in order to minimize the impact of confounding
variables on the diagnosis of cognitive impairment. That is, they should be
used to infer strengths and weakness relative to a presumed pre-morbid standard
(The Psychological Corporation, 2002). Therefore, neuropsychologists need to
balance the risks and benefits of using within-group norms, and use them
with a full understanding of their implications and the situations in which
they are most appropriate (p. 51).
The
following select quotes from professional neuropsychological publications also
make it clear that the neuropsychological professional jury is still “out”
regarding the methodology and appropriate application of demographically
adjusted NPST scores (emphasis added
via underline or bold font).
Clinical
neuropsychologists are always starving for good normative data for established
neuropsychological measures. Unfortunately, too many studies (and even
manuals) contain too few subjects and/or their samples are not representative
of the target population on important demographic variables, especially age
and education. This was the problem that Heaton, Grant, and Matthews
attempted to address with Comprehensive Norms for an Expanded Halstead Reitan Battery
(1991). This practical product arose as a direct result of the
authors’ 1986 chapter in an edited book (Heaton, Grant, & Matthews, 1986).
The project represents a substantial effort on the part of the authors, and it
has many commendable qualities. However, the merits are accompanied by
significant shortcomings (p. 444).
Our own informal
inquiries have indicated that many of the scientist-practitioners in clinical
neuropsychology have embraced this book in an uncritical manner. The strong and
unreflective nature of such acceptance of these norms tells us how good an idea
this kind of normative project is, in
the abstract. Unfortunately, this particular product does not contribute as
much as our expectations might lead us to anticipate. The format and marketing
are so convincing that few would comb the introductory pages to analyze the
test selection quirks and statistical/design problems that abound (p. 447).
While we
agree that this initial attempt at providing demographic corrections for
several commonly used tests could have been more statistically sophisticated,
and possibly could have been more user friendly, the evidence seems to indicate
that the norms do have significant advantages for neuropsychological clinical
work and research (p. 457).
It
is generally understood that demographically corrected normative standards are
based upon performances of adults who have developed normally, have typical,
mainstream educational backgrounds, and have no known history of brain injury
or disease. It follows logically from this that such norms should be used with
great caution, if at all, to identify acquired brain dysfunction in patients
who have developmental disorders or other-than-mainstream educational
backgrounds (e.g., special education). For example, it would be inappropriate to “adjust” a mentally
retarded person’s IQ upward because of a low education level, thereby
potentially depriving him/her of social services or mitigating considerations
in criminal prosecution.
As
we have noted, demographically corrected norms are used primarily to identify
the presence and nature of neurobehavioral changes due to known or possible
brain insult (injury or disease). Such norms are generally not the best
choice for characterizing the individual’s absolute level of functioning, or
functioning in relation to the general population of normal adults (Heaton
& Pendleton, 1981) (p. 147).
Heaton,
Grant, and Matthews (1991) published procedures for adjusting raw scores on
various neuropsychological tests according to the individual's age and
education. Despite rather widespread use of these score conversions in both
clinical work and research publications, there have been very few
investigations to evaluate the accuracy or limitations of these score
transformations (p. 181).
Inasmuch
as the HGM method brings about its greatest corrections among persons whose
scores are more likely to be affected by brain impairment, a question must
be raised concerning exactly what the HGM transformations are accomplishing.
It might seem, at least in part, that the corrections are, in fact, correcting
for subtle impairment of brain functions in less-educated and older persons—the
very condition that neuropsychological tests were developed to detect (p.
188).
Concluding Comments
Demographically adjusted scores (e.g., Heaton norms)
are entirely inconsistent with the
determination of a person’s general level of intellectual functioning as per
the first prong of the diagnosis of MR/ID.
Demographically adjusted IQ scores, in particular, should not play a
critical role when determining if a person has a deficit in general
intellectual functioning. Furthermore,
interpretation of demographically adjusted NPST scores (e.g., Halstead
Category Test) to provide convergent validity evidence regarding a person’s
level of general intelligence is not appropriate in the context of MR/ID
diagnosis.
References
American
Association on Intellectual and Developmental Disabilities. (2010). Intellectual
disability: Definition, classification,
and systems of supports—11th Edition. Washington, DC: Author.
Fastenau, P. S. & Adams, K.
M. (1996). Book review: Heaton, Grant,
and Matthews' Comprehensive Norms: An Overzealous Attempt. Journal of Clinical and Experimental Neuropsychology, 18 (3),
444-448.
Heaton, R. K., Ryan, L., & Grant,
I. (2009). Demographic influences and
use of demographically corrected norms in neuropsychological assessment. In
Igor Grant and Kenneth M. Adams (Eds.), Neuropsychological
Assessment of Neuropsychiatric and Neuromedical Disorders, Oxford
University Press US.
Lange, R. T., Chelune, G. J.,
Taylor, M. J., Woodward, T. S., & Heaton, R. K. (2006). Development
of demographic norms for four new WAIS-III/WMS-III indexes. Psychological Assessment, 18 (2), 174-181.
Romero, H. R., Lageman, S. K., Kamath, V., Irani, F., Sim, A., Suarez, P., Manly, J. J., Attix, D. K., & the Summit participants (2009). Challenges in the neuropsychological assessment of ethnic minorities: Summit Proceedings. The Clinical Neuropsychologist, 23, 761-779.
Russell, E. W. (2005). Norming subjects for the Halstead Reitan
battery. Archives of Clinical
Neuropsychology, 20, 479-484.
Sattler, J. (2001). Assessment of
Children: Cognitive Applications—4th
Edition. San Diego, CA: Jerome M. Sattler, Publisher, Inc.
Sherrill-Pattison, S., Donders, J., & Thompson, E.
(2000). Influence of demographic variables on neuropsychological test
performance after traumatic brain injury. The
Clinical Neuropsychologist, 14 (4), 496-503.
Silverberg, N., & Millis, S. (2009). Impairment versus deficiency in
neuropsychological assessment: Implications for ecological validity. Journal
of the International Neuropsychological Society, 15, 94–102.
Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A Compendium of Neuropsychological Tests:
Administration, Norms, and Commentary – 3rd Edition. New York,
NY: Oxford University Press.
Titus, J. B., Retzlaff, P. D., & Dean, R. S. (2002). Predicting
scores of the Halstead Category Test with the WAIS-III. International Journal of Neuroscience, 112, 1099-1114.
Yantz, C. L., Gavett, B. E., Lynch, J. K., & McCaffrey, R. J.
(2006). Potential for interpretation disparities of Halstead-Reitan
neuropsychological battery performances in a litigating sample. Archives of Clinical Neuropsychology, 21, 809-817.
[1]
The term “pluralistic norms”
refers to the same concept as demographically adjusted norms and was the terminology
used in the 1970s when procedures for adjusting IQ scores for minority
children based on race and socio-economic status were attempted. Although using a different approach, the theory
behind adjusting IQ scores is conceptually similar to that inherent in
demographically adjusted norms.
[2] It is important to note that
this quote is from the primary author of the Heaton norms. This statement
indicates that the author of the Heaton norms views them as useful in
“clinical” and “research” settings, which I interpret as not covering high-stakes
forensic diagnostic purposes such as Atkins
cases.