Wednesday, July 11, 2012

AP 101 Brief #14: Demographically adjusted neuropsych (Heaton) norm-based scores inappropriate for MR/ID Dx

Applied Psychometrics 101 Brief # 14:  Demographically adjusted neuropsychological (Heaton) norm-based scores are inappropriate for the diagnosis of MR/ID

Kevin S. McGrew, PhD.

Dale G. Watson, PhD.
Berkeley, CA

Neuropsychological assessments are sometimes part of psychological evaluations in Atkins MR/ID death penalty cases.  These assessments include specialized tests, often in addition to an age-appropriate IQ battery, that are specifically designed to assess brain-behavior relations.  The neuropsychological-specific tests (NPST) are used to draw inferences about brain function/dysfunction and to provide functional implications of neuropsychological test data for a person’s real-world functioning.  NPST batteries, as well as all the individual tests included in NPST batteries, are not designed or validated to provide a reliable and valid estimate of a person’s general intelligence (of course, an exception is the portion of the battery that may include an individualized measure of general intelligence; e.g., WAIS-IV; WJ III; SB5).

Demographically Adjusted Test Norm Interpretation is Inappropriate in the Diagnosis of Atkins MR/ID in Capital Cases

A test interpretation feature used in some neuropsychological assessments is demographically adjusted norms.   The specialized NPST of memory, sensory-motor function, concept formation, etc. may be reported with these special demographically adjusted norms.  Also, demographically adjusted norms are sometimes applied to the individualized measure of general intelligence included as part of the NPST (see Lange et al., 2006).

The best-known demographically adjusted norms are the Heaton norms.  A neuropsychologist in a recent Atkins case described Heaton norms as “number crunching, age-corrected, you know, socioeconomic variable-corrected data,” and as generating “a comprehensive T-score age-, education, sex-corrected, actually race-corrected, also.”  In simple terms, demographically adjusted norms make equation-based statistical adjustments that allow certain NPST scores for an individual to be compared against other individuals of the same age and other demographic characteristics (e.g., gender, race, socio-economic status, and level of education).  In the context of neuropsychological assessment to determine whether an individual’s functioning has decreased, such as after a brain injury or a stroke, demographically adjusted norms may help with the diagnosis of brain dysfunction and the identification of relative strengths and weaknesses to guide interventions.
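To make "equation-based statistical adjustments" concrete, the following is a minimal sketch of one common approach, regression-based norming, in which a score is compared with the score predicted from the examinee's demographics. It is not the published Heaton equations: the coefficients, the residual standard deviation, and the function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical regression coefficients on a scaled-score metric (mean 10, SD 3).
# These values are invented for illustration; they are NOT the Heaton norms.
HYPOTHETICAL_COEF = {
    "intercept": 7.0,
    "age": -0.03,          # expected change per year of age
    "education": 0.35,     # expected change per year of education
    "residual_sd": 2.6,    # spread of scores around the predicted value
}

POPULATION_MEAN, POPULATION_SD = 10.0, 3.0


def population_t(score: float) -> float:
    """T-score (mean 50, SD 10) relative to the general population."""
    return 50.0 + 10.0 * (score - POPULATION_MEAN) / POPULATION_SD


def adjusted_t(score: float, age: float, education_years: float,
               coef: dict = HYPOTHETICAL_COEF) -> float:
    """T-score relative to the score predicted from age and education."""
    predicted = (coef["intercept"]
                 + coef["age"] * age
                 + coef["education"] * education_years)
    return 50.0 + 10.0 * (score - predicted) / coef["residual_sd"]


# The same obtained score, evaluated against three different reference points.
same_score = 4.0
pop_t = population_t(same_score)
low_edu_t = adjusted_t(same_score, age=40, education_years=8)
high_edu_t = adjusted_t(same_score, age=40, education_years=16)

print(f"general population T: {pop_t:.1f}")
print(f"adjusted T, 8 years of education: {low_edu_t:.1f}")
print(f"adjusted T, 16 years of education: {high_edu_t:.1f}")
```

Under these assumed coefficients, an identical obtained score yields a higher adjusted T-score for the examinee with less education and a lower one for the examinee with more education, while the general-population T-score stays fixed.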

Silverberg and Millis (2009) have outlined the clear distinction between using neuropsychological measures to identify acquired deficits as opposed to developmental deficiencies. They note:

If the clinician is interested in whether a patient has declined from their premorbid status, contrasting their obtained raw scores with their expected premorbid scores (based on age, education, gender, ethnicity, and any other variables that add to their prediction) is most appropriate. This type of comparison quantifies impairment—how much examinees’ scores are lowered relative to their (estimated) preinjury/disease onset baseline. The degree of impairment is likely most predictive of the patient’s success in returning to (or continuing) work or other premorbidly engaged-in functional activities with extraordinary or idiosyncratic cognitive demands. If, in contrast, the clinician is interested in determining whether the patient’s cognitive abilities are sufficient for the demands of universal functional tasks (e.g., activities of daily living, driving a car, operating a cashier, etc.), comparing their raw [non-demographically adjusted] scores with general healthy adult population norms, generating “absolute” scores, is most appropriate (p. 98).

When used in the context of neuropsychological assessment, certain NPST scores are adjusted so an individual’s performance is compared not to the general population but only to others of the same age, gender, race and level of education.  Norm-referenced testing is at the heart of psychological assessment for the diagnosis of MR/ID (AAIDD, 2010).  The diagnosis of MR/ID requires comparison of a person’s scores against nationally representative norms, not a comparison to others of the same age, gender, race and level of education.  An analogous situation would be that of a professional psychologist or lawyer whose intellectual functioning is in the top 2% of the population as a whole, and who therefore obtains an IQ of about 130 when his/her score is compared to nationally representative norms.  If his/her score is instead compared only to those of a group of his/her peers with a similar level of education, he/she may fall only in the top 16% of that group, and so his/her score would be much lower, perhaps 115.
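The arithmetic behind this analogy can be checked with a short sketch. It assumes scores are normally distributed within each reference group and are expressed on the usual IQ metric (mean 100, SD 15); the function name is hypothetical.

```python
from statistics import NormalDist

IQ_MEAN, IQ_SD = 100.0, 15.0


def score_at_top_fraction(top_fraction: float) -> float:
    """IQ-metric score at the point above which `top_fraction` of the
    reference group falls, assuming a normal distribution."""
    z = NormalDist().inv_cdf(1.0 - top_fraction)
    return IQ_MEAN + IQ_SD * z


# Same person, same performance, two different reference groups.
vs_national_norms = score_at_top_fraction(0.02)   # top 2% of the population
vs_educated_peers = score_at_top_fraction(0.16)   # top 16% of the peer group

print(f"compared to national norms: about {vs_national_norms:.0f}")
print(f"compared to similarly educated peers: about {vs_educated_peers:.0f}")
```

The two results land near 130 and 115 respectively, a difference of roughly one standard deviation that arises solely from the change in reference group.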

Demographically adjusted norm scores result in a sliding reference point that no longer represents a comparison to the general population, which is the only proper reference point in the diagnosis of MR/ID. The use of demographically adjusted norms is inappropriate if such adjusted scores are used to formulate and inform an opinion regarding level of general intellectual functioning for the diagnosis of MR/ID.

The technical adequacy and appropriate use of demographically adjusted NPST scores are not matters of settled professional consensus in the field of neuropsychological assessment.  This lack of consensus is reflected in the significantly differing opinions articulated by Heaton et al. (1996), Lange, Chelune, Taylor, Woodward and Heaton (2006), Russell (2005), Sattler (2001), Sherrill-Pattison, Donders, and Thompson (2000), Strauss, Sherman and Spreen (2006), Romero et al. (2009), and Yantz, Gavett, Lynch and McCaffrey (2006).

Particularly important is the consensus-based report of the 2008 Multicultural Problem Solving Summit of neuropsychologists (Byrd et al., 2010), published as “Challenges in the Neuropsychological Assessment of Ethnic Minorities:  Summit Proceedings” (Romero et al., 2009).  This consensus report stated (emphasis added via underline):

Demographic adjustments to normative data are not validated for the use of predicting future academic or employment performance, and laws exist to prohibit the use of race-based norms in employment decisions (p. 767).

There was consensus among participants that the field would benefit from guidelines for neuropsychological practice among ethnic and racial minorities…The guidelines should include a specific focus on appropriate and inappropriate uses of demographic adjustments, as well as a discussion of the risks of overpathologizing groups or denying appropriate services, and details of limitations to the application of various normative standards (p. 767).

Additionally, a number of authoritative assessment texts used frequently in the graduate training of psychologists learning to conduct intellectual or neuropsychological assessments have highlighted the potential problems with demographically adjusted norm scores (emphasis added via underlining or bold font).

Pluralistic norms[1] are norms derived for individual groups, such as Euro Americans, African Americans, Hispanic Americans, Asian Americans, and Native Americans…Pluralistic norms are potentially dangerous, however, because they (a) provide a basis for invidious comparisons among different ethnic groups, (b) may lower expectations of culturally and linguistically diverse children and reduce their level of aspiration to succeed, (c) may have little relevance outside of the child’s specific geographic area, and (d) furnish no information about the complex reasons why some ethnic groups tend to score lower than others on intelligence tests (p. 661).

There are two schools of thought regarding how closely matched the norms must be to the demographic characteristics of the individual being assessed, and these views are diametrically opposed. These are: (1) that norms should be as representative of the general population as possible, and (2) that norms should approximate, as closely as possible, the unique subgroup to which the individual belongs. (p. 47).

At times, it will be paramount to compare the individual to all other persons of the same age in the general population. Determining a diagnosis of mental retardation or learning disability would be one example (p. 47).

Of historical interest is the fact that neuropsychology’s recent shift toward demographically corrected scores based on race/ethnicity and other variables has occurred with surprisingly little fanfare or controversy despite ongoing debate in other domains of psychology. For example, when race-norming was applied to pre-employment screening in the United States to increase the number of minorities being chosen as job applicants, the result was the Civil Rights Act of 1991, which outlawed race-norming for applicant selection or referral (see Sackett & Wilk, 1994; Gottfredson, 1994; and Greenlaw & Jensen, 1996, for an interesting historical review of the ill-fated attempt at race-norming the GATB) (p. 50).

There is also evidence that when “corrective norms” are applied, some demographic influences remain, and overcorrection may occur, resulting in score distortion for some subgroups and a risk of increased false negatives (e.g., Fastenau, 1998) (p. 50).

Importantly, with regard to the WAIS-III/WMS-III, the Psychological Corporation explicitly states that demographically adjusted scores are not intended for use in psychoeducational assessment, determination of intellectual deficiency, vocational assessment, or any other context where the goal is to determine absolute functional level (IQ or memory) in comparison to the general population. Rather, demographically adjusted scores are best used for neurodiagnostic assessment in order to minimize the impact of confounding variables on the diagnosis of cognitive impairment. That is, they should be used to infer strengths and weakness relative to a presumed pre-morbid standard (The Psychological Corporation, 2002). Therefore, neuropsychologists need to balance the risks and benefits of using within-group norms, and use them with a full understanding of their implications and the situations in which they are most appropriate (p. 51).

The following select quotes from professional neuropsychological publications also make it clear that the neuropsychological professional jury is still “out” regarding the methodology and appropriate application of demographically adjusted NPST scores (emphasis added via underline or bold font).

Clinical neuropsychologists are always starving for good normative data for established neuropsychological measures. Unfortunately, too many studies (and even manuals) contain too few subjects and/or their samples are not representative of the target population on important demographic variables, especially age and education. This was the problem that Heaton, Grant, and Matthews attempted to address with Comprehensive Norms for an Expanded Halstead Reitan Battery (1991). This practical product arose as a direct result of the authors’ 1986 chapter in an edited book (Heaton, Grant, & Matthews, 1986). The project represents a substantial effort on the part of the authors, and it has many commendable qualities. However, the merits are accompanied by significant shortcomings  (p. 444).

Our own informal inquiries have indicated that many of the scientist-practitioners in clinical neuropsychology have embraced this book in an uncritical manner. The strong and unreflective nature of such acceptance of these norms tells us how good an idea this kind of normative project is, in the abstract. Unfortunately, this particular product does not contribute as much as our expectations might lead us to anticipate. The format and marketing are so convincing that few would comb the introductory pages to analyze the test selection quirks and statistical/design problems that abound (p. 447).

While we agree that this initial attempt at providing demographic corrections for several commonly used tests could have been more statistically sophisticated, and possibly could have been more user friendly, the evidence seems to indicate that the norms do have significant advantages for neuropsychological clinical work and research (p. 457).[2]

It is generally understood that demographically corrected normative standards are based upon performances of adults who have developed normally, have typical, mainstream educational backgrounds, and have no known history of brain injury or disease. It follows logically from this that such norms should be used with great caution, if at all, to identify acquired brain dysfunction in patients who have developmental disorders or other-than-mainstream educational backgrounds (e.g., special education). For example, it would be inappropriate to “adjust” a mentally retarded person’s IQ upward because of a low education level, thereby potentially depriving him/her of social services or mitigating considerations in criminal prosecution.

As we have noted, demographically corrected norms are used primarily to identify the presence and nature of neurobehavioral changes due to known or possible brain insult (injury or disease). Such norms are generally not the best choice for characterizing the individual’s absolute level of functioning, or functioning in relation to the general population of normal adults (Heaton & Pendleton, 1981) (p. 147).

Heaton, Grant, and Matthews (1991) published procedures for adjusting raw scores on various neuropsychological tests according to the individual's age and education. Despite rather widespread use of these score conversions in both clinical work and research publications, there have been very few investigations to evaluate the accuracy or limitations of these score transformations (p. 181).

Inasmuch as the HGM method brings about its greatest corrections among persons whose scores are more likely to be affected by brain impairment, a question must be raised concerning exactly what the HGM transformations are accomplishing. It might seem, at least in part, that the corrections are, in fact, correcting for subtle impairment of brain functions in less-educated and older persons—the very condition that neuropsychological tests were developed to detect (p. 188).

Concluding Comments

Demographically adjusted scores (e.g., those based on Heaton norms) are entirely inconsistent with the determination of a person’s general level of intellectual functioning as required by the first prong of the diagnosis of MR/ID.  Demographically adjusted IQ scores, in particular, should not play a critical role when determining if a person has a deficit in general intellectual functioning.  Furthermore, interpreting demographically adjusted NPST scores (e.g., Halstead Category Test) as convergent validity evidence regarding a person’s level of general intelligence is not appropriate in the context of MR/ID diagnosis.

American Association on Intellectual and Developmental Disabilities.  (2010). Intellectual disability:  Definition, classification, and systems of supports—11th Edition. Washington, DC:  Author.

Fastenau, P. S., & Adams, K. M. (1996).  Book review: Heaton, Grant, and Matthews' Comprehensive Norms: An Overzealous Attempt. Journal of Clinical and Experimental Neuropsychology, 18 (3), 444-448.

Heaton, R. K., Ryan, L., & Grant, I. (2009).  Demographic influences and use of demographically corrected norms in neuropsychological assessment. In I. Grant & K. M. Adams (Eds.), Neuropsychological Assessment of Neuropsychiatric and Neuromedical Disorders. Oxford University Press US.

Lange, R. T., Chelune, G. J., Taylor, M. J., Woodward, T. S., & Heaton, R. K. (2006).  Development of demographic norms for four new WAIS-III/WMS-III indexes. Psychological Assessment, 18 (2), 174-181.


Romero, H. R., Lageman, S. K., Kamath, V., Irani, F., Sim, A., Suarez, P., Manly, J. J., Attix, D. K., & the Summit participants (2009). Challenges in the neuropsychological assessment of ethnic minorities: Summit Proceedings. The Clinical Neuropsychologist, 23, 761-779.

Russell, E. W. (2005). Norming subjects for the Halstead Reitan battery. Archives of Clinical Neuropsychology, 20, 479-484.

Sattler, J. (2001). Assessment of Children:  Cognitive Applications—4th Edition.  San Diego, CA:  Jerome M. Sattler, Publisher, Inc.

Sherrill-Pattison, S., Donders, J., & Thompson, E. (2000). Influence of demographic variables on neuropsychological test performance after traumatic brain injury. The Clinical Neuropsychologist, 14 (4), 496-503.

Silverberg, N., & Millis, S. (2009).  Impairment versus deficiency in neuropsychological assessment: Implications for ecological validity.  Journal of the International Neuropsychological Society, 15, 94-102.

Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary – 3rd Edition. New York, NY: Oxford University Press.

Titus, J. B., Retzlaff, P. D., & Dean, R. S. (2002). Predicting scores of the Halstead Category Test with the WAIS-III. International Journal of Neuroscience, 112, 1099-1114.

Yantz, C. L., Gavett, B. E., Lynch, J. K., & McCaffrey, R. J. (2006). Potential for the interpretation disparities of Halstead-Reitan neuropsychological battery performances in a litigating sample. Archives of Clinical Neuropsychology, 21, 809-817.

[1] The term “pluralistic norms” refers to the same concept as demographically adjusted norms and was the terminology used in the 1970s, when procedures for adjusting IQ scores for minority children based on race and socio-economic status were attempted.  Although it uses a different approach, the theory behind adjusting IQ scores is conceptually similar to that inherent in demographically adjusted norms.
[2] It is important to note that this quote is from the primary author of the Heaton norms. This statement indicates that the author of the Heaton norms views them as useful in “clinical” and “research” settings, which I interpret as not covering high-stakes forensic diagnostic purposes such as Atkins cases.