Applied Psychometrics (AP) 101 Brief #1b:  g or not to g in Atkins MR death penalty cases (second in a series)

If you have not read the first  post in this series, you should read the first post now.  Then return and resume reading.

As described in the first post, the g-loadings (of the tests or composite scores in an IQ battery) on the first principal component in principal component analysis (PCA) are a traditional index of the g-ness (aka saturation of general intellectual ability) of a measure.  Furthermore, g-loadings calculated within a specific intelligence battery only tell you the relative g-ness of measures as defined by that specific collection of measures within that particular IQ battery.  It is when one moves to joint analysis of intelligence batteries (and the more batteries in the analysis the better) that a more accurate picture of a measure's g-ness can be determined.
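For readers who want to see the mechanics, here is a minimal sketch of how first-principal-component loadings can be pulled from a correlation matrix.  It is not the analysis reported in this post: the four measures ("Measure A" through "Measure D") and their correlations are made up for illustration, and the function name is my own.  To stay dependency-free it uses simple power iteration instead of a statistics package's eigen-solver, which assumes the correlations are all positive (as they typically are among cognitive ability measures).

```python
import math

def first_pc_loadings(corr, iterations=500):
    """Loadings of each measure on the first principal component.

    Uses power iteration on the correlation matrix. Starting from a
    vector of ones works when all correlations are positive, because
    the dominant eigenvector then has all-positive entries.
    """
    n = len(corr)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the largest eigenvalue for the unit vector.
    cv = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * c for v, c in zip(vec, cv))
    # Loading = eigenvector entry scaled by sqrt(eigenvalue).
    return [v * math.sqrt(eigenvalue) for v in vec]

# Hypothetical measures and correlations (NOT data from the WJ III sample).
names = ["Measure A", "Measure B", "Measure C", "Measure D"]
corr = [
    [1.0, 0.7, 0.6, 0.5],
    [0.7, 1.0, 0.5, 0.4],
    [0.6, 0.5, 1.0, 0.3],
    [0.5, 0.4, 0.3, 1.0],
]

loadings = first_pc_loadings(corr)
# Rank measures from highest to lowest g-loading.
ranking = [name for _, name in sorted(zip(loadings, names), reverse=True)]

for name, loading in zip(names, loadings):
    print(f"{name}: {loading:.2f}")
```

The point to notice is that the ranking depends entirely on which measures are in the matrix.  Add composites from a second or third battery and the same measure can move up or down the list, which is why joint analyses are more informative.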

Having access to the mixed LD/normal university young adult sample reported in the WJ III technical manual (conflict of interest note - I'm a coauthor of the WJ III), I just ran a joint PCA on this sample of 200.  A description of the sample and instruments administered, abstracted from the WJ III technical manual (click here for a brief WJ III technical manual bulletin summary) can be found by clicking here.   I selected this data set as the adult subjects had all been administered the WAIS-III.  In addition, they had also been administered the CHC-based WJ III Tests of Cognitive Ability (WJ III) and the Gf-Gc based Kaufman Adolescent and Adult Intelligence Test (KAIT).  I analyzed the composite scores from the three batteries via the PCA procedures conceptually described in the first post in this series.  Below is a summary of the results.

Interestingly, the top four g-measures are from the KAIT (Fluid and Crystallized composites) and the WJ III (Gc=Comprehension-Knowledge; Gf=Fluid Reasoning).  Even more interesting was the finding that when a broader array of cognitive ability composites is included in a g-analysis, the WAIS-III VC (Verbal Comprehension Index; also classified as a strong measure of Gc as per CHC theory) is only the 9th strongest g-measure, and even falls behind the WAIS-III Perceptual Organization (PO; primarily Gv and some Gf as per CHC theory) and Working Memory Indexes (WM--Gsm as per CHC theory).

A hypothesis that has been advanced to explain the differences in how Gc/verbal abilities are measured by the Wechsler verbal scales and Gc/verbal abilities on other cognitive batteries is grounded in Jim Cummins' distinction between two types of language proficiency---BICS (Basic Interpersonal Communication Skills) and CALP (Cognitive Academic Language Proficiency; click here for additional on-line information).  Briefly, BICS is language proficiency in more contextualized everyday language contexts while CALP is the more context-reduced conceptual-linguistic knowledge that occurs in a context of semantics and abstractions.  CALP is more cognitively demanding.  Anyone familiar with the Wechsler verbal subtests of Vocabulary, Comprehension and Similarities knows that they allow for subjects to provide lengthy verbal responses in their own everyday language.  In contrast, verbal items on other IQ tests (e.g., WJ III) require one-word responses and tend to focus more on cognitive processing involving language (e.g., antonyms, synonyms, verbal analogies).  It has been hypothesized that the Wechsler verbal tests and scales are more BICS-influenced while other IQ tests tend to use verbal/Gc test formats that require more CALP.  This might explain the findings reported above--that the WAIS-III Verbal Comprehension Index is less cognitively demanding than the Gc/Verbal scales from the WJ III and KAIT.

Why is this finding important?  Because in a number of Atkins decisions, where concerns were raised about the person's WAIS-III Full Scale score being a good estimate of the person's g-ness (general intelligence), considerable stock was placed in the Verbal IQ (of which the Verbal Comprehension Index is now a purer factor measure) as being the best estimate of the person's general intelligence (see posts re: Maldonado and Vidal decisions).

Let's also examine these same data through the lens of multidimensional scaling analysis (specifically, Guttman's Radex Model).  The radex model statistically classifies measures as per two dimensions--cognitive complexity and stimulus content.  As noted in the prior post in this series, cognitive complexity is considered an index of general intelligence (g). For interested readers, a classic article on the use of the MDS radex model in the analysis of intelligence measures was published in Intelligence in 1983 (Marshalek, Lohman & Snow, 1983). Below is a visual-spatial representation of the MDS results in the current university sample, using the same composite measures as reported above (in the PCA g-loading analysis).  The interpretations below are mine.

The two broad cognitive processing continuum interpretations (X- and Y-axis) are not the focal point of the current discussion.  The most critical finding in the current context, as per the radex model, is how close to the center of the figure a composite measure score is placed.  Measures that are the closest to the center are considered the most cognitively complex.  As measures move further away from the center, they are judged to be less cognitively complex.
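The "distance from the center" reading can be sketched in a few lines.  This assumes a two-dimensional MDS solution is already in hand from whatever MDS routine was used; the coordinates and measure names below are made up for illustration and are not the values behind the figure.  I also assume the "center" is the centroid of the plotted points.

```python
import math

# Hypothetical 2-D MDS coordinates (NOT the values from the figure).
coords = {
    "Measure A": (0.1, -0.2),
    "Measure B": (0.9, 0.4),
    "Measure C": (-0.5, 0.6),
    "Measure D": (-1.3, -0.8),
}

def distance_from_center(coords):
    """Euclidean distance of each measure from the centroid of all points."""
    n = len(coords)
    cx = sum(x for x, _ in coords.values()) / n
    cy = sum(y for _, y in coords.values()) / n
    return {
        name: math.hypot(x - cx, y - cy)
        for name, (x, y) in coords.items()
    }

distances = distance_from_center(coords)
# Smaller distance = closer to the center = read as more cognitively complex.
by_complexity = sorted(distances, key=distances.get)

for name in by_complexity:
    print(f"{name}: {distances[name]:.2f}")
```

In this toy configuration "Measure A" sits nearest the centroid and would be read as the most cognitively complex, with "Measure D" the least.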

The results, although using a different method than PCA, produce the same conclusions.  The most cognitively complex measure in this sample (which could be thus interpreted as the best index of cognitive complexity or g-ness) is the KAIT Fluid Intelligence Scale.  The next closest to the center of the figure are the WJ III Gc (Comprehension-Knowledge) and Gf (Fluid Reasoning) clusters.  Again of interest is the location of the WAIS VC: it is much less cognitively complex than these other measures and, interestingly, much less cognitively complex than the two similar measures of Gc abilities (WJ III Gc composite; KAIT Crystallized Intelligence composite).  I've also provided my stimulus content hypothesis interpretations of groupings of composites (designated by ovals) from across the batteries (e.g., Processing Speed- WAIS Processing Speed and WJ III Gs or Processing Speed).

The findings in this one sample (which therefore warrant caution in generalization) suggest that the WAIS-III Verbal Comprehension composite, which is the most valid measure of Gc or verbal abilities on the WAIS-III, may NOT be all it is thought to be--when it comes to tapping cognitively complex cognitive processing.  Other Gc or verbal measures from intelligence batteries with adult norms (KAIT; WJ III) were found, in a relative sense, to be much better indicators of a person's g-ness (general intelligence).  The data suggest that, in this sample, even the WAIS-III Working Memory Index score may be a better relative proxy for g-ness than the WAIS verbal composite.

These analyses raise interesting questions about Atkins decisions that have relied exclusively on the WAIS-R/WAIS-III scores, particularly when part scores (Verbal IQ, Verbal Comprehension; Performance IQ; Perceptual Organization; etc.) are used instead of the Full-Scale IQ to determine mental retardation, or when the respective WAIS verbal composite is considered better than other potential test global IQ scores (or similar Gc/verbal composite scores from other batteries) in making a determination of level of general intellectual functioning.

How can this be?  How can a major scale from the "gold standard" of IQ tests (as it is commonly called in Atkins decisions) be a poorer estimate of general intelligence (g-ness) than most psychologists think?  More importantly, what are the implications for Atkins decisions, when the WAIS-R/III Full Scale score has been questioned as an accurate g-estimate in the face of considerable profile variability, and then the Verbal IQ/Verbal Comprehension Index is used to estimate g-ness (general intelligence)?

In both the Maldonado and Vidal decisions considerable stock was placed on the respective Wechsler verbal composite scores as being the best indicator of general intelligence (for making a determination of mental retardation).  In Maldonado, the reliance on the Wechsler verbal composite trumped a more comprehensive CHC-based IQ battery (BAT-R) administered in Maldonado's native language (Spanish).  Of concern in the Vidal decision is that he had been administered four different versions of the Wechsler IQ batteries (over many decades), and they consistently revealed a large verbal/nonverbal (performance) IQ split.  Thus, arguments hinged extensively on the Verbal IQ vs the Full Scale IQ.  I'm perplexed why the experts in intelligence and intelligence testing, even without knowing the result of the above g-analysis, did not say "we've got consistent Wechsler V-P split information, I think it would be important to administer more contemporary intelligence tests, or parts of some of these batteries, to find out more information about important g-related cognitive abilities (for the defendant) not measured by the WAIS battery."  The Wechsler batteries had consistently captured Vidal's abilities as measured per that battery--wouldn't time have been better spent, and a decision made on a higher quality array of cognitive information, by requesting administration of other IQ tests (or parts of other IQ tests) instead of arguing over old and consistent limited cognitive data? In fact, I, together with Flanagan and Ortiz, published a book in 2000 (The Wechsler Intelligence Scales and Gf-Gc theory:  A contemporary approach to interpretation) that presents procedures for augmenting the various Wechsler batteries to provide for a more comprehensive CHC/Gf-Gc based assessment of a person's intellectual functioning.  This information was also available as early as 1998 (see ITDR by McGrew and Flanagan).
[conflict of interest note - I coauthored these two books, which made little in the way of ching-$ for the authors.  Both are now out of print and none of the authors are receiving any royalties from their sales].

I continue to be baffled/troubled by the over-reliance on, and almost god-like stature of, the various versions of the WAIS (R/III/IV) in Atkins rulings.  It has been well known (and written about in articles and books; click here) since the early 1990s that contemporary CHC (aka Gf-Gc) theory had emerged as the consensus model of intelligence and, more importantly, instruments had been designed (with adult norms) to measure many of the unmeasured or poorly measured CHC abilities not tapped by the WAIS-R/III batteries.  If I were an attorney arguing an Atkins case, on either side of the fence, I would seek intellectual testing beyond the so-called "gold standard."  I would want the best possible estimate of g-ness (since this seems to be the crux of the first prong of MR determination in most Atkins cases).

IMHO, the major problem is that of the "inertia of tradition" in intelligence testing, particularly in psychology disciplines that deal with adult populations.  Many practicing psychologists, esp. those working in adult settings whose professional associations and journals have paid less attention to contemporary intelligence theory and test development (less than school and educational psychologists), simply have not kept abreast of these developments. 

How long will Atkins expert intelligence testimony, expert debates, and decisions be made in the face of the Atkins MR IQ Theory-Test gap?  Isn't this simply wrong?  Professionally and ethically shouldn't psychologists who offer judgements in life-or-death decisions hinging on IQ test results be "up to speed" regarding contemporary intelligence theory and instruments?  Should the courts continue to be handicapped by the presentation of intelligence test results that are not based on the best evidence from intelligence theory, research, and test development?  The courts are at the mercy of experts who testify, experts who I believe need to be familiar with the cutting edge empirical and theoretical information on the structure of human intelligence and various IQ batteries that are available, beyond the Wechslers.

Given the data presented above, it is possible that the decisions in at least two cases (and I'm sure there are more) may have had a different outcome, or at least an outcome based on a more comprehensive set of intelligence information.  Justice could have been better served via more contemporary intellectual testing practice and interpretation.

I continue to be troubled by this issue.....I need to stop writing and reflect...and will post more on it in the future.

Stay tuned...this series may continue as I analyze other data sets.

