An attempt to provide understandable and up-to-date information regarding intelligence testing, intelligence theories, personal competence, adaptive behavior, and intellectual disability (mental retardation) as they relate to death penalty (capital punishment) issues. A particular focus is on psychological measurement and statistical and psychometric issues.
As described in the first post, the g-loadings of the tests or composite scores in an IQ battery on the first principal component in a principal component analysis (PCA) are a traditional index of the g-ness (i.e., saturation with general intellectual ability) of a measure. Furthermore, g-loadings calculated within a specific intelligence battery only tell you the relative g-ness of measures as defined by that particular collection of measures within that battery. It is when one moves to joint analysis of intelligence batteries (and the more batteries in the analysis the better) that a more accurate picture of a measure's g-ness can be determined.
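As a minimal sketch of this idea (using a made-up correlation matrix and illustrative composite names, not the actual sample data reported below), first-principal-component g-loadings can be computed directly from a correlation matrix:

```python
import numpy as np

# Hypothetical correlations among four composite scores
# (names and values are illustrative only)
names = ["KAIT_Gf", "WJIII_Gc", "WAIS_VC", "WAIS_PS"]
R = np.array([
    [1.00, 0.62, 0.55, 0.40],
    [0.62, 1.00, 0.70, 0.35],
    [0.55, 0.70, 1.00, 0.38],
    [0.40, 0.35, 0.38, 1.00],
])

# Eigendecomposition of the correlation matrix; each measure's loading on
# the first principal component is the corresponding element of the leading
# eigenvector scaled by the square root of its eigenvalue.
evals, evecs = np.linalg.eigh(R)          # eigenvalues in ascending order
v1 = evecs[:, -1] * np.sqrt(evals[-1])    # first-PC loadings
v1 = v1 if v1.sum() > 0 else -v1          # fix the arbitrary sign

# Rank measures by g-loading, most g-saturated first
for name, loading in sorted(zip(names, v1), key=lambda t: -t[1]):
    print(f"{name}: {loading:.2f}")
```

With real data one would, as in the analyses below, pool composites from several batteries into a single correlation matrix before extracting the first component.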
Having access to the mixed LD/normal university young adult sample reported in the WJ III technical manual (conflict of interest note: I'm a coauthor of the WJ III), I just ran a joint PCA on this sample of 200. A description of the sample and instruments administered, abstracted from the WJ III technical manual (click here for a brief WJ III technical manual bulletin summary), can be found by clicking here. I selected this data set because the adult subjects had all been administered the WAIS-III. In addition, they had also been administered the CHC-based WJ III Tests of Cognitive Abilities (WJ III) and the Gf-Gc-based Kaufman Adolescent and Adult Intelligence Test (KAIT). I analyzed the composite scores from the three batteries via the PCA procedures conceptually described in the first post in this series. Below is a summary of the results.
Interestingly, the top four g-measures are from the KAIT (Fluid and Crystallized composites) and the WJ III (Gc = Comprehension-Knowledge; Gf = Fluid Reasoning). Even more interesting is the finding that when a broader array of cognitive ability composites is included in a g-analysis, the WAIS-III VC (Verbal Comprehension Index; also classified as a strong measure of Gc as per CHC theory) is only the 9th strongest g-measure, falling behind even the WAIS-III Perceptual Organization (PO; primarily Gv and some Gf as per CHC theory) and Working Memory (WM; Gsm as per CHC theory) Indexes.
A hypothesis that has been advanced to explain the differences between how Gc/verbal abilities are measured by the Wechsler verbal scales and how they are measured on other cognitive batteries is grounded in Jim Cummins's distinction between two types of language proficiency: BICS (Basic Interpersonal Communication Skills) and CALP (Cognitive Academic Language Proficiency; click here for additional on-line information). Briefly, BICS is language proficiency in contextualized, everyday language situations, while CALP is the more context-reduced conceptual-linguistic knowledge that occurs in a context of semantics and abstraction. CALP is more cognitively demanding. Anyone familiar with the Wechsler verbal subtests of Vocabulary, Comprehension, and Similarities knows that they allow subjects to provide lengthy verbal responses in their own everyday language. In contrast, verbal items on other IQ tests (e.g., WJ III) require one-word responses and tend to focus more on cognitive processing involving language (e.g., antonyms, synonyms, verbal analogies). It has been hypothesized that the Wechsler verbal tests and scales are more BICS-influenced, while other IQ tests tend to use verbal/Gc test formats that require more CALP. This might explain the findings reported above: the WAIS-III Verbal Comprehension Index is less cognitively demanding than the Gc/verbal scales from the WJ III and KAIT.
Why is this finding important? Because in a number of Atkins decisions, where concerns were raised about whether the person's WAIS-III Full Scale score was a good estimate of the person's g-ness (general intelligence), considerable stock was placed in the Verbal IQ (of which the Verbal Comprehension Index is now a purer factor measure) as the best estimate of the person's general intelligence (see posts re: the Maldonado and Vidal decisions).
Let's also examine these same data through the lens of multidimensional scaling (MDS) analysis (specifically, Guttman's radex model). The radex model statistically classifies measures along two dimensions: cognitive complexity and stimulus content. As noted in the prior post in this series, cognitive complexity is considered an index of general intelligence (g). For interested readers, a classic article on the use of the MDS radex model in the analysis of intelligence measures was published in Intelligence in 1983 (Marshalek, Lohman & Snow, 1983). Below is a visual-spatial representation of the MDS results in the current university sample, using the same composite measures as reported above (in the PCA g-loading analysis). The interpretations below are mine.
The two broad cognitive processing continuum interpretations (X- and Y-axes) are not the focal point of the current discussion. The most critical finding in the current context, as per the radex model, is how close to the center of the figure a composite measure is placed. Measures closest to the center are considered the most cognitively complex; as measures move further from the center, they are judged to be less cognitively complex.
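To illustrate the radex logic (again with an invented correlation matrix and illustrative names, not the study data), one can convert correlations to distances, fit a two-dimensional MDS solution, and rank measures by their distance from the centroid of the configuration:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical correlations among four composite scores (illustrative only)
names = ["KAIT_Gf", "WJIII_Gc", "WAIS_VC", "WAIS_PS"]
R = np.array([
    [1.00, 0.62, 0.55, 0.40],
    [0.62, 1.00, 0.70, 0.35],
    [0.55, 0.70, 1.00, 0.38],
    [0.40, 0.35, 0.38, 1.00],
])

# Convert correlations to dissimilarities (a common transform:
# highly correlated measures end up close together)
D = np.sqrt(2.0 * (1.0 - R))

# Two-dimensional MDS on the precomputed dissimilarity matrix
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)

# Radex-style reading: measures nearest the centroid of the configuration
# are treated as the most cognitively complex (most g-like)
center = coords.mean(axis=0)
dist = np.linalg.norm(coords - center, axis=1)
for name, d in sorted(zip(names, dist), key=lambda t: t[1]):
    print(f"{name}: distance from center = {d:.2f}")
```

This is only the distance-from-center half of the radex; the stimulus content dimension comes from inspecting where measures fall angularly around the configuration, which requires substantive judgment rather than a formula.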
The results, although based on a different method than PCA, support the same conclusions. The most cognitively complex measure in this sample (which can thus be interpreted as the best index of g-ness) is the KAIT Fluid Intelligence Scale. The next closest to the center of the figure are the WJ III Gc (Comprehension-Knowledge) and Gf (Fluid Reasoning) clusters. Again of interest is the location of the WAIS-III VC composite: it is much less cognitively complex than these other measures and, interestingly, much less cognitively complex than the two similar measures of Gc abilities (the WJ III Gc composite and the KAIT Crystallized Intelligence composite). I've also provided my stimulus content hypothesis interpretations of groupings of composites (designated by ovals) from across the batteries (e.g., Processing Speed: WAIS Processing Speed and WJ III Gs or Processing Speed).
The findings in this one sample (which therefore warrant caution in generalization) suggest that the WAIS-III Verbal Comprehension composite, the most valid measure of Gc or verbal abilities on the WAIS-III, may NOT be all it is thought to be when it comes to tapping cognitively complex processing. Other Gc or verbal measures from intelligence batteries with adult norms (KAIT; WJ III) were found, in a relative sense, to be much better indicators of a person's g-ness (general intelligence). The data suggest that, in this sample, even the WAIS-III Working Memory Index may be a better relative proxy for g-ness than the WAIS verbal composite.
These analyses raise interesting questions about Atkins decisions that have relied exclusively on WAIS-R/WAIS-III scores, particularly when part scores (Verbal IQ, Verbal Comprehension, Performance IQ, Perceptual Organization, etc.) are used instead of the Full Scale IQ to determine mental retardation, or when the respective WAIS verbal composite is considered better than other tests' global IQ scores (or similar Gc/verbal composite scores from other batteries) in making a determination of the level of general intellectual functioning.
How can this be? How can a major scale from the "gold standard" of IQ tests (as it is commonly called in Atkins decisions) be a poorer estimate of general intelligence (g-ness) than most psychologists think? More importantly, what are the implications for Atkins decisions, when the WAIS-R/III Full Scale score has been questioned as an accurate g-estimate in the face of considerable profile variability, and then the Verbal IQ/Verbal Comprehension Index is used to estimate g-ness (general intelligence)?
In both the Maldonado and Vidal decisions, considerable stock was placed in the respective Wechsler verbal composite scores as the best indicator of general intelligence (for making a determination of mental retardation). In Maldonado, reliance on the Wechsler verbal composite trumped a more comprehensive CHC-based IQ battery (BAT-R) administered in Maldonado's native language (Spanish). Of concern in the Vidal decision is that he had been administered four different versions of the Wechsler IQ batteries (over many decades), and they consistently revealed a large verbal/nonverbal (performance) IQ split. Thus, arguments hinged extensively on the Verbal IQ vs. the Full Scale IQ. I'm perplexed why the experts in intelligence and intelligence testing, even without knowing the results of the above g-analysis, did not say, "We've got consistent Wechsler V-P split information; I think it would be important to administer more contemporary intelligence tests, or parts of some of these batteries, to find out more about important g-related cognitive abilities (for the defendant) not measured by the WAIS battery." The Wechsler batteries had consistently captured Vidal's abilities as measured by that battery. Wouldn't time have been better spent, and a decision made on a higher quality array of cognitive information, by requesting administration of other IQ tests (or parts of other IQ tests) instead of arguing over old, consistent, and limited cognitive data? In fact, I, together with Flanagan and Ortiz, published a book in 2000 (The Wechsler Intelligence Scales and Gf-Gc Theory: A Contemporary Approach to Interpretation) that presents procedures for augmenting the various Wechsler batteries to provide a more comprehensive CHC/Gf-Gc-based assessment of a person's intellectual functioning. This information was also available as early as 1998 (see the ITDR by McGrew and Flanagan).
[Conflict of interest note: I coauthored these two books, which made little in the way of ching-$ for the authors. Both are now out of print, and none of the authors receive any royalties from their sales.]
I continue to be baffled/troubled by the over-reliance on, and almost god-like stature of, the various versions of the WAIS (R/III/IV) in Atkins rulings. It has been well known (and written about in articles and books; click here) since the early 1990s that contemporary CHC (aka Gf-Gc) theory had emerged as the consensus model of intelligence and, more importantly, that instruments had been designed (with adult norms) to measure many of the CHC abilities unmeasured, or poorly measured, by the WAIS-R/III batteries. If I were an attorney arguing an Atkins case, on either side of the fence, I would seek intellectual testing beyond the so-called "gold standard." I would want the best possible estimate of g-ness (since this seems to be the crux of the first prong of MR determination in most Atkins cases).
IMHO, the major problem is the "inertia of tradition" in intelligence testing, particularly in psychology disciplines that deal with adult populations. Many practicing psychologists, especially those working in adult settings, whose professional associations and journals have paid less attention to contemporary intelligence theory and test development than those of school and educational psychologists, simply have not kept abreast of these developments.
How long will Atkins expert intelligence testimony, expert debates, and decisions be made in the face of the Atkins MR IQ theory-test gap? Isn't this simply wrong? Professionally and ethically, shouldn't psychologists who offer judgments in life-or-death decisions hinging on IQ test results be up to speed regarding contemporary intelligence theory and instruments? Should the courts continue to be handicapped by the presentation of intelligence test results that are not based on the best evidence from intelligence theory, research, and test development? The courts are at the mercy of the experts who testify, experts who I believe need to be familiar with the cutting-edge empirical and theoretical information available on the structure of human intelligence and the various IQ batteries beyond the Wechslers.
Given the data presented above, it is possible that the decisions in at least two cases (and I'm sure there are more), may have had a different outcome, or at least an outcome based on a more comprehensive set of intelligence information. Justice could have been better served via more contemporary intellectual testing practice and interpretation.
Classification Discrepancies in Two Intelligence Tests: Forensic Implications for Persons with Developmental Disabilities
Cavagnaro, AT; Shuster, S; Colwell, K
*JOURNAL OF FORENSIC PSYCHOLOGY PRACTICE*, 13 (1):49-67; JAN 1 2013
Accurate measurement of intellectual abilities of adults with
developmental disabilities impacts key legal issues, including
adjudicative competence, civil commitment, and death penalty litigation.
This research compared standardized measures of intelligence in a
multicultural sample of adults with developmental disabilities. A
within-subjects ANOVA revealed significantly higher Wechsler Adult Intelligence
Scale-Third Edition IQs compared to Wide Range Intelligence Test (WRIT)
IQs, with a median difference of 13.0 points. Underestimates provided by
the WRIT could lead to adverse legal decisions, including exacerbation
of malingered cognitive dysfunction cases and permitting individuals
guilty of criminal acts to escape sentences. Policy implications exist
for the methodology of intellectual assessment given that instruments
yield discrepancies. We suggest utilizing standardized measures with
strong psychometric integrity in Atkins hearings and incorporating
relevant collateral information when generating clinical case
formulations. This will give clinicians additional relevant data and
afford greater precision in forming clinical judgments regarding
diagnosis and cognitive level in forensic cases.
Matters of Consequence: An Empirical Investigation of the WAIS-III and WAIS-IV and Implications for Addressing the Atkins Intelligence Criterion
Taub, GE; Benson, N
*JOURNAL OF FORENSIC PSYCHOLOGY PRACTICE*, 13 (1):27-48; JAN 1 2013
"Which test provides the better measurement of intelligence, the
Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) or the
Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV)?" is an
important question to professional psychologists; however, it has become
a critical issue in Atkins cases wherein courts are often presented with
divergent Full-Scale IQ (FSIQ) scores on the WAIS-III and WAIS-IV. In
these instances, courts are required to render a decision stating which
test provided the better measure of an inmate's intellectual
functioning. This study employed structural equation modeling to
empirically determine which instrument, the WAIS-III or the WAIS-IV,
provides the better measure of intelligence via the FSIQ score.
Consistent with the publisher's representation of intellectual
functioning, the results from this study indicate the WAIS-IV provides
superior measurement, scoring, and structural models to measure FSIQ
when compared to the WAIS-III.
Refusing and Withdrawing from Forensic Evaluations
Brodsky, SL; Wilson, JK; Neal, TMS
*JOURNAL OF FORENSIC PSYCHOLOGY PRACTICE*, 13 (1):14-26; JAN 1 2013
The current study collected descriptive information about the reasons
mental health experts decline or withdraw from forensic assessments,
both early and late in the legal process. In response to an online
survey, 29 forensic psychologists and psychiatrists presented examples
of case withdrawal from their professional experiences. Their major
reasons included ethical issues or conflicts, payment difficulties, and
interpersonal or procedural problems with retaining counsel or evaluees.
Clearly, there are compelling personal and professional reasons that
prompt forensic mental health experts to withdraw from or turn down
forensic assessments.
Using Brain Imaging for Lie Detection: Where Science, Law, and Policy
Langleben, DD; Moriarty, JC
*PSYCHOLOGY PUBLIC POLICY AND LAW*
Progress in the use of functional magnetic resonance imaging (fMRI) of
the brain to differentiate lying from truth-telling has created an
expectation of a breakthrough in the search for objective methods of lie
detection. In the last few years, litigants have attempted to introduce
fMRI-based lie detection evidence in courts. Both the science and its
possible use as courtroom evidence have spawned much scholarly
discussion. This article contributes to the interdisciplinary debate by
identifying the missing pieces of the scientific puzzle that need to be
completed if fMRI-based lie detection is to meet the standards of either
legal reliability or general acceptance. The article provides a balanced
analysis of the current science and the cases in which litigants have
sought to introduce fMRI-based lie detection. Identifying the key
limitations of the science as expert evidence, the article explores the
problems that arise from using scientific evidence before it is proven
valid and reliable. We conclude that Daubert's "known error rate" is
the key concept linking the legal and scientific standards. We suggest
that properly controlled clinical trials are the most convincing means
to confirm or disprove the relevance of this promising laboratory
research. Given the controversial nature and potential societal impact
of this technology, collaboration of several government agencies may be
required to sponsor impartial and comprehensive clinical trials that
will guide the development of forensic fMRI technology.
Neuropsychological test performance of Spanish speakers: Is performance different across different Spanish-speaking subgroups?
Bure-Reyes, A; Puente, AE
*JOURNAL OF CLINICAL AND EXPERIMENTAL NEUROPSYCHOLOGY*
Even though theories and research have pointed out the importance of
variables such as age, gender, or education on neuropsychological
assessment, much less emphasis has been placed on language and culture.
With the increasing population of Spanish speakers in North America and
the limited amount of clinical and scholarly information currently
available, neuropsychological assessment of this group has similarly
become of increasing importance. Though several studies have been
published over the last two decades, an assumption exists that all
Spanish speakers, holding education and age constant, would perform
similarly regardless of their origin. To address this assumption, a
sample of 126 participants was tested from four different countries
(Chile, Dominican Republic, Puerto Rico, and Spain). Participants were
compared on the following commonly used neuropsychological tests: Verbal
Serial Learning Curve, Rey Osterrieth Complex Figure Test, Verbal
Phonemic Fluency Test, the Stroop Color and Word Test, and the Trail
Making Test. Analyses revealed significant differences across the groups
in two of the five tests administered. Significant differences were
observed in the delayed recall of the Serial Learning Test and in the
Verbal Fluency Test. The findings highlight the importance of
within-group differences between Spanish speakers.
On an Approach to Testing and Modeling Competence
Shavelson, RJ
*EDUCATIONAL PSYCHOLOGIST*
E. L. Thorndike contributed significantly to the field of educational
and psychological testing as well as more broadly to psychological
studies in education. This article follows in his testing legacy. I
address the escalating demand, across societal sectors, to measure
individual and group competencies. In formulating an approach to
measuring competence, I draw on measurement research I have done over my
career; the Thorndike lecture is to be as much autobiographical as
substantive and/or methodological. I present an approach to defining,
measuring, and statistically modeling competency measurements. The
article unpacks Hartig et al.'s (2008) definition of competence as a
complex ability construct closely related to real-life-situated
performance. The intent is to make the construct, competence, amenable
to measurement. Once unpacked, criteria for building competence
measurements are set forth and exemplified by research from business,
military, and education sectors. Generalizability theory, a statistical
theory for modeling and evaluating the dependability of competence
scores, is applied to several of these examples. The article then pulls
together the threads into a general competency measurement model and
concludes by noting its limitations.
Dr. Joel Schneider has done it again: a brilliant video tutorial demonstrating how latent factor scores can be used, via Excel templates he provides, to interpret scores on the WISC-IV and WAIS-IV. This is complex material, but his beautiful visual tutorial makes the constructs easier to understand. Dr. Schneider continues to push the envelope on psychometrically based IQ test score interpretation.
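For readers curious about the mechanics behind such latent factor score estimates, here is a minimal sketch of the regression (Thurstone) method under a one-factor model. The loadings are made up for illustration; they are not actual WISC-IV/WAIS-IV parameters, and Dr. Schneider's templates may use a different estimation approach.

```python
import numpy as np

# Illustrative standardized loadings of four subtests on one latent factor
# (values are invented, not published test parameters)
lam = np.array([0.80, 0.75, 0.70, 0.60])

# Model-implied correlation matrix under a one-factor model:
# R = lambda * lambda' with unit diagonal (uniquenesses on the diagonal)
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

# Regression (Thurstone) factor-score weights: w = R^{-1} * lambda
w = np.linalg.solve(R, lam)

# Estimated latent factor score for a person with these subtest z-scores
z = np.array([1.0, 0.5, 0.0, -0.5])
fhat = z @ w
print(f"estimated factor score: {fhat:.2f}")
```

Note that regression-based factor scores are shrunken toward the mean (the weights sum to less than the loadings would suggest), which is one reason a latent factor score estimate can differ from a simple composite of the same subtests.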