What many laypersons, and unfortunately also some psychologists, do not understand is that the total scores from different IQ test batteries are composite scores often based on different mixes of cognitive abilities. That is, many IQ tests may measure some of the same abilities (Tests A and B both measure apples), but Test A may measure some other abilities (oranges) not measured by Test B. Conversely, Test B may measure some abilities (bananas) not measured by Test A. And, even when two IQ test batteries have tests of similar abilities (apples), they may measure the abilities slightly differently or may measure a different subset or variety of abilities within the same ability domain (red delicious vs. golden delicious).
Also, there is a significant difference between comprehensive IQ batteries (the Wechslers, Stanford-Binet, Woodcock-Johnson, etc.) and special-purpose batteries that are deliberately designed to measure a narrower and more restricted set of human abilities. Nonverbal IQ tests are one example (actually, "tests that measure abilities via nonverbal methods" is a more accurate description; there is no such ability as "nonverbal" ability or IQ). Nonverbal tests are frequently used when a person comes from a different culture and has a limited understanding of the English language. The so-called nonverbal tests attempt to tap key aspects of a person's cognitive abilities via the use of directions and response formats that impose minimal or no language demands on the examinee (e.g., directions administered via gestures or pantomime).
Let's use two scores from the Guevara case to illustrate.
First, in order to determine if cognitive ability content coverage may explain all (or a part) of total IQ score differences, one needs a taxonomy to categorize the abilities measured by different IQ tests. As I've blogged about repeatedly, the Cattell-Horn-Carroll (CHC) theory of cognitive abilities is now considered the consensus psychometric taxonomy of human abilities (McGrew, 2009). Using the CHC taxonomy, I examined the type and amount of different CHC abilities (fruits) measured by the two primary IQ measures administered to Guevara.
On the Spanish version of the Woodcock-Johnson-Revised cognitive battery (Bateria-R; BAT-R), Guevara obtained a total composite IQ score of 60 (±5 point confidence band: 55-65). On the Test of Nonverbal Intelligence-2nd Edition (TONI-2), Guevara obtained a score of 77 (±5 point confidence band: 72-82). As one can see, even after accounting for unreliability in measurement via the application of the standard error of measurement (SEM) to the scores, the two respective confidence bands do not overlap (55-65 vs. 72-82). If the IQ score confidence bands did overlap, one would then assume that the difference in IQ scores is not a reliable difference and reflects the known degree of measurement error in each battery's score. However, in the current illustration, the bands do not overlap. There is a reliable, statistically significant difference between the BAT-R and TONI-2 scores.
Using the recognized research-based extant CHC IQ test analysis literature (Flanagan, McGrew, & Ortiz, 2000; Flanagan, Ortiz, & Alfonso, 2007; McGrew & Flanagan, 1998; Woodcock, 1990), I developed the following figure. I call these figures the "IQ Test CHC DNA Fingerprints" for each intelligence battery. In the current figure I have superimposed the IQ Test CHC DNA Fingerprints for both the BAT-R and TONI-2. It should be immediately clear that the BAT-R and TONI-2 IQ scores are composed of dramatically different mixtures of cognitive abilities (different mixtures of fruit). The BAT-R, which is grounded in CHC theory, provides a total composite IQ score based on a combination of seven different tests that each measure a different CHC ability domain. Each contributes 14.3% to the composite IQ score. Conversely, the TONI-2 is a special-purpose nonverbal battery that consists entirely of a single set of items that measure only one CHC ability domain (i.e., Gf). Clearly one would not expect all individuals to receive similar total IQ scores from these two different IQ batteries, given that people display variability in different CHC abilities (relative strengths and weaknesses). We clearly have a situation where the proverbial "apples and oranges" issue is operating.
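For readers who like to see the logic spelled out, the band-overlap check described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the ±5 point half-width is simply the band reported for each score here, not a general rule for either battery.

```python
def confidence_band(score, half_width):
    """Return the (lower, upper) band around an obtained score."""
    return (score - half_width, score + half_width)


def bands_overlap(band_a, band_b):
    """True if the two closed intervals share at least one point."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]


# Scores and band half-widths as reported in this post.
bat_r_band = confidence_band(60, 5)   # (55, 65)
toni_2_band = confidence_band(77, 5)  # (72, 82)

print(bat_r_band, toni_2_band, bands_overlap(bat_r_band, toni_2_band))
# (55, 65) (72, 82) False
```

Because the function returns False, the difference between the two total scores exceeds what the reported measurement error alone would account for.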
The psychologist who interpreted the scores indicated that the TONI-2 results were "consistent" with those from the BAT-R. The psychologist was correct.
Below are the obtained standard scores for the seven BAT-R tests, along with their CHC ability designation:
- Memory for Names (Glr): 60
- Memory for Sentences (Gsm): 91
- Visual Matching (Gs): 67
- Incomplete Words (Ga): 70
- Visual Closure (Gv): 67
- Picture Vocabulary (Gc): 70
- Analysis-Synthesis (Gf): 83
It should be obvious that one cannot directly compare a narrowly focused special-purpose IQ score that measures only Gf (100%) to a composite IQ score in which Gf contributes 14.3% and which also includes 14.3% contributions from each of six other human cognitive ability domains (Ga, Gc, Gs, Gv, Glr, Gsm). In this case we are comparing a measure of one type of fruit (TONI-2; Gf) to a measure that is a more complete produce market of fruits (BAT-R; Gf, Gv, Ga, Gsm, Glr, Gs, Gc).
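The two "fingerprints" can also be written down as simple weight tables. The sketch below is illustrative only: it encodes the equal 1/7 weighting for the BAT-R and the 100% Gf weighting for the TONI-2 described above, and `shared_weight` is a hypothetical helper (the sum of the smaller weight in each domain), not an established psychometric statistic.

```python
CHC_DOMAINS = ["Gf", "Gc", "Gv", "Ga", "Gs", "Gsm", "Glr"]

# BAT-R: seven tests, one per CHC domain, equally weighted.
bat_r = {domain: 1 / 7 for domain in CHC_DOMAINS}

# TONI-2: a single set of items, all classified as Gf.
toni_2 = {domain: (1.0 if domain == "Gf" else 0.0) for domain in CHC_DOMAINS}


def shared_weight(fingerprint_a, fingerprint_b):
    """Sum, over domains, of the smaller of the two weights (0 to 1)."""
    return sum(min(fingerprint_a[d], fingerprint_b[d]) for d in CHC_DOMAINS)


for domain in CHC_DOMAINS:
    print(f"{domain:>3}  BAT-R {bat_r[domain]:6.1%}   TONI-2 {toni_2[domain]:6.1%}")

print(f"Shared ability coverage: {shared_weight(bat_r, toni_2):.1%}")
```

Under these assumptions, only about 14.3% of the BAT-R composite (its Gf test) draws on the same ability domain as the TONI-2 total score.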
So which is more accurate or valid? If one assumes that both tests were administered properly and the examinee understood all tests, the test battery that provides for the measurement of a more comprehensive range of CHC abilities should take precedence as the most valid and reliable indicator of the person's general intellectual ability.
When one compares the same "apples" (Gf coverage measures) from the BAT-R and TONI-2, the test results are "consistent" and validate one another.
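A quick arithmetic sketch of the "same apples" comparison, using only scores reported in this post. The variable names are hypothetical, and because no confidence band is reported here for the single Analysis-Synthesis test, the code compares the size of the gaps and does not test them for significance.

```python
bat_r_composite = 60  # BAT-R total composite (seven CHC domains)
bat_r_gf = 83         # BAT-R Analysis-Synthesis (Gf) standard score
toni_2_total = 77     # TONI-2 total score (Gf only)

composite_gap = abs(toni_2_total - bat_r_composite)  # apples vs. produce market
gf_gap = abs(toni_2_total - bat_r_gf)                # apples vs. apples

print(f"TONI-2 vs. BAT-R composite: {composite_gap} points")
print(f"TONI-2 vs. BAT-R Gf test:   {gf_gap} points")
```

The gap shrinks from 17 points to 6 points once the comparison is restricted to the Gf measures.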
Bottom line: One must be cautious when comparing total IQ scores from different IQ batteries, especially when one is a comprehensive IQ battery and the other is a narrowly focused special-purpose battery that measures only one (or maybe two or three) CHC ability domains. Although the differences may not be as dramatic as in this comparison, even different comprehensive IQ batteries can produce different total scores if the total IQ score is "flavored" with different mixtures of abilities (fruits). This can even be seen in differences in scores between different editions of the same test when its coverage of measured abilities has changed across editions (see the CHC analysis of the evolving CHC content of the Wechsler intelligence batteries across time, which is presented in the same IQ Test CHC DNA Fingerprint format).
I will soon be posting IQ Test CHC DNA Fingerprints for all the major IQ batteries and select special-purpose batteries. Stay tuned.