Monday, August 25, 2025

IQ's Corner: What is (and what is not) clinical judgment in intelligence test interpretation?

What is clinical judgment in intelligence testing?  

This term is frequently invoked when psychologists explain or defend their intelligence test interpretations. Below is a brief explanation I've used to describe what it is…and what it is not, based on several sources. Schalock and Luckasson's AAIDD Clinical Judgment book (now in a revised 2014 edition) is the best single source I have found on this slippery concept in intelligence testing, particularly in the context of a potential diagnosis of intellectual disability (ID), and it is recommended reading.

—————

Clinical judgment is a process based on solid scientific knowledge and is characterized as being “systematic (i.e., organized, sequential, and logical), formal (i.e., explicit and reasoned), and transparent (i.e., apparent and communicated clearly)” (Schalock & Luckasson, 2005, p. 1). The application of clinical judgment in the evaluation of IQ scores for the diagnosis of intellectual disability includes consideration of multiple factors that might influence the accuracy of an assessment of general intellectual ability (American Psychiatric Association, 2013). There is “unanimous professional consensus that the diagnosis of intellectual disability requires comprehensive assessment and the application of clinical judgment” (Brief of Amici Curiae, Hall v. Florida, 2014, p. 8).

Clinical judgment in the interpretation of scores from intelligence test batteries should not be used as a basis for “gut instinct” or “seat-of-the-pants” impressions and conclusions by the assessment professional (Macvaugh & Cunningham, 2009), nor as justification for shortened evaluations, a means to convey stereotypes or prejudices, a substitute for insufficiently explored questions, or an excuse for incomplete testing and missing data (Schalock & Luckasson, 2005). Idiosyncratic methods and intuitive conclusions are not scientifically based and have unknown reliability and validity.

If clinical judgment interpretations and opinions regarding an individual’s level of general intelligence are based on novel or emerging research-based principles, the assessment professional must document the bases for these new interpretations as well as the limitations of these principles and methods. This requirement is consistent with Standard 9.4 of the Standards for Educational and Psychological Testing, which states:

When a test is to be used for a purpose for which little or no validity evidence is available, the user is responsible for documenting the rationale for the selection of the test and obtaining evidence of the reliability/precision of the test scores and the validity of the interpretations supporting the use of the scores for this purpose (p. 143).


American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: Author.

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.

Brief of Amici Curiae American Psychological Association, American Psychiatric Association, American Academy of Psychiatry and the Law, Florida Psychological Association, National Association of Social Workers, and National Association of Social Workers Florida Chapter, in Support of Petitioner, Hall v. Florida, No. 12-10882 (S. Ct. 2014).

Macvaugh, G. S., & Cunningham, M. D. (2009). Atkins v. Virginia: Implications and recommendations for forensic practice. The Journal of Psychiatry and Law, 37, 131–187.

Schalock, R. L., & Luckasson, R. (2005). Clinical judgment. Washington, DC: American Association on Intellectual and Developmental Disabilities.

—————

Kevin S. McGrew, PhD.

Educational Psychologist

Director 

Institute for Applied Psychometrics (IAP)

www.theMindHub.com


Saturday, April 5, 2025

Reevaluating the Flynn effect, and the reversal: Temporal trends and measurement invariance in Norwegian armed forces intelligence scores

Open access PDF available from the journal Intelligence (click here).

Abstract

Since 1954, the Norwegian Armed Forces have annually administered an unchanged general mental ability test to male cohorts, comprising figure matrices, word similarities, and mathematical reasoning tests. These stable and representative data have supported various claims about shifts in general mental ability (GMA) levels, notably the Flynn effect and its reversal, influencing extensive research linking these scores with health and other outcomes. This study examines whether observed temporal trends in scores reflect changes in latent intelligence or are confounded by evolving test characteristics and specific test-taking abilities in numerical reasoning, word comprehension, and figure matrices reasoning. Our findings, using multiple-group factor analysis and multiple indicator multiple cause (MIMIC) models, indicate that while there was a general upward trend in observed scores until 1993, this was predominantly driven by enhancements in the fluid intelligence task, specifically figure matrices reasoning. Notably, these gains do not uniformly translate to a rise in underlying GMA, suggesting the presence of domain-specific improvements and test characteristic changes over time. Conversely, the observed decline is primarily due to decreases in word comprehension and numerical reasoning tests, also reflecting specific abilities not attributable to changes in the latent GMA factor. Our findings further challenge the validity of claims that changes in the general factor drive the Flynn effect and its reversal. Furthermore, they caution against using these scores for longitudinal studies without accounting for changes in test characteristics.
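For readers unfamiliar with the multiple indicator multiple cause (MIMIC) approach mentioned in the abstract, below is a minimal, hypothetical sketch of such a model in Python using the semopy package. The variable names, the simulated data, and the single direct cohort-to-subtest path are illustrative assumptions on my part, not the authors' actual model, data, or code.

```python
# Hypothetical MIMIC (multiple indicator, multiple cause) sketch, not the study's code.
# Latent GMA is measured by three subtests and regressed on a cohort "cause" variable;
# a direct path from cohort to one subtest allows a domain-specific (non-g) score trend.
import numpy as np
import pandas as pd
from semopy import Model

rng = np.random.default_rng(0)
n = 2000

# Simulated standardized birth-cohort variable and a latent GMA factor that,
# in this toy example, does NOT change across cohorts.
cohort = rng.normal(size=n)
gma = rng.normal(size=n)

# Three observed subtests load on GMA; the figure-matrices score also receives
# a direct cohort effect, mimicking a subtest-specific gain over time.
df = pd.DataFrame({
    "cohort": cohort,
    "figures": 0.8 * gma + 0.3 * cohort + rng.normal(scale=0.6, size=n),
    "words": 0.7 * gma + rng.normal(scale=0.7, size=n),
    "math": 0.7 * gma + rng.normal(scale=0.7, size=n),
})

# MIMIC specification: GMA regressed on cohort, plus one direct cohort -> subtest path.
# At least one indicator (here words and math) must have no direct path so the
# latent regression remains identified.
desc = """
GMA =~ figures + words + math
GMA ~ cohort
figures ~ cohort
"""

model = Model(desc)
model.fit(df)
print(model.inspect())  # compare the cohort -> GMA path with cohort -> figures
```

In a model of this kind, a substantial cohort effect on the latent GMA factor would indicate a change in general ability across cohorts, whereas a significant direct path from cohort to an individual subtest indicates the sort of domain-specific, non-g score trend the abstract describes for the figure matrices (rising) and the word and numerical tests (declining).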