Showing posts with label psychometrics. Show all posts

Thursday, July 12, 2018

Great psychometric resource: The Wiley Handbook of Psychometric Testing

I just received my two-volume set of this excellent resource on psychometric testing.  There are not many good books that cover such a broad array of psychometric measurement issues.  This is not what I would call "easy reading."  It is more of a "must have" resource book to keep "at the ready" when seeking to understand contemporary psychometric test development issues.

Friday, August 24, 2012

IQ Score Interpretations in Atkins MR/ID Death Penalty Cases: The Good, Bad and the Ugly

I just uploaded the following PowerPoint presentation to my SlideShare account---IQ Score Interpretation in Atkins MR/ID Death Penalty Cases: The Good, Bad and the Ugly. It was presented in September 2012 at the Habeas Assistance Training Seminar. Click here to view.




Posted using BlogPress from Kevin McGrew's iPad
www.themindhub.com

Friday, January 13, 2012

How to estimate the best IQ score when someone has taken multiple IQ tests: The psychometric magic of Dr. Joel Schneider

Dr. Joel Schneider has posted an excellent explanation of how to estimate a person's "true IQ score" when that person has taken multiple IQ tests at different times. Probably the most important take-away message is that one should never calculate the simple arithmetic average. The median would be more appropriate, but Joel provides an even more psychometrically sound method and an Excel spreadsheet for implementing his excellent logic and methods.
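For readers who want a feel for the logic without opening the spreadsheet, here is a minimal sketch of the general idea (my own toy illustration, not Joel's actual method; the scores, reliabilities, and intercorrelations are all hypothetical): pool the standardized scores into a properly scaled composite, then regress that composite toward the mean in proportion to its reliability.

```python
import numpy as np

def estimated_true_iq(scores, reliabilities, intercorr, mean=100.0, sd=15.0):
    """Toy sketch: pool several IQ scores into one reliability-weighted estimate."""
    z = (np.asarray(scores, float) - mean) / sd        # standardize each observed score
    k = len(z)
    R = np.asarray(intercorr, float)
    var_sum = k + 2 * np.triu(R, 1).sum()              # variance of the sum of the z-scores
    z_comp = z.sum() / np.sqrt(var_sum)                # properly scaled composite (not an average!)
    rel_comp = 1 - (k - np.sum(reliabilities)) / var_sum   # Mosier-type composite reliability
    return mean + sd * rel_comp * z_comp               # Kelley-style regression toward the mean

# Hypothetical example: IQs of 72, 75, 78 from three tests that
# intercorrelate .75 and have reliabilities in the mid-.90s.
R = [[1.0, .75, .75],
     [.75, 1.0, .75],
     [.75, .75, 1.0]]
print(round(estimated_true_iq([72, 75, 78], [.95, .96, .94], R), 1))  # ~73.2
```

Note that the pooled estimate (about 73) is more extreme than the simple average (75), exactly the kind of result that makes the arithmetic average misleading.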



- Posted using BlogPress from Kevin McGrew's iPad

Tuesday, October 4, 2011

Dr. Doug Detterman's bytes: Psychometric validity




I have been remiss (busy) in my posting of Dr. Doug Detterman's bytes. Here is a new one, on validity.

Validity is the extent to which a test measures what it is supposed to measure and predicts what it is supposed to predict. When Binet developed his intelligence test, his goal was to identify children who would not do well in school so they could be given help. To the extent that Binet's test identified such children, it was valid. In Binet's case, proving the validity of the test amounted to showing that the test predicted or correlated with school performance. (Binet was handicapped, though, since the correlation coefficient was not widely known at the time of his first test.) Note that there is no requirement to provide an explanation of why the test predicts what it was designed to predict, only that it do it. Validity provides an empirical relationship that may be absent of any theoretical meaning. Theoretical meaning is given to the relationship when people attempt to explain why the test works to produce this validity relationship.

Tests designed to predict one thing may be found to predict other things. This is certainly the case with intelligence tests. Relationships between intelligence and many other variables have been found. Such relationships help to build a theory about how and why the test works and ultimately about the relationship of the variables studied.
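To make the empirical nature of validity concrete, here is a toy sketch (entirely invented numbers, not Binet's data): a criterion-related validity coefficient is simply the correlation between test scores and the criterion the test is supposed to predict.

```python
import numpy as np

# Hypothetical test scores and a hypothetical criterion (school GPA).
iq     = np.array([ 85,  92,  98, 104, 110, 118, 124])
grades = np.array([2.1, 2.4, 2.9, 2.8, 3.3, 3.5, 3.8])

validity = np.corrcoef(iq, grades)[0, 1]   # criterion-related validity coefficient
print(f"validity r = {validity:.2f}")      # a high r = the test predicts what it should
```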


- iPost using BlogPress from Kevin McGrew's iPad



Tuesday, September 20, 2011

IRT-based clinical psychological assessment and test development




IRT-based test development has been one of the most important psychometric developments of the past few decades.

This is a follow-up to a prior brief FYI post about an excellent review article regarding the benefits of IRT methods for psychological test development and interpretation. I have now read the article in depth and have provided additional comments and links via the IQs Reading blog feature.

Enjoy.


- iPost using BlogPress from Kevin McGrew's iPad

Thursday, September 15, 2011

Book Nook: An Introduction to Psychometrics




People always ask me for recommendations for a good intro book on psychometrics. Until recently, there were few such books. There are older texts by Thorndike and Nunnally, and a boatload of highly topic-specific advanced books (IRT, factor analysis, etc.), but few books suitable for a first course in psychometrics.

I recently ordered the above book and have been skimming sections when I find time. I believe that this is probably one of the better contemporary introductory texts on psychometrics. I would recommend it to anyone wanting to learn more about the basics of psychometrics.


- iPost using BlogPress from Kevin McGrew's iPad

Psychometric issues in Atkins MR/ID cases: The IAP AP101 report series




It is clear that many Atkins MR/ID death penalty decisions revolve around psychometric issues in which many (but not all) attorneys, judges, and psychologists (who conduct the assessments) are not well versed. Recurring issues in many cases include norm obsolescence (the Flynn Effect), the standard error of measurement (SEM), practice effects, full-scale vs. component part scores, and differences between IQs from different tests, to name but a few.

I always think that once I've posted an Applied Psychometrics 101 working paper, all who visit the ICDP blog will easily find the materials. But, I still receive regular phone calls and questions suggesting that my assumption is not correct.

Thus, the purpose of this post is to remind those looking for information on some of these recurring psychometric issues that a series of reports is available for download. They are listed on the right side of the blogroll, as can be seen in the image below (double-click on the image to enlarge).


To make this info accessible again, I have created a link here that, when clicked, will provide readers with all blog posts that reference the reports and provide report-specific links.

I have a couple of new reports "in limbo" and hope to find time in the near future to add more.




- iPost using BlogPress from Kevin McGrew's iPad

Friday, July 15, 2011

Intelligent IQ testing: Joel Schneider on proper interpretation of composite/cluster scores







Dr. Joel Schneider has (again) posted an amazing and elegant video tutorial to help individuals who engage in intelligence test interpretation understand whether composite/cluster scores should be interpreted as valid when the individual subtests comprising the composite are significantly different or discrepant (Dr. Schneider's short answer: "not very often"). It is simply AWESOME...and makes me envious that I don't have the time or skills to develop similar media content.

His prior and related video can be found here.

Clearly the message is that the interpretation of test scores is not simple; it is a mixture of art and science. As Tim Keith once said in a journal article title (1997)...."Intelligence is important, intelligence is complex." This should be modified to read "intelligence is important, intelligence is complex, and intelligent intelligence test interpretation is also complex."


- iPost using BlogPress from Kevin McGrew's iPad



Wednesday, June 1, 2011

Influences to consider when interpreting ability test scores: Dr. Schneider's amazing video tutorial

Double click on image to enlarge.



Dr. Joel Schneider has (again) posted an amazing video tutorial explaining the various kinds of influences on specific test ability scores. It is simply AWESOME...and makes me envious that I don't have the time or skills to develop similar media content.

Clearly the message is that the interpretation of test scores is not simple; it is a mixture of art and science.


- iPost using BlogPress from Kevin McGrew's iPad

Thursday, March 31, 2011

Why IQ composite scores often are higher or lower than the subtest scores: Awesome video explanation

This past week Dr. Joel Schneider and I released a paper called "'Just say no' to averaging IQ subtest scores." The report generated considerable discussion on a number of professional listservs.

One small portion of the paper explained why composite/cluster scores from IQ tests often are higher (or lower) than the arithmetic mean of the tests that comprise the composite. This observation often baffles test users.

I would urge those who have pondered this question to read that section of the report. And THEN be prepared to be blown away by an instructional video Joel posted at his blog, where he leads you through a visual-graphic explanation of the phenomenon. Don't be scared off by the geometry or some of the terms. Just sit back, relax, and recognize, even if the technical stuff is not your cup of tea, that there is an explanation for this score phenomenon. And when colleagues ask, just refer them to Joel's blog.

It is brilliant and worth a view, even if you are not a quantitatively oriented thinker.

Below is a screen capture of the start (double-click on the icon to enlarge).



- iPost using BlogPress from Kevin McGrew's iPad

Sunday, March 27, 2011

IAP Applied Psychometrics 101 Report #10: "Just say no" to averaging IQ subtest scores

Should psychologists engage in the practice of calculating simple arithmetic averages of two or more scaled or standard scores from different subtests (pseudo-composites) within or across different IQ batteries? Dr. Joel Schneider and I (Dr. Kevin McGrew) say "no."

Do psychologists who include simple pseudo-composite scores in their reports, or make interpretations and recommendations based on such scores, have a professional responsibility to alert recipients of psychological reports (e.g., lawyers, the courts, parents, special education staff, other mental health practitioners, etc.) of the potential amount of error in their statements when simple pseudo-composite scores are the foundation of some of their statements? We believe "yes."

Simple pseudo-composite scores, in contrast to norm-based scores (i.e., composite scores with norms provided by test publishers/authors--e.g., Wechsler Verbal Comprehension Index), contain significant sources of error. Although they have intuitive appeal, this appeal cloaks hidden sources of error in the scores---with the amount of error being a function of a combination of psychometric variables.

IAP Applied Psychometrics 101 Report #10 addresses the psychometric issues involved in pseudo-composite scores.

In the report we offer recommendations and resources that allow users to calculate psychometrically sound pseudo-composites when they are deemed important and relevant to the interpretation of a person's assessment results.

Finally, understanding the sources of error in simple pseudo-composite scores gives practitioners an opportunity to understand a paradoxical phenomenon frequently observed in practice: norm-based or psychometrically sound pseudo-composite scores are often higher (or lower) than the subtest scores that comprise the composite. The "total does not equal the average of the parts" phenomenon is explained conceptually, statistically, and via an interesting visual explanation based on trigonometry.
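For a quick numerical feel for the phenomenon (my own toy example, with an assumed subtest intercorrelation, not a worked example from the report): two subtest scores that are each one SD above the mean produce a norm-based composite that is more than one SD above the mean, because the subtests are not perfectly correlated.

```python
import numpy as np

z = np.array([1.0, 1.0])   # two subtest scores, each +1 SD
r = 0.60                   # assumed subtest intercorrelation

pseudo = z.mean()                        # simple average: 1.00 (IQ metric: 115)
var_sum = 2 + 2 * r                      # variance of the sum of the two z-scores
composite = z.sum() / np.sqrt(var_sum)   # properly scaled composite: ~1.12 (IQ ~117)

print(100 + 15 * pseudo, round(100 + 15 * composite, 1))
# The lower the subtest intercorrelation, the more extreme the norm-based composite.
```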



Abstract

The publishers and authors of intelligence test batteries provide norm-based composite scores based on two or more individual subtests. In practice, clinicians frequently form hypotheses based on combinations of tests for which norm-based composite scores are not available. In addition, with the emergence of Cattell-Horn-Carroll (CHC) theory as the consensus psychometric theory of intelligence, clinicians are now more frequently “crossing batteries” to form composites intended to represent broad or narrow CHC abilities. Beyond simple “eye-balling” of groups of subtests, clinicians at times compute the arithmetic average of subtest scaled or standard scores (pseudo-composites). This practice suffers from serious psychometric flaws and can lead to incorrect diagnoses and decisions. The problems with pseudo-composite scores are explained and recommendations are made for the proper calculation of special composite scores.


- iPost using BlogPress from Kevin McGrew's iPad






Saturday, January 1, 2011

Dr. Doug Detterman's bytes: Psychometric reliability




Another of Dr. Doug Detterman's intelligence bytes.

Reliability is consistency. A measure is reliable if it provides the same measurement on repeated applications. A measurement is an attempt to estimate the value of a true score or latent trait. If it were possible to measure this true score or latent trait value exactly, the measurement would provide the same value on each measurement occasion so long as the trait remains unchanged. However, measurement is never perfect. There will always be some error. To understand the accuracy of any measure requires knowing the amount of error in the measurement.

One of the reasons so many important relationships have been found with intelligence is that intelligence tests are highly reliable.

All good science begins with reliable measurement. As Pavlov put it, control your conditions, and you will see order. This is why reliability is so important and probably deserves even more attention than it was given here.
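One conventional way to quantify the measurement error Detterman describes is the standard error of measurement (SEM). A minimal sketch, assuming an IQ metric (SD = 15) and a hypothetical reliability of .97:

```python
import math

sd, reliability = 15.0, 0.97
sem = sd * math.sqrt(1 - reliability)            # SEM of ~2.6 IQ points
lo, hi = 100 - 1.96 * sem, 100 + 1.96 * sem      # 95% confidence band around 100
print(f"SEM = {sem:.1f}; 95% band for an obtained IQ of 100: {lo:.0f} to {hi:.0f}")
```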

- iPost using BlogPress from Kevin McGrew's iPad


Saturday, November 27, 2010

Visual-graphic of how to develop psychological measures of constructs

I found this figure, which I developed a few years ago for a specific grant process (thus the scratched-out box that is not relevant to this post). It summarizes, in a single figure, the accepted/recommended approach to developing and validating tests. In simple terms, one starts with the specification of the theoretical domain construct(s) of interest, then examines the measurement domain for possible types of tests to operationalize the constructs, and then develops and scales the test items (optimally using IRT scaling methods). Very basic. Thought I would share---I love visual-graphic explanations.
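For readers unfamiliar with IRT scaling, here is a generic sketch of the kind of model involved (a standard two-parameter logistic item response function; the item parameters below are hypothetical, and this is illustrative only, not the specific method behind the figure):

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL IRT model: probability of a correct response for ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An item of average difficulty (b = 0) with good discrimination (a = 1.5):
for theta in (-2, -1, 0, 1, 2):
    print(f"theta = {theta:+d}  P(correct) = {p_correct_2pl(theta, 1.5, 0.0):.2f}")
```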

Double click on image to enlarge





- iPost using BlogPress from Kevin McGrew's iPad


Saturday, November 13, 2010

Creating new psychometrically sound test score composites - the Compositator




Ever wanted to combine two or three tests from an intelligence or achievement battery into a cluster composite not provided by the authors/publisher of a test?

Most assessment clinicians have. Unfortunately, there has been no psychometrically sound method for doing so. As will be described in a future post, calculating the arithmetic average of individual subtest scaled or standard scores to generate a pseudo-composite is a psychometric no-no that results in inaccurate scores with serious flaws. In order to calculate defensible composite scores from the standard scores of different subtests, users would need to engage in tedious calculations involving the intercorrelations among all tests in the desired new cluster, the standard deviations of the different tests (unless all are on a common scale), and the reliability of each individual test. Simply collecting the required information from a test battery's technical manual takes too much time.
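To give a feel for what those tedious calculations look like, here is a minimal sketch (equal weights and hypothetical values assumed; this is my own illustration, not the Compositator's algorithm) that derives a composite's standard deviation and a Mosier-type composite reliability from exactly the three ingredients just listed:

```python
import numpy as np

def composite_stats(sds, reliabilities, intercorr):
    """Return (SD, reliability) of an equally weighted sum of subtests."""
    s = np.asarray(sds, float)
    R = np.asarray(intercorr, float)
    cov = R * np.outer(s, s)          # covariance matrix implied by the r's and SDs
    var_c = cov.sum()                 # variance of the summed composite
    err = np.sum(s**2 * (1 - np.asarray(reliabilities)))  # error variance carried in
    return np.sqrt(var_c), (var_c - err) / var_c          # Mosier composite reliability

sd_c, rel_c = composite_stats(
    sds=[15, 15, 15],
    reliabilities=[.92, .89, .94],
    intercorr=[[1.0, .60, .55],
               [.60, 1.0, .50],
               [.55, .50, 1.0]])
print(round(sd_c, 1), round(rel_c, 2))   # ~37.6 and ~.96
```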

Enter the Compositator by Dr. Joel Schneider. This has to be seen and played with to be believed!




The FREE Compositator was released yesterday by the Woodcock-Muñoz Foundation. The description at the WMF web page is:

The Compositator, by W. Joel Schneider, Ph.D., is a tool designed to provide assessment professionals who use the Woodcock-Johnson® III Normative Update Compuscore and Profiles Program, version 3.1 (WJ III NU) and the Woodcock Interpretation and Instructional Interventions Program, version 1.0 (WIIIP) with user-friendly statistical methods to create customized composite scores

So, for users of the WJ III NU, there is now a psychometrically defensible method for creating special clusters. But that is only the beginning and first-level use of the program. Dr. Schneider has included a wide variety of features that allow users to create factor analysis scores, prediction equations from unique sets of norm-based or Compositator-created composites (e.g., examining the predictive power of a set of variables for predicting basic reading skills), and more.

The unique and innovative value of the program for all test professionals (not just those who use the WJ III NU) is that the program is loaded with all kinds of psychometric-based features that include instructional videos explaining how they work and the underlying basic statistics. I see the program serving as a very powerful instructional tool in graduate-level intelligence testing courses. Students can run "what if" scenarios with different test variables and see how different decisions result in different psychometric outcomes (e.g., different levels of prediction via multiple regression). Causal modeling scenarios can be simulated and compared. These examples only scratch the surface of the program's features.

The PDF manual associated with the program is worth a read if nothing else...much can be learned from it about basic psychometric concepts.

Dr. Schneider's free software is highly innovative. IMHO it is a major innovation in intelligence test interpretation and instruction re: basic psychometric concepts.

Yes----the program only works with the WJ III NU. But it is FREE. WMF worked with Dr. Schneider as it saw the need to "push the edge of the envelope" of intelligence test instruction and interpretation. As a result, WMF provided Dr. Schneider access to all the electronic WJ III NU data files that provided the necessary test characteristic information required (as noted in the introduction to this post) to create psychometrically defensible composite scores. The intent is to pull the field of intelligent intelligence testing forward----with other test authors and publishers viewing the concepts embedded in the Compositator as ideas for incorporation in their respective test batteries and software.

The Compositator should be considered a prototype that will serve as a potential illustrative "tipping point" for new innovations in intelligence test interpretation and the teaching of current and future users of intelligence tests about basic psychometric concepts. The goal is to influence the field beyond the WJ III NU. The WJ III NU psychometric guts provided a means for Dr. Schneider to implement his creative and innovative ideas---hopefully to demonstrate what can and should be done to improve intelligence testing practice across the board.

Of course, being able to create any cluster one wants is not without caveats. Crafting new composite scores should only be based on sound clinical, theoretical or research-based evidence. In this sense the program is a dumb tool that cannot be separated from the expertise and skills of the user----after all, "if you give a monkey a Stradivarius violin and you get bad music--you don't blame the violin."

Finally, Dr. Schneider's unique approach to teaching the basics of using the program can be followed at his new blog (Psychometrics from the Ground Up), where he explains statistical and psychometric concepts via video tutorials...amazing stuff. Check it out and bookmark it for frequent visits (or put it in your RSS feed reader).

[Conflict of interest disclosure - I, Kevin McGrew, am a coauthor of the WJ III NU. I am also the Research Director for WMF].




- iPost using BlogPress from Kevin McGrew's iPad

Monday, August 9, 2010

Institute for Applied Psychometrics (IAP) internet resources (web, blogs, etc.) updated/revised

This blog, as well as my other two professional blogs, are activities of the Institute for Applied Psychometrics (IAP).  I'm pleased to announce that this past week I finally found the time to update/revise the IAP web page. 

Aside from updating content, the major revision was the integration and cross-linking of the IAP web page with my three professional blogs.  The web page serves as the "mother" host of major static material, while the three blogs are IAP's mechanisms (along with Facebook, LinkedIn, and Twitter---page links that are now also available at the revised web page) for immediate, dynamic presentation of material.  Collectively, all of these internet portals work together to meet the goals of IAP (as outlined below).  The sources are now better integrated via the latest web page revision.  Enjoy.

The Institute for Applied Psychometrics (IAP), LLC, is a private research organization, founded by Kevin McGrew, devoted to the application of educational and psychological measurement and statistical procedures to issues and problems in psychology, education, and human exceptionalities/disabilities.  The goal of IAP is to provide a bridge between psychological, measurement, and statistical theory/methods and applied practice in psychology, education, and law.

IAP has particular research interests in: (a) theories and measurement of human intelligence, personal competence, and adaptive behavior; (b) the application of psychological and educational measurement principles and techniques to the development and interpretation of psychological and educational assessment instruments; (c) the Cattell-Horn-Carroll (CHC) Theory of Cognitive Abilities; (d) narrowing the theory-practice gap in educational and psychological assessment; (e) the influence of non-cognitive (conative) characteristics on learning and human performance; (f) psychological assessment practices in the identification and classification of individuals with intellectual and learning disabilities and other exceptionalities; (g) the application of emerging neurotechnologies to learning and cognitive performance; and (h) psychometric issues related to the identification of individuals with intellectual disabilities in Atkins MR/ID death penalty cases.  The practical application of psychometrics to educational, psychological, and legal problems is a unique IAP focus.

IAP has conducted research and provided consultation and training on:

  • Human ability measurement as per the Cattell-Horn-Carroll (CHC; Gf-Gc) Theory of Cognitive Abilities.
  • Achievement and cognitive ability test development and interpretation.
  • The practical application of IRT, multivariate statistics, and structural equation modeling (SEM) methods to educational and psychological issues and problems.
  • Research, development, and validation of models of human abilities and competence, particularly in the areas of multiple intelligences, academic and cognitive skill development, personal competence, adaptive behavior, and community adjustment.
  • The development and measurement of educational and community outcomes for individuals with disabilities.
  • Secondary analysis of large-scale national databases.
  • The development and improvement of educational assessment practices for students with disabilities.
  • The development of strong programs of construct validity for educational and psychological assessment and measurement methods.
  • Recognizing the importance of non-cognitive (e.g., self-regulated learning strategies, self-efficacy, etc.) student characteristics in academic learning.
  • Psychometric issues surrounding intelligence testing in federal Atkins MR/ID death penalty cases.
  • Scientific advice to neurotechnology companies (i.e., Interactive Metronome).
  • Education regarding applied psychometric topics.


Wednesday, August 4, 2010

Research Brief 7-4-10: Investigation of prediction bias in WISC-IV

An excellent article that shows how one form of empirically defined test bias (differential prediction) should be investigated.

Konold, T. R., & Canivez, G. L. (2010). Differential Relationships Between WISC-IV and WIAT-II Scales: An Evaluation of Potentially Moderating Child Demographics. Educational and Psychological Measurement, 70(4), 613-627.

Considerable debate exists regarding the accuracy of intelligence tests with members of different groups. This study investigated differential predictive validity of the Wechsler Intelligence Scale for Children—Fourth Edition. Participants from the WISC-IV—WIAT-II standardization linking sample (N = 550) ranged in age from 6 through 16 years (M = 11.6, SD  = 3.2) and varied by the demographic variables of gender, race/ethnicity (Caucasian, African American, and Hispanic), and parent education level (8-11, 12, 13-15, and 16 years). Full Scale IQ and General Ability Index scores from the WISC-IV were used to predict scores on Mathematics, Oral Language, Reading, Written Language, and the total composite on the Wechsler Individual Achievement Test—Second Edition. Differences in prediction were evaluated between demographic subgroups via Potthoff’s technique. Of the 30 simultaneous tests, 25 revealed no statistically significant between group differences. The remaining statistically significant differences were found to have little practical or clinical influence when effect size estimates were considered. Results are discussed in the context of other ability measures that were previously investigated for differential validity as well as educational implications for clinicians.
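For readers curious about the logic of Potthoff's technique, the following rough sketch (simulated data, not the WISC-IV/WIAT-II sample) shows the underlying regression comparison: test whether letting intercepts and slopes differ by group improves the prediction of achievement from IQ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
group = rng.integers(0, 2, n)                        # two demographic groups
iq = rng.normal(100, 15, n)
ach = 40 + 0.6 * iq + rng.normal(0, 8, n)            # same regression in both groups
df = pd.DataFrame({"ach": ach, "iq": iq, "group": group})

common = smf.ols("ach ~ iq", df).fit()               # one regression line for everyone
separate = smf.ols("ach ~ iq * group", df).fit()     # group-specific intercept and slope
f, p, df_diff = separate.compare_f_test(common)      # Potthoff-style simultaneous F-test
print(f"F = {f:.2f}, p = {p:.3f}")                   # non-significant => no prediction bias
```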

Technorati Tags: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

Wednesday, June 30, 2010

The Flynn Effect report series: Is the Flynn Effect a Scientifically Accepted Fact? IAP AP101 Report #7




Another new IAP Applied Psychometrics 101 report (#7) is now available.  The report is the second in the Flynn Effect series, a series of brief reports that will define, explain, and discuss the validity of the Flynn Effect (click here to access all prior FE-related posts at the ICDP blog) and the issues surrounding the application of an FE "adjustment" to scores from tests with dated norms (norm obsolescence), particularly in the context of Atkins MR/ID capital punishment cases.  The abstract for the brief report is presented below.  The report can be accessed by clicking here.

Report # 1 (What is the Flynn Effect) can be found by clicking here.

This report is the second in a series of brief reports that will define, explain, and summarize the scholarly consensus regarding the validity of the Flynn Effect (FE). This brief report presents a summary of the majority of FE research (in tabular form; n = 113 publications) which indicates (via a simple “vote tally” method) that, despite no consensus regarding the possible causes of the FE, it is overwhelmingly recognized as a fact by the scientific community. The series will conclude with an evaluation of the question of whether a professional consensus has emerged regarding the practice of adjusting dated IQ test scores for the Flynn Effect, an issue of increasing debate in Atkins MR/ID capital punishment hearings.
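For illustration only, here is a minimal sketch of the FE adjustment as it is commonly discussed in this literature (roughly 0.3 IQ points per year of norm obsolescence; the rate and the example values are assumptions, and whether any adjustment should be applied is precisely the consensus question the series examines):

```python
def flynn_adjusted_iq(obtained_iq, norm_year, test_year, rate=0.3):
    """Subtract the estimated score inflation accrued since the test was normed."""
    return obtained_iq - rate * (test_year - norm_year)

# Hypothetical case: obtained IQ of 75 on a test normed 17 years earlier.
print(flynn_adjusted_iq(75, norm_year=1989, test_year=2006))  # 75 - 5.1 = 69.9
```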


Tuesday, June 29, 2010

The Flynn Effect report series: What is the Flynn Effect: IAP AP101 Report #6

A new IAP Applied Psychometrics 101 report (#6) is now available.  The report is the first in the Flynn Effect series, a series of brief reports that will define, explain, and discuss the validity of the Flynn Effect (click here to access all prior FE-related posts at the ICDP blog) and the issues surrounding the application of an FE "adjustment" to scores from tests with dated norms (norm obsolescence), particularly in the context of Atkins MR/ID capital punishment cases.  The abstract for the brief report is presented below.  The report can be accessed by clicking here.

Norm obsolescence is recognized in the intelligence testing literature as a potential source of error in global IQ scores.  Psychological standards and assessment books recommend that assessment professionals use tests with the most current norms to minimize the possibility of norm obsolescence spuriously raising an individual’s measured IQ.  This phenomenon is typically referred to as the Flynn Effect.  This report is the first in a series of brief reports that will define, explain, and summarize the scholarly consensus regarding the validity of the Flynn Effect.  The series will conclude with an evaluation of the question of whether a professional consensus has emerged regarding the practice of adjusting dated IQ test scores for the Flynn Effect, an issue of increasing debate in Atkins MR/ID capital punishment hearings.
