Shun-Wen Chang, National Taiwan Normal University
Bradley A. Hanson, ACT, Inc.
Deborah J. Harris, ACT, Inc.
Paper presented at the Annual Meeting of the American Educational Research Association (New Orleans, April, 2000)
Abstract: The large sample sizes required to calibrate items under item response theory (IRT) models are not easily obtained in many practical pretesting situations. Although classical item statistics can be estimated with much smaller samples, the values may not be comparable across different groups of examinees. This study presented and evaluated a method of standardization that test practitioners may use to standardize classical item statistics when sample sizes are small. The effectiveness of this standardization approach was compared with that of the 1PL and 3PL models using four criteria: the Pearson product-moment correlation, the mean squared error (MSE), the variance, and the squared bias.
For the estimation of item difficulty values, the differences in performance between the 3PL and standardization methods were small, but the differences between the 1PL model and these two methods were large. For the estimation of point biserial correlations, the 3PL model seemed to perform better than the standardization method, and the standardization method performed better than the 1PL model. Although the standardization method did not outperform the 3PL model for the design considered in this study, it could be promising when smaller sample sizes are used. This method may be recommended for use in conjunction with IRT models in test development when pretesting sample sizes are small. Employing the classical measurement framework to obtain pretest item statistics avoids the problem of inaccurate IRT parameter estimates that arises when only limited calibration samples are available.
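The quantities named in the abstract can be illustrated with a minimal sketch. It is not the paper's standardization method, which the abstract does not specify. It only shows the two classical item statistics being estimated (item difficulty as the proportion correct, and the point biserial correlation) and the usual decomposition of MSE into variance plus squared bias. The function names, the sample data, and the choice to correlate the item with a total score that includes the item are all assumptions made for illustration.

```python
from statistics import mean, pvariance

def item_difficulty(responses):
    """Classical item difficulty: proportion of 0/1 responses that are correct."""
    return mean(responses)

def point_biserial(responses, total_scores):
    """Pearson correlation between a 0/1 item response and a total test score."""
    n = len(responses)
    mx, my = mean(responses), mean(total_scores)
    # Population covariance, matched with population standard deviations below.
    cov = sum((x - mx) * (y - my) for x, y in zip(responses, total_scores)) / n
    sx = pvariance(responses) ** 0.5
    sy = pvariance(total_scores) ** 0.5
    return cov / (sx * sy)

def error_decomposition(estimates, true_value):
    """Return (mse, variance, squared_bias) of estimates across replications."""
    mse = mean([(e - true_value) ** 2 for e in estimates])
    variance = pvariance(estimates)
    squared_bias = (mean(estimates) - true_value) ** 2
    return mse, variance, squared_bias

# Hypothetical data: six examinees, one item, and their total scores.
responses = [1, 0, 1, 1, 0, 1]
total_scores = [8, 3, 7, 9, 4, 6]
p_value = item_difficulty(responses)
r_pb = point_biserial(responses, total_scores)

# Hypothetical difficulty estimates from four replications, true value 0.65.
mse, variance, squared_bias = error_decomposition([0.62, 0.70, 0.66, 0.58], 0.65)
```

With population (divide-by-n) variance, the MSE equals the variance plus the squared bias exactly, which is why the three error criteria are reported together.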