Applying item response theory with two-parameter and three-parameter models to the evaluation of multiple-choice tests

Van Canh Nguyen
Office of Quality Assurance, Dong Thap University, Vietnam


Abstract

This article presents the results of analyzing and evaluating multiple-choice items based on Item Response Theory (IRT) with two-parameter and three-parameter models, using the ltm package in R. The data are the responses of 590 students to the 50 multiple-choice items of the English 1 test administered at Dong Thap University in 2018. By evaluating each item's difficulty and discrimination parameters under both models, together with its guessing parameter under the three-parameter model, the study identified good items to add to the item bank and pointed out items that are not yet optimal and should be reconsidered before use. Reviewing and analyzing items under both models makes the evaluation of items more comprehensive and the selection of items more accurate. In addition, the results show that if a test is evaluated only on the subjective opinions of subject-matter lecturers, without IRT-based analysis, poor items can enter the test undetected.
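For readers who want to reproduce this kind of analysis, recall the underlying models: in the three-parameter logistic (3PL) model, the probability that an examinee of ability θ answers item i correctly is

    P_i(θ) = c_i + (1 - c_i) / (1 + exp(-a_i(θ - b_i)))

where a_i is the discrimination, b_i the difficulty, and c_i the guessing parameter; the two-parameter logistic (2PL) model is the special case c_i = 0. The sketch below shows how both models can be fitted with the ltm package (Rizopoulos, 2006). The object name responses is an assumed placeholder for a 0/1 scored response matrix such as the study's 590 examinees by 50 items; it is not code from the article.

    # Minimal sketch, assuming responses is a 590 x 50 matrix or data frame
    # of dichotomously scored (0/1) answers; object names are illustrative.
    library(ltm)

    # 2PL model: one difficulty (Dffclt) and one discrimination (Dscrmn)
    # parameter per item
    fit2pl <- ltm(responses ~ z1)
    coef(fit2pl)

    # 3PL model: adds a guessing parameter (Gussng) per item
    fit3pl <- tpm(responses)
    coef(fit3pl)

    # Item characteristic curves help to spot weak items visually
    plot(fit2pl, type = "ICC")
    plot(fit3pl, type = "ICC")

Items with very low discrimination or a large guessing parameter are natural candidates for the "reconsider before use" category described above.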


References

Baker, F. B. (2001). The Basics of Item Response Theory. College Park, MD: ERIC Clearinghouse on Assessment and Evaluation.
Bui, N. Q. (2017). Evaluation of the quality of a multiple-choice test bank for the module Introduction to Anthropology using the RASCH model and QUEST software. Science & Technology Development Journal, 20(X3), 42-54.
Bui, A. K., & Bui, N. P. (2018). Using IATA to analyze, evaluate and improve the quality of multiple-choice items in the chapter on power functions, exponential functions and logarithmic functions. Can Tho University Journal of Science, 54(9C), 81-93.
Doan, H. C., Le, A. V., & Pham, H. U. (2016). Applying the three-parameter logistic model in validating the difficulty, discrimination and guessing parameters of items in a multiple-choice test. Ho Chi Minh City University of Education Journal of Science, 7(85), 174-184.
Duong, T. T. (2005). Testing and measuring academic achievement. Hanoi: Social Sciences Publishing House.
Lam, Q. T., Lam, N. M., Le, M. T., & Vu, D. B. (2007). VITESTA software and analysis of test data. Vietnam Journal of Education, 176, 10-12.
Lam, Q. T. (2011). Measurement in Education - Theory and Application. Hanoi: Vietnam National University Publishing House.
Le, A. V., Pham, H. U., Doan, H. C., & Le, T. H. (2017). Using the Gibbs sampler to evaluate item difficulty in the Rasch model. Ho Chi Minh City University of Education Journal of Science, 14(4), 119-130.
Nguyen, B. H. T. (2008). Using Quest software to analyze objective test questions. Journal of Science and Technology, Da Nang University, 2, 119-126.
Nguyen, P. H. (2017). Using the GSP chart and ROC method to analyze and select multiple-choice items. Dong Thap University Journal of Science, 24(2), 11-17.
Nguyen, P. H., & Du, T. N. (2015). The analysis and selection of objective test items based on S-P chart, Grey Relational Analysis, and ROC curve. Ho Chi Minh City University of Education Journal of Science, 6(72), 163-173.
Nguyen, T. H. M., & Nguyen, D. T. (2006). Measurement Assessment in the objective test: Question difficulty and Examinees’ ability. Vietnam National University Journal of Science, 4, 34-47.
Nguyen, V. C., & Nguyen, Q. T. (2020). Applying ConQuest software with the two-parameter IRT model to evaluate the quality of multiple-choice tests. HNUE Journal of Science, 65(7), 230-242.
Pham, T. M., & Bui, D. N. (2019). The IATA software for analyzing and evaluating multiple-choice questions at Ha Noi Metropolitan University. Scientific Journal of Ha Noi Metropolitan University, 20, 97-108.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish Institute for Educational Research.
Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17(5), 1-25.