STUDIES CONCLUDE PERF-BASED TESTS MORE EXPENSIVE TO DO WORSE JOB

Two studies conclude that multiple-choice tests give better results at less cost and in less time.

Lukhele, R., Thissen, D. & Wainer, H. (1994). On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests. Journal of Educational Measurement, 31(3), 234-250.

"Overall, the multiple-choice items provide more than twice the information than the constructed response items do. Examining the entire test (and freely applying the Spearman-Brown prophesy formula), we found that a 75-minute multiple-choice test is as reliable as a 185-minute test built of constructed response questions. Both kinds of items are measuring essentially the same construct, and the constructed response items cost about 300 times more to score. It would appear, based on this limited sample of questions, that there is no good measurement reason for including constructed response items (p. 240)."

Wainer, H. & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103-118.

"A natural conclusion to reach from the weightings associated with constructed-response versus multiple-choice questions is that the former take more examinee time and resources to measure essentially the same thing more poorly than the latter."

Date sent: Mon, 11 Oct 1999 08:07:17 -0400
To: 71524.2205@compuserve.com
From: Donna Garner (by way of Fred Battey)
Subject: ANALYSIS OF TWO TEST TYPES

From: George K. Cunningham, Professor, University of Louisville
October 10, 1999

There are two scholarly articles, both based on a careful analysis of the Advanced Placement tests, which of course include both multiple-choice and constructed-response items. On the evidence of cost, reliability, and validity, they show that the debate over which item type to use is a no-brainer.
Here is the reference to the first study, followed by a passage from a chapter I have written that discusses it.

Lukhele, R., Thissen, D. & Wainer, H. (1994). On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests. Journal of Educational Measurement, 31(3), 234-250.

Multiple-choice and constructed-response items are compared in an article published in the Journal of Educational Measurement (1994). The article, written by R. Lukhele, David Thissen, and Howard Wainer, is titled "On the relative value of multiple-choice, constructed-response, and examinee-selected items on two achievement tests." The tests they chose to study come from the College Board's Advanced Placement (AP) testing program. This is a good choice because the training of the AP examiners and the sophistication of the methods used to score the constructed-response items are "state of the art." Whatever defects the study uncovers in the scoring of the constructed-response items cannot easily be attributed to flaws in the examiners' training or in the methods they employed.

The article begins by making two important points: constructed-response items are very expensive, and the information they yield is not very different from what multiple-choice items provide. The authors state: "Constructed response items are expensive. They typically require a great deal of time for the examinee to answer, and they cost a lot to score. In the AP testing program, it was found that a constructed response test of equivalent reliability to a multiple-choice test takes from 4 to 40 times as long to administer and is typically hundreds of thousands of times more expensive to score (p. 234)."
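Comparisons of "equivalent reliability" like the one quoted above are normally made with the Spearman-Brown prophecy formula, which the authors report applying "freely" to reach their 75-minute versus 185-minute comparison. The formula predicts the reliability of a test lengthened by a factor k (new length divided by old length), assuming the added items are parallel to the existing ones: rho_k = k*rho / (1 + (k - 1)*rho). Solving for k gives the length factor needed to raise a section's reliability to a target value. The short Python sketch below illustrates both directions; the reliabilities it uses (0.60 and 0.90) are hypothetical and are not taken from the AP data.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability of a test lengthened by factor k
    (k = new length / old length), assuming the added items are parallel."""
    if not 0 < reliability < 1 or k <= 0:
        raise ValueError("reliability must be in (0, 1) and k must be positive")
    return k * reliability / (1 + (k - 1) * reliability)


def length_factor_for(reliability: float, target: float) -> float:
    """Inverse Spearman-Brown: the factor by which a test must be lengthened
    to move from `reliability` to `target`."""
    if not (0 < reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must be in (0, 1)")
    return target * (1 - reliability) / (reliability * (1 - target))


if __name__ == "__main__":
    # Hypothetical values for illustration only; not from Lukhele et al.
    cr_reliability = 0.60  # assumed reliability of a constructed-response section
    mc_reliability = 0.90  # assumed reliability of a multiple-choice section

    k = length_factor_for(cr_reliability, mc_reliability)
    print(f"Length factor needed: {k:.1f}")
    print(f"Predicted reliability after lengthening: {spearman_brown(cr_reliability, k):.2f}")
```

With these assumed values the constructed-response section would have to be six times as long to match the multiple-choice section's reliability, which illustrates how a reliability gap between formats turns into a multiple of testing time.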
With respect to the uniqueness of the information provided by constructed-response items, they state: "The primary motivation for the use of constructed response formats thus stems from the idea that they can measure traits that cannot be tapped by multiple-choice items-for example, assessing dynamic cognitive processes (p. 235)."

Their conclusion was as follows:

"Overall, the multiple-choice items provide more than twice the information than the constructed response items do. Examining the entire test (and freely applying the Spearman-Brown prophesy formula), we found that a 75-minute multiple-choice test is as reliable as a 185-minute test built of constructed response questions. Both kinds of items are measuring essentially the same construct, and the constructed response items cost about 300 times more to score. It would appear, based on this limited sample of questions, that there is no good measurement reason for including constructed response items (p. 240)."

"On the basis of the data examined, we are forced to conclude that constructed response items provide less information in more time at greater cost than do multiple-choice items. This conclusion is surely discouraging to those who feel that constructed response items are more authentic and hence, in some sense, more useful than multiple-choice items (p. 245)."

There is a second article that is in some ways even better:

Wainer, H. & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103-118.

In this article the authors point out the very high cost of constructed-response tests compared with multiple-choice tests. They also describe the difficulty (indeed, the impossibility) of rationally combining the results of the two item formats. Here are a couple of quotes:

"For all the tests listed in Table 1, whatever is being measured by the constructed-response section is measured better by the multiple-choice section.
These seven tests were but a sample. We have never found any test that is composed of an objectively and a subjectively scored section for which this is not true. We invite readers to produce counter evidence."

"A natural conclusion to reach from the weightings associated with constructed-response versus multiple-choice questions is that the former take more examinee time and resources to measure essentially the same thing more poorly than the latter."

George K. Cunningham
University of Louisville