STUDIES CONCLUDE PERFORMANCE-BASED TESTS COST MORE AND DO A WORSE JOB
Two studies conclude that multiple-choice tests offer better results at
lower cost and in less time.
Lukhele, R., Thissen, D. & Wainer, H. (1994). On the relative value of
multiple-choice, constructed response, and examinee-selected items on two
achievement tests. Journal of Educational Measurement, 31(3), 234-250.
"Overall, the multiple-choice items provide more than twice the information
than the constructed response items do. Examining the entire test (and
freely applying the Spearman-Brown prophecy formula), we found that a
75-minute multiple-choice test is as reliable as a 185-minute test built of
constructed response questions. Both kinds of items are measuring
essentially the same construct, and the constructed response items cost
about 300 times more to score. It would appear, based on this limited
sample of questions, that there is no good measurement reason for including
constructed response items (p. 240)."
Wainer, H. & Thissen, D. (1993). Combining multiple-choice and
constructed-response test scores: Toward a Marxist theory of test
construction. Applied Measurement in Education, 6(2), 103-118.
"A natural conclusion to reach from the weightings associated with
constructed-response versus multiple-choice questions is that the former
take more examinee time and resources to measure essentially the same thing
more poorly than the latter."
Date sent: Mon, 11 Oct 1999 08:07:17 -0400
To: 71524.2205@compuserve.com
From: Donna Garner (by way of Fred Battey)
Subject: ANALYSIS OF TWO TEST TYPES
From: George K. Cunningham
Professor, University of Louisville
October 10, 1999
Two scholarly articles, both based on a careful analysis of the Advanced
Placement tests (which include both multiple-choice and constructed
response items), show on grounds of cost, reliability, and validity that
the debate about which item type to use is a no-brainer.
Below is the reference to the first study, followed by an excerpt from a
chapter I have written that discusses it.
Lukhele, R., Thissen, D. & Wainer, H. (1994). On the relative value of
multiple-choice, constructed response, and examinee-selected items on two
achievement tests. Journal of Educational Measurement, 31(3), 234-250.
Multiple-choice and constructed response items are compared in a 1994
article in the Journal of Educational Measurement by R. Lukhele, David
Thissen, and Howard Wainer, titled "On the relative value of
multiple-choice, constructed-response, and examinee-selected items on two
achievement tests."
The authors chose to study the Advanced Placement (AP) testing program of
the College Board. This is a good choice because the training of its
examiners and the sophistication of the scoring methods used with the
constructed response items are "state of the art." Whatever defects the
study uncovers in the scoring of the constructed response items cannot
easily be attributed to flaws in the training of examiners or the methods
they employed.
The article begins by making two important points: constructed response
items are very expensive, and the information that can be obtained from
them is not very different from what can be obtained from multiple-choice
items. The authors state:
"Constructed response items are expensive. They typically require a great
deal of time for the examinee to answer, and they cost a lot to score. In
the AP testing program, it was found that a constructed response test of
equivalent reliability to a multiple-choice test takes from 4 to 40 times
as long to administer and is typically hundreds or thousands of times more
expensive to score (p. 234)."
With respect to the uniqueness of the information provided by
constructed-response items they state:
"The primary motivation for the use of constructed response formats thus
stems from the idea that they can measure traits that cannot be tapped by
multiple-choice items - for example, assessing dynamic cognitive processes
(p. 235)."
Their conclusion was as follows:
"Overall, the multiple-choice items provide more than twice the information
than the constructed response items do. Examining the entire test (and
freely applying the Spearman-Brown prophecy formula), we found that a
75-minute multiple-choice test is as reliable as a 185-minute test built of
constructed response questions. Both kinds of items are measuring
essentially the same construct, and the constructed response items cost
about 300 times more to score. It would appear, based on this limited
sample of questions, that there is no good measurement reason for including
constructed response items (p. 240)."
"On the basis of the data examined, we are forced to conclude that
constructed response items provide less information in more time at greater
cost than do multiple-choice items. This conclusion is surely discouraging
to those who feel that constructed response items are more authentic and
hence, in some sense, more useful than multiple-choice items (p. 245)."
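For readers unfamiliar with the Spearman-Brown formula mentioned in the
first quote above, the short Python sketch below shows how it relates test
length to reliability. It is only an illustration: the function names and
the reliability figures (0.60 and 0.90) are hypothetical and are not taken
from the article.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is made length_factor times as long."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def length_factor_for(reliability: float, target: float) -> float:
    """Factor by which a test must be lengthened to reach the target reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))


# Hypothetical figures, chosen only to illustrate the arithmetic.
current = 0.60   # assumed reliability of an existing test section
target = 0.90    # assumed reliability we would like to reach

factor = length_factor_for(current, target)
print(f"The section must be {factor:.1f} times as long")
print(f"Check: predicted reliability = {spearman_brown(current, factor):.2f}")
```

With these assumed figures, a section with reliability 0.60 would need to
be six times as long to reach 0.90. The same kind of calculation underlies
comparisons of testing time at equal reliability.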
There is a second article which in some ways is even better:
Wainer, H. & Thissen, D. (1993). Combining multiple-choice and
constructed-response test scores: Toward a Marxist theory of test
construction. Applied Measurement in Education, 6(2), 103-118.
In this article the authors point out how much more constructed response
tests cost than multiple-choice tests. They also describe the difficulty
(impossibility) of rationally combining the results of the two item
formats. Here are a couple of quotes:
"For all the tests listed in Table 1, whatever is being measured by the
constructed-response section is measured better by the multiple-choice
section. These seven tests were but a sample. We have never found any test
that is composed of an objectively and a subjectively scored section for
which this is not true. We invite readers to produce counter evidence."
"A natural conclusion to reach from the weightings associated with
constructed-response versus multiple-choice questions is that the former
take more examinee time and resources to measure essentially the same thing
more poorly than the latter."
George K. Cunningham
University of Louisville