Brian and Deborah,

This note is in response to messages from Brian and Deborah, which I include at the end of this post. I believe there are two separate issues about the format of assessments.

The first is about whether high-stakes, large-scale assessments should be used at all. The members of this list, and Brian specifically, are opposed to these practices. I believe that he is asserting that discussions of format are irrelevant because we simply should not be conducting such assessments. This is a credible position to take. On the other hand, 49 states are conducting such assessments. In today's Washington Post there is an editorial strongly urging staying the course on SBER. If these policies are strongly supported by the liberal press, all of the governors, most of the legislatures, the two presidential candidates and the business community, their elimination seems rather unlikely at this point. It is my belief that if we are going to be conducting large-scale, high-stakes tests, they should reflect the best psychometric methods.

The big issue is whether these tests should be authentic, as the term is usually used to refer to portfolios, performance assessments and the constructed-response format (of doubtful authenticity), or should use the objective item format. Let me make this point clear. I am not talking about dictating the assessment practices of teachers, nor am I trying to compare the instructional advantages of objective versus authentic tests in the classroom. I am restricting my discussion to state accountability assessments.

Deborah argues against the use of multiple-choice tests, citing some good reasons for her aversion to them. The problem is that the decision about which format to use cannot be made merely by looking at the problems with the multiple-choice item format. It is necessary to do a cost/benefit analysis of both authentic and objective methods.
In so many of these discussions what we get are condemnations of the M-C item format, under the assumption that if it can be shown to be flawed then alternative methods must be better. In her classic defense of performance assessment, the article (1996) "Effects of introducing classroom performance assessments on student learning" in Educational Measurement: Issues and Practice, 15(3), 7, Lorrie Shepard says "These anticipated benefits of performance assessments have been inferred by analogy from research documenting the negative effects of traditional, standardized testing." She has it exactly wrong. Her problem is that evidence for the benefits of performance assessments does not exist. If someone asks you whether a Ford is a good car, you cannot prove it by denigrating Chevrolets.

When the effectiveness of objective tests is compared empirically with alternative methods for large-scale, high-stakes tests, the objective format wins hands down. According to the studies of the advanced placement tests conducted by Wainer and Thissen, the ratio of the cost of constructed-response items to the cost of multiple-choice items is about 3000 to 1. Constructed-response items are far less reliable as well. They conclude with the following assertion: "On the basis of the data examined, we are forced to conclude that constructed-response items provide less information in more time at greater cost than do multiple-choice items. This conclusion is surely discouraging to those who feel that constructed response items are more authentic and hence, in some sense, more useful than multiple-choice items. It should be (p. 245)."

The last argument for constructed-response items is that students benefit instructionally from their use. Anyone who believes this owes it to themselves to review the issue of the Phi Delta Kappan from a year ago that seems to have given the lie to that viewpoint.
Here are Brian and Deborah's messages:

"George:

It seems to me that one of the reasons you do authentic assessment is that people can't, should not, be standardized. I could care less how a student in my class in Richland Center, WI compares with a student in Los Angeles, CA. Anyone who does has problems IMHO.

To criticize performance assessments on the basis that there are problems with standardizing them is to miss the forest for the trees. Get rid of standardization and get on with the business of promoting a democratic society.

I am not here, in the classroom, to "sell" the current state of affairs; not here to say, this is the way it is, fit in; not here to advocate on behalf of the ruling class, promoting false competition in a merit based society (which doesn't prevail, anyway, in many sectors.) I am here to expand their horizons. I am here to get students to think about how the system is, and to challenge it and to change it.

Standardization is about control and has nothing to do with a democratic classroom or democratic society. The virtue of authentic assessments is that they can't be standardized."

and Deborah:

> George Cunningham,
>
> I don't expect to convince you off hand. But please at least read this and
> think about it. It's the toughest issue I've ever faced--trying to share
> with others the simple fact that what appears straightforward is not.
>
> What's odd is that while the "bad items" are easier to explain--as you
> suggest, the "good items" are very much like the bad ones. They are items
> I had no trouble with when I took such tests, but found the kids (including
> one of my own three) answered wrong. I didn't get it. The reasons at
> first baffled me. So I did the odd thing: I asked them to tell me why they
> gave the answers they did. And then they almost persuaded me that it was
> I who was wrong!.
>
> Precisely in order to produce the kind of curve needed the choices can't be
> between one clearly right and three clearly wrong ones. That's the rub.
> The misleading answers vary - but they aren't simply tricks or simply wrong
> either. They do indeed "trick" kids because they are answers that make
> best sense to them for reasons that have nothing to do with what we
> normally mean by good or bad reading comprehension. Many such kids read
> faultlessly and can retell accurately. The same holds true for the
> vocabulary section of many reading tests (not to mention the word attack
> skills stuff). What the wrong answers rest on are a different set of
> experiences that lead to different assumptions, different guesses about
> what's at stake and what's happening and who's who, as well as different
> interpretations of directions, etc. The kids I interviewed, and tape
> recorded nearly 30 years ago (and I've repeated this many times since) did
> no better when they read the test for themselves then when I read it aloud
> for them. I was so dumbfounded I had to do this over and over. The same,
> of course, holds true of many other subject matter tests that rest on
> reading interpretations.

George K. Cunningham
University of Louisville