STANDARDS SETTING PROCEDURE DOES _NOT_ DIRECTLY COMPARE AGAINST
BENCHMARKS!
Paul Englesberg notes that the technical manual does not mention
judges comparing questions against any benchmarks, only setting a
difficulty level of problems already "established" to be appropriate
to the grade level.
It is my understanding from Jeff Estes, who was on one of these
committees, that they never questioned the inclusion of any of the
problems; they only put a "bookmark" where they thought the pass
point should be. Thus if a question appears with independent
probability or indirect measurement or proportionality, the
"standards setting committee" had nothing to do with the inclusion of
the problems in the first place. The technical document does not
state who selected these questions in the first place, or who did the
checking. This is probably the reason why we have a 4th grade test
that is full of EALRs from the 7th and 10th grade levels, even
though it is claimed the standards setting committee has checked it
against the benchmarks.
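To make the bookmark step concrete, here is a minimal sketch in Python.
It is not taken from the technical manual. It assumes that each judge's
bookmark is simply a position in the difficulty-ordered booklet and that
the committee's pass point is the median of those positions. The item
names, difficulty values, and function names are all hypothetical.

```python
from statistics import median

def order_items(difficulties):
    """Return item ids sorted from easiest to hardest (the ordered booklet)."""
    return sorted(difficulties, key=difficulties.get)

def bookmark_cut(ordered_items, bookmarks):
    """Combine the judges' bookmarks into one pass point.

    Assumption: the pass point is the median bookmark, truncated to an
    integer position. Items before that position are the ones a student
    who just meets the standard is expected to answer correctly.
    """
    cut = int(median(bookmarks))
    return cut, ordered_items[:cut]

# Hypothetical difficulty estimates for four items already on the test.
difficulties = {"A": 0.2, "B": 0.9, "C": 0.5, "D": 0.7}
booklet = order_items(difficulties)

# Three hypothetical judges each place a bookmark in the booklet.
cut, expected = bookmark_cut(booklet, [2, 3, 3])
print(booklet, cut, expected)
```

Note that nothing in this sketch ever removes an item from the booklet
or looks up which grade-level benchmark it belongs to. The judges only
choose where the pass point falls among items that were selected before
they arrived.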
The example test does not tell where this pass point is, but George
Cunningham of U Louisville says that such methods are highly
unreliable. The WA test is derived from work in the New Standards
project, and this process is documented exactly in news articles from
any number of other states. From what I've seen of New Standards and
other states' example problems, they also suffer from the same
problem of not corresponding to any written grade-level standards
(frequency histograms, complex woodworking, ratio, probability, etc.).
The whole ed reform is based on "outcome based education", a system so
thoroughly discredited that, although "performance-based" education
was simply defined as OBE in initial 1209 handouts, OBE no longer
appears anywhere in CSL or OSPI literature. OBE is what is
fundamentally flawed. All on the list need to do research on the
foundations and assumptions of OBE and why it is now so discredited
that no one calls it by its original name, or by Mastery Based
Education as it was known before then. "Standards-Based" Education is
only the newest name for OBE.
Even if you fix the process, we still have the problem that
constructed response problems based on skills which are not taught
have a disproportionate impact on minorities. They still cost 10 times
as much to score, and they have a measurable level of inherent scoring
unreliability compared to multiple-choice questions, where a computer
can score 100% correctly and no one can argue as to what constitutes
a correct answer (remember, both examples published by OSPI for math
had INCORRECT answers, and both were non-compliant with published
benchmarks).
The fact is that we've simply cobbled together a slightly modified
version of what the NCEE's Marc Tucker has been peddling to other
states with almost no evidence of positive results, and many examples
of huge disasters (CA CLAS and KY KIRIS stand out; WASL will be
remembered as the next big disaster).
Date sent: Tue, 15 Sep 1998 16:49:23 -0700
To: arthurhu@halcyon.com, owner-wa-math-sci@mickey.esd113.wednet.edu,
wa-legislation@inspire.ospi.wednet.edu,
wa-esslrngs@whitecap.psesd.wednet.edu
From: Paul Englesberg
Subject: Re: Criticism of test vs benchmarks
> In further response to points raised by Arthur Hu, I'd suggest that people
> interested in the assessment vs. benchmarks issue refer to the CSL documents
> on Standard Setting Procedure, "Standard Setting Procedure for Grade 4:
> Listening, Reading, Writing and Mathematics" at:
> http://csl.wednet.edu/Web%20page/3%20Assessment%20System/subdocuments/Technical%20Manual/A-InformManual.html
>
> Here's a brief excerpt:
> "The purpose of the standard-setting process then was to establish the level of
> performance expected of Grade 4 students who are judged as meeting the
> standard in listening, reading, writing, and mathematics. The emphasis
> for the judges in the standard setting process was on what students
> should know and be able to do near the end of Grade 4...."
> "Next, in small groups, the judges examined the items in the ordered booklet
> one at a time, starting with the first (easiest) item in the booklet, and
> moving to the second easiest item, and so on, until all items (and their
> scoring rubrics) were examined. As judges examined each item, they were
> asked to consider:
>
> What is each item measuring?
> What makes each item more difficult than the items that precede it?
>
> Judges proceeded through the ordered item booklets and specially trained
> table leaders encouraged them to observe the increase in the complexity
> of the items and note the increase in knowledge, skills, and abilities
> required to answer the items...."
>
> This document discusses the setting of "bookmarks" for the standards by
> judges, but doesn't seem to directly discuss reference to the 4th grade
> benchmarks. I would certainly expect that the standard-setters would be
> constantly referring to the EALR benchmarks. Can anyone elaborate?
> While Arthur Hu seems to assume that no one is paying attention to
> standards in a rational way and the whole system/reform effort is
> wrong-headed, it seems to me that existing discrepancies come from some
> weaknesses in the system that need to be, and can be, addressed. Surely a
> new approach with a new assessment system will have some problems - both
> major oversights and minor glitches. But opposing the "whole ed reform"
> because it rests on "goofed up" concepts leads Mr. Hu to use these
> weaknesses to condemn everything that is being done rather than strive for
> improvements. I'd like to hear what others have to say about these issues.
>
>
> At 03:16 PM 9/15/98 -0700, Arthur Hu wrote:
> >Indeed, you are correct that I have a problem on two levels: one, the test
> >is not consistent with its own benchmarks as to what is to be
> >taught at which grade, and two, the whole ed reform "all can
> >succeed" assumption is goofed up from the very concept.
>
> Paul Englesberg
> Woodring College of Education
> Western Washington University
> Bellingham, WA 98225
> ph: (360) 650-7527
> email: pengle@wce.wwu.edu
>