CRAWFORD PREFERS STANDARDS/CRITERION-REFERENCED TESTS
to: dggarner@swbell.net
the big problem with "criterion-referenced" testing isn't the goal
that all should pass, but that the NAEP and all the other goofball
"performance" and "standards-based" tests set the level so high that
everybody fails: 70% are "below grade level," even though wherever
the middle 50% scores is in fact a reasonable expectation of what
real kids can do and really do. Granted, there should be some way to
compare scores across years, and that's another scam of the
performance-based scores.
For example, the entire WA state 4th-grade test scores made headlines
because they appeared to increase from 20% to 30% above "standard,"
but when every school jumped up by the same amount, a more reasonable
explanation is that the test just got easier, not that every school
successfully figured out how to teach to a test that openly tests for
content that isn't covered in the test's own stated benchmarks until
grade 7 or grade 10. If you can't trust these guys to produce a test
with 4th-grade skills, how can you trust them to produce a 1998 test
that is the same level of difficulty as the 1997 test? KIRIS did the
same thing: deliberately score schools low the first year, then
gradually dumb down the test to raise the pass rate.
Date sent: Wed, 9 Sep 1998 08:27:41 -0400
To: professor@tricon.net
From: professor@tricon.net
Subject: ClearingHouse: 'THE LATEST WORD'
The following was just posted to the ClearingHouse:
Posted by:
Name: dggarner@swbell.net
Email: dggarner@swbell.net
Subject: A Solution to State Testing
Time & date added: 1998-09-09 08:27
Message:
A Solution to State Testing
by Donna Garner
September 9, 1998
I have been weighing in my own mind what the solution to state testing
should be. I contacted Elaine McEwan to help me with some definitions,
which she so kindly did. I am not an expert on testing; but as a teacher,
I have participated in state testing in Texas since the early 1980s.
What follows is a three-part commentary. The first part is Elaine
McEwan's definitions. The second part is my recommendation. The third
part is Bruce Crawford's article on standards-based testing.
I hope this information will be helpful.
Donna Garner
dggarner@swbell.net

  Part I - Elaine's comments:
A criterion-referenced test is a test with questions based on what the
student was taught. It is designed to measure how much specific knowledge
the student has learned from that instruction. Typically a classroom
teacher would give a criterion-referenced test based on the
concepts/skills/information that were taught/covered by the teacher during
a period of time.
A norm-referenced test relates the scores of each student to those of
students in a control or norm group. This test shows how each student and
group of students rank compared with an average. The intention is that
when schools and school districts give the same tests under the same
conditions and are ranked according to the same norm, their scores will be
comparable.
On a norm-referenced test, the "norm" by definition is the midpoint of
the performance of students in the norm group: 50 percent of them score
above the norm and 50 percent below.
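The percentile mechanics behind these definitions can be sketched in a few lines. The scores and norm group below are invented purely for illustration (real norming uses samples of many thousands):

```python
# Sketch of norm-referenced scoring: a student's percentile rank is the
# share of the norm group scoring at or below that student's raw score.
# The norm-group scores here are made up for illustration.

def percentile_rank(score, norm_group):
    """Percent of the norm group scoring at or below `score`."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100.0 * at_or_below / len(norm_group)

norm_group = [52, 61, 65, 70, 74, 78, 81, 85, 90, 96]  # hypothetical raw scores

# The middle of the norm group sits at the 50th percentile by construction,
# so half the group always scores "above the norm" and half "below" --
# regardless of how well or poorly everyone did in absolute terms.
print(percentile_rank(74, norm_group))  # 50.0
print(percentile_rank(96, norm_group))  # 100.0
```

Note that the percentile depends entirely on the other scores in the norm group, not on how many questions the student actually got right.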
Theoretically, everyone who takes a criterion-referenced test could get a
top grade. On a norm-referenced test, only those who get the most answers
correct will score in the 99th percentile. When the test is constructed,
a certain number of questions are used that only a very few students will
be able to get right, and a certain number that the majority will get
right.
=================================================
Part II - My recommendations:
>Based upon Elaine's definitions, I believe the Texas Assessment of
>Academic Skills (TAAS) is a norm-referenced test. Another name for it is
>"grading on the curve": 50% above the middle and 50% below the middle.
>
>On a norm-referenced test, even if every child does worse, there are
>still going to be 50% above the middle and 50% below the middle; but the
>middle can drop lower each year. That is the reason that the Texas
>Education Agency (TEA) must get all the TAAS scores into their office
>before they can give out any final data; they have to determine what the
>"middle" point will be "this year." When I have called to find out just
>how many questions a student can miss on the multiple-choice
>grammar/punctuation/spelling section before he fails that section, the
>TEA has told me that there is not a set number. The number changes from
>year to year.
>
>If I tested the same way in my classroom, it would be called "grading on
>the curve." I would grade all the papers and then figure out which ones
>would be considered above the 50% mark and which ones would be considered
>below the 50% mark. Those above would pass; those below would fail. Many
>college professors use this method.
>
>Of course, most secondary teachers know not to grade that way. What we
>should do in the classroom and on the TAAS is to give a standards-based
>test that is criterion-referenced to the explicit skills of the TAD. This
>would mean that if there were 20 questions on the test, each one would be
>worth five points whether the paper is graded this year, next year, or
>ten years from now.
>
>A criterion-referenced test would test the specific things that have been
>taught. The term "criterion-referenced" does not indicate how much each
>question would be worth or whether there is a midline average or not. The
>term only deals with the fact that what is tested is what has been
>taught.
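This fixed-weight idea can be sketched as follows. The 100-point scale, 20-question test, and passing score of 70 are illustrative assumptions, not actual TAAS values:

```python
# Sketch of criterion-referenced scoring with fixed weights: every question
# is worth the same number of points every year, so a paper earns the same
# grade whether it is scored this year, next year, or ten years from now.
# The question count and passing threshold are illustrative assumptions.

QUESTIONS = 20
POINTS_PER_QUESTION = 100 // QUESTIONS  # 5 points each on a 100-point scale
PASSING_SCORE = 70                      # fixed from year to year

def criterion_score(num_correct):
    """Score depends only on how many criteria the student mastered."""
    return num_correct * POINTS_PER_QUESTION

def passes(num_correct):
    return criterion_score(num_correct) >= PASSING_SCORE

# 15 of 20 correct is 75 points -- a pass in any year, with no need to
# wait for everyone else's papers before the cutoff can be known.
print(criterion_score(15), passes(15))
```

Contrast this with the norm-referenced case, where no score can be assigned until all papers are in and the "middle" has been located.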
>Nationally normed tests (e.g., Stanford 9, Iowa Test of Basic Skills) are
>norm-referenced tests, but a huge number of children from all over the
>country have taken the tests. The test-makers figure out where the
>quartiles/percentiles fall, and children's scores are compared to those
>quartiles/percentiles.
>
>If Texas were to go to a nationally normed test, the results would mean
>more because the number of students tested would be so much larger than
>the TAAS. We could compare ourselves nationally at each grade level. The
>problem with the NAEP is that it is not given at each grade level, and it
>is not given every year.
>
>My suggestion is that we give a nationally normed test at various grade
>levels. We should also give a criterion-referenced test at each grade
>level which is standards-based and is correlated to an explicit set of
>grade-specific standards. If we had all that, we would have real
>accountability.
>================================================
>Part III - Bruce Crawford's article:

>July has arrived, and so have the much-anticipated Stanford 9 test
>results. Having spent $35 million testing 4.1 million students, do we
>know anything new? Not really.
>
>How can that be? The first hint comes from contradictions generated by
>the results themselves.
>
>For instance, Gov. Pete Wilson called the scores "deplorable," and
>Supt. of Public Instruction Delaine Eastin said they're "good news."
>
>Contradictions surround the language arts results. Several districts sued
>to block release of scores for limited English students. While those
>districts were in court claiming unfairness, other districts boasted
>their English-as-a-second-language students outperformed their
>English-only ones.
>
>A third contradiction comes by way of my two kids. As students, they are
>complete opposites. The older one does as little as possible. The
>younger one gets straight As and takes mostly AP courses. Yet, with the
>exception of math, their scores were fairly even.
>
>While contradictions may cause us to raise our eyebrows, they don't
>explain why we don't know more.
>
>The chief reason is that the Stanford 9 is not tuned to California
>educational standards. Therefore it can't measure how well our students
>have learned what we expect them to know.
>
>It's not meant to. It is what's known as a norm-referenced test. This
>type of test compares one group of students to other students. Here's
>how a norm-referenced test works.
>
>Let's say we have 100 students climbing a 10,000-ft.-high mountain. Of
>these, 90 are spread out between the base and 6,500 ft. The other 10 are
>spread out above 6,500 ft. This observer is at 7,500 ft., trailing just 4
>other climbers.
>
>On a norm-referenced basis, a climber at 6,500 ft. would be in the 90th
>percentile because 90% of the climbers are at his/her level or below.
>Yours truly would be in the 95th percentile.
>
>Now let's use the same mountain scene to represent the other major type
>of test, known as standards-based. The peak symbolizes mastery of the
>subject matter.
>
>This type of test measures students in terms of how close they are to
>the peak. It doesn't matter to us where the others are. Our only
>interest is where we are.
>
>Relative to my classmates, I am among the leaders. However, I only know
>75% of what I am expected to have learned. Without moving an inch, I just
>tumbled to "C"-level performance. The large group clustered around
>5,000 ft. dropped from "average" to failing.
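The two scoring methods in the mountain analogy can be sketched side by side. Only the 90/10 split, the 10,000-ft. peak, and the observer at 7,500 ft. come from the article; the individual climber altitudes are invented:

```python
# Sketch of the mountain analogy: the same climber altitudes scored two
# ways. Norm-referenced: your percentile among all climbers.
# Standards-based: your altitude as a fraction of the 10,000-ft. peak.
# Individual altitudes below are invented for illustration.

PEAK_FT = 10_000

def norm_referenced_percentile(altitude, climbers):
    """Percent of climbers at your level or below."""
    return 100.0 * sum(1 for a in climbers if a <= altitude) / len(climbers)

def standards_based_percent(altitude):
    """How close you are to the peak, independent of everyone else."""
    return 100.0 * altitude / PEAK_FT

# 90 climbers spread between the base and 6,500 ft, 10 above -- the
# observer at 7,500 ft is one of the 100.
climbers = [i * 6500 // 90 for i in range(1, 91)]
climbers += [6800, 7000, 7100, 7200, 7400, 7500, 7900, 8600, 9300, 10000]

print(norm_referenced_percentile(6500, climbers))  # 90.0 -- near the top of the pack
print(standards_based_percent(7500))               # 75.0 -- only 75% of the way up
```

The same altitude that looks like elite performance on a norm-referenced basis is a "C" when measured against the peak, which is the article's point.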
>
>The Stanford 9 tells us nothing about whether we're near the peak or
>down in the foothills. (Other evidence leans to the latter.)
>
>Does this mean that the Stanford 9 exercise has been a total waste of
>time and money? No.
>
>While it shouldn't have cost us $35 million to do so, the test format did
>set an important precedent. It established that the public has a right to
>easy access to detailed information about our schools.
>
>
>What should we do with this test and its precedent? First, complete the
>new standards currently in progress. The state Board of Education has
>already adopted strong new math and language arts standards.
>
>The new math standards are so strong that the Hudson Institute gave them
>a "Perfect" score, besting Japan's mere "A." Domestically, a whopping
>third of all states flunked outright.
>
>The draft version of the new science standards looks equally strong. It
>is premature to make a call on the history standards.
>
>Armed with robust new standards, we should switch to a standards-based
>test tuned specifically to them. The test should be integrated into a
>system modeled after the Tennessee Value-Added Assessment System (TVAAS).
>
>The TVAAS, developed by Dr. Sanders of the University of Tennessee, has
>been producing data long enough that they are now finding remarkable
>trends. For instance, students who have had a terrific teacher tend to
>do better for three years afterwards. Conversely, some students who have
>had a lousy teacher never recover.
>
>With Sanders' system, accountability quickly becomes a meaningful topic.
>
>With accountability, exciting possibilities arise everywhere.
>
>Administrators would have data upon which to base management decisions.
>Parents could request particular teachers, or refuse to let their
>children be subjected to others. Teachers would have a foundation for
>meritbased pay.
>
>In closing, if it's true that knowledge is power, then the knowledge of
>which districts, schools, and teachers are getting the job done will make
>the entire public education system much more accountable and responsive
>to, well, the public.
>
>The real value of this year's statewide test may have been its role in
>moving toward greater accountability and responsiveness.
>
>=============================================
>Redirected by: Jimmy Kilpatrick http://www.readbygrade3.com
>=============================================
>
Elaine K. McEwan
The McEwan-Adkins Group
Office: (520) 544-4088
Fax: (520) 544-8764
Educational Resources for Busy Parents and Educators
http://www.elainemcewan.com
EDUCATION CONSUMERS CLEARINGHOUSE
http://www.educationconsumers.com