CRAWFORD PREFERS STANDARDS/CRITERION REFERENCED TESTS
\doc\web\98\06\critnorm.txt

to: dggarner@swbell.net

The big problem with "criterion referenced" testing isn't the goal that all should pass, but that the NAEP and all the other goofball "performance" and "standards-based" tests set the level so high that everybody fails: 70% are "below grade level," even though wherever the middle 50% score is in fact a reasonable expectation of what real kids can do and really do.

Granted, there should be some way to compare scores across years, and that's another scam of the performance-based scores. For example, WA state's 4th grade test scores made headlines because they appeared to have increased from 20% to 30% above "standard," but when every school jumped up by the same amount, a more reasonable explanation is that the test just got easier, not that every school successfully figured out how to teach to a test that openly tests for content that isn't covered in the test's own stated benchmarks until grade 7 or grade 10. If you can't trust these guys to produce a test with 4th grade skills, how can you trust them to produce a 1998 test that is the same level of difficulty as the 1997 test? KIRIS did the same thing: deliberately score schools low the first year, then gradually dumb down the test to raise the pass rate.

Date sent: Wed, 9 Sep 1998 08:27:41 -0400
To: professor@tricon.net
From: professor@tricon.net
Subject: ClearingHouse: 'THE LATEST WORD'

The following was just posted to the ClearingHouse:

Posted by:
Name: dggarner@swbell.net
Email: dggarner@swbell.net
Subject: A Solution to State Testing
Time & date added: 1998-09-09 08:27
Message:

A Solution to State Testing
by Donna Garner
September 9, 1998

I have been weighing in my own mind what the solution to state testing should be. I contacted Elaine McEwan to help me with some definitions, which she so kindly did.
I am not an expert on testing; but as a teacher, I have participated in state testing in Texas since the early 1980's. What follows is a three-part commentary. The first part is Elaine McEwan's definitions. The second part is my recommendation. The third part is Bruce Crawford's article on standards-based testing. I hope this information will be helpful.

Donna Garner
dggarner@swbell.net

--------------------------------------------------------------------------
Part I -- Elaine's comments:

A criterion-referenced test is a test with questions based on what the student was taught. It is designed to measure how much specific knowledge the student has learned from that instruction. Typically a classroom teacher would give a criterion-referenced test based on the concepts/skills/information that were taught/covered by the teacher during a period of time.

A norm-referenced test relates the scores of each student to those of students in a control or norm group. This test shows how each student and group of students rank compared with an average. The intention is that when schools and school districts give the same tests under the same conditions and are ranked according to the same norm, their scores will be comparable. On a norm-referenced test, the "norm" by definition is the midpoint of the performance of students in the norm group: 50 percent of them score above the norm and 50 percent below.

Theoretically, everyone who takes a criterion-referenced test could get a top grade. On a norm-referenced test, only those who get the most answers correct will score in the 99th percentile. When the test is constructed, there are a certain number of questions used that only a very few students will be able to get right and a certain number that the majority will get right.
=================================================
Part II -- My recommendations:

Based upon Elaine's definitions, I believe the Texas Assessment of Academic Skills (TAAS) is a norm-referenced test. Another name for it is "grading on the curve" -- 50% above the middle and 50% below the middle.

On a norm-referenced test, even if every child does worse, there are still going to be 50% above the middle and 50% below the middle; but the middle can drop lower each year. That is the reason that the Texas Education Agency (TEA) must get all the TAAS scores into their office before they can give out any final data; they have to determine what the "middle" point will be "this year." When I have called to find out just how many questions a student can miss on the multiple-choice grammar/punctuation/spelling section before he fails that section, the TEA has told me that there is not a set number. The number changes from year to year.

If I tested the same way in my classroom, it would be called "grading on the curve." I would grade all the papers and then figure out which ones would be considered above the 50% mark and which ones would be considered below the 50% mark. Those above would pass; those below would fail. Many college professors use this method.

Of course, most secondary teachers know not to grade that way. What we should do in the classroom and on the TAAS is to give a standards-based test that is criterion-referenced to the explicit skills of the TAD. This would mean that if there were 20 questions on the test, each one would be worth four points whether the paper is graded this year, next year, or ten years from now.

A criterion-referenced test would test the specific things that have been taught. The term "criterion referenced" does not indicate how much each question would be worth or whether there is a mid-line average or not. The term only deals with the fact that what is tested is what has been taught.
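
The difference between a cut that moves with the group and a cut that stays fixed can be shown with a small Python sketch. Everything in it is an illustrative assumption: the scores are invented, the 70-point fixed cut is hypothetical, and `curve_pass_rate` and `fixed_pass_rate` are made-up helper names. It is not a description of how the TEA actually scores the TAAS.

```python
from statistics import median

def curve_pass_rate(scores):
    """Share of students strictly above this year's median ("grading on the curve")."""
    cut = median(scores)  # the cut is recomputed from whoever took the test
    return sum(s > cut for s in scores) / len(scores)

def fixed_pass_rate(scores, cut=70):
    """Share of students at or above a fixed cut score (hypothetical 70 points)."""
    return sum(s >= cut for s in scores) / len(scores)

# Invented scores: in year 2 every student does 10 points worse.
year1 = [55, 60, 65, 70, 75, 80, 85, 90]
year2 = [s - 10 for s in year1]

print(curve_pass_rate(year1), curve_pass_rate(year2))  # 0.5 0.5
print(fixed_pass_rate(year1), fixed_pass_rate(year2))  # 0.625 0.375
```

Under the curve, half the class passes in both years even though every score fell; under the fixed cut, the drop shows up in the pass rate.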
Nationally normed tests (e.g., Stanford 9, Iowa Test of Basic Skills) are norm-referenced tests, but a huge number of children from all over the country have taken the tests. The testmakers figure out where the quartiles/percentiles fall, and children's scores are compared to those quartiles/percentiles. If Texas were to go to a nationally normed test, the results would mean more because the number of students tested would be so much larger than for the TAAS. We could compare ourselves nationally at each grade level. The problem with the NAEP is that it is not given at each grade level, and it is not given every year.

My suggestion is that we give a nationally normed test at various grade levels. We should also give a criterion-referenced test at each grade level which is standards-based and is correlated to an explicit set of grade-specific standards. If we had all that, we would have real accountability.

================================================
Part III -- Bruce Crawford's article:

July has arrived, and so have the much-anticipated Stanford-9 test results. Having spent $35 million testing 4.1 million students, do we know anything new? Not really.

How can that be? The first hint comes from contradictions generated by the results themselves.

For instance, Gov. Pete Wilson called the scores "deplorable," and Supt. of Public Instruction Delaine Eastin said they're "good news."

Contradictions surround the language arts results. Several districts sued to block release of scores for limited English students. While those districts were in court claiming unfairness, other districts boasted their English as second language students outperformed their English-only ones.

A third contradiction comes by way of my two kids. As students, they are complete opposites. The older one does as little as possible. The younger one gets straight As and takes mostly AP courses.
Yet, with the exception of math, their scores were fairly even.

While contradictions may cause us to raise our eyebrows, they don't explain why we don't know more.

The chief reason is that the Stanford-9 is not tuned to California educational standards. Therefore it can't measure how well our students have learned what we expect them to know.

It's not meant to. It is what's known as a norm-referenced test. This type of test compares one group of students to other students. Here's how a norm-referenced test works.

Let's say we have 100 students climbing a 10,000-ft. high mountain. Of these, 90 are spread out between the base and 6,500 ft. The other 10 are spread out above 6,500 ft. This observer is at 7,500 ft., trailing just 4 other climbers.

On a norm-referenced basis, a climber at 6,500 ft. would be in the 90th percentile because 90% of the climbers are at his/her level or below. Yours truly would be in the 95th percentile.

Now let's use the same mountain scene to represent the other major type of test, known as standards-based. The peak symbolizes mastery of the subject matter.

This type of test measures students in terms of how close they are to the peak. It doesn't matter to us where the others are. Our only interest is where we are.

Relative to my classmates, I am among the leaders. However, I only know 75% of what I am expected to have learned. Without moving an inch, I just tumbled to "C" level performance. The large group clustered around 5,000 ft. dropped from "average" to failing.

The Stanford-9 tells us nothing about whether we're near the peak, or down in the foothills. (Other evidence leans to the latter.)

Does this mean that the Stanford-9 exercise has been a total waste of time and money? No.

While it shouldn't have cost us $35M to do so, the test format did set an important precedent.
It established that the public has a right to easy access to detailed information about our schools.

What should we do with this test and its precedent? First, complete the new standards currently in progress. The state Board of Education has already adopted strong new math and language arts standards.

The new math standards are so strong, the Hudson Institute gave them a "Perfect" score -- besting Japan's mere "A". Domestically, a whopping third of all states flunked outright.

The draft version of the new science standards looks equally strong. It is premature to make a call on the history standards.

Armed with robust new standards, we should switch to a standards-based test tuned specifically to them. The test should be integrated into a system modeled after the Tennessee Value-Added Assessment System (TVAAS).

The TVAAS, developed by Dr. Sanders of the University of Tennessee, has been producing data long enough that they are now finding remarkable trends. For instance, students who have had a terrific teacher tend to do better for three years afterwards. Conversely, some students who have had a lousy teacher never recover.

With Sanders' system, accountability quickly becomes a meaningful topic.

With accountability, exciting possibilities arise everywhere.

Administrators would have data upon which to base management decisions. Parents could request particular teachers, or refuse to let their children be subjected to others. Teachers would have a foundation for merit-based pay.

In closing, if it's true that knowledge is power, then the knowledge of which districts, schools and teachers are getting the job done will make the entire public education system much more accountable and responsive to, well, the public.

The real value of this year's statewide test may have been its role in moving toward greater accountability and responsiveness.
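
Crawford's mountain example from earlier in the article can be worked through numerically. The Python sketch below is only an illustration under stated assumptions: the article gives the counts (90 climbers at or below 6,500 ft., 10 above, the observer at 7,500 ft. trailing 4 others) but not individual altitudes, so the altitudes here are invented to match those counts. The percentile rank is computed as the percent of climbers strictly below a given altitude, which is one common convention and not necessarily the one a test publisher uses. `percentile_rank` and `percent_of_peak` are hypothetical helper names.

```python
PEAK_FT = 10_000  # mastery of the subject matter, per the article's analogy

def percentile_rank(altitudes, me):
    """Norm-referenced view: percent of climbers strictly below `me`."""
    return 100 * sum(a < me for a in altitudes) / len(altitudes)

def percent_of_peak(me, peak=PEAK_FT):
    """Standards-based view: how far up the mountain `me` is, ignoring everyone else."""
    return 100 * me / peak

# Invented altitudes that match the article's counts:
lower = [72 * i for i in range(1, 91)]        # 90 climbers, all below 6,500 ft.
upper = [6600, 6800, 7000, 7200, 7400,        # 5 climbers between 6,500 and 7,500 ft.
         7500,                                # the observer
         8000, 8500, 9000, 9500]              # the 4 climbers ahead of the observer
climbers = lower + upper

print(percentile_rank(climbers, 7500))  # 95.0 -> "95th percentile"
print(percent_of_peak(7500))            # 75.0 -> "C" level
print(percent_of_peak(5000))            # 50.0 -> the group near 5,000 ft.
```

The same climber is at the 95th percentile and at 75% of the peak: the first number changes if the other climbers move, the second does not.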
=============================================
Redirected by: Jimmy Kilpatrick http://www.readbygrade3.com
=============================================

Elaine K. McEwan
The McEwan-Adkins Group
Office: (520) 544-4088
Fax: (520) 544-8764
Educational Resources for Busy Parents and Educators
http://www.elainemcewan.com

EDUCATION CONSUMERS CLEARINGHOUSE
http://www.education-consumers.com