TIME Ideas obtained this Pearson memo from an employee in New York state government who is frustrated with the lack of transparency surrounding the recent firestorm over standardized testing. The full text is below, with the exception of a phone number in the last paragraph that has been redacted. To read Andrew J. Rotherham’s anatomy of the scandal, click here.
April 22, 2012
Mr. Ken Slentz
Deputy Commissioner, Office of P-12 Education
New York State Education Department
89 Washington Avenue
Albany, New York 12234
Pearson is confident that the NYS Grades 3-8 English Language Arts (ELA) and Mathematics assessments have been developed to support valid and reliable interpretations of scores for their intended uses. The “Hare and the Pineapple” passage and associated items were placed on the Grade 8 ELA test after the NYS field test data associated with the multiple choice items and the feedback from the “final eyes” committee determined that this was an appropriate passage and set of items to include on the test. Detailed background information about the passage and items is provided below.
Background on SAT 10 Items and Use in New York State
When the contract was awarded to Pearson in March 2011, part of the scope of work was to include norm-referenced items that would be administered each spring in the New York State Grades 3-8 English Language Arts (ELA) and Mathematics assessments. These items would serve two purposes – to provide national normative data and to contribute to the student’s operational score. Form B of SAT 10 was planned to be used intact to meet both requirements. Likewise, due to the planned inclusion of these normed items, Pearson planned to meet the item development target numbers with a combination of both normed and custom developed items.
In fall of 2011, the New York State Education Department (NYSED) made a determination that the SAT 10 Form B would not be used in total on the 2012 operational assessment. This decision was made because not all SAT 10 items are aligned to 2005 New York State standards and having such items contribute to an operational score was not ideal. With this decision, two shifts resulted. First, if any SAT 10 Form B items were used on the operational assessment, they would not yield normative data (as the complete SAT 10 Form B is needed to establish this). Second, it was determined that custom developed passages and items should be placed on the operational test forms first, and that the field tested SAT 10 Form B items would be used only if there weren’t enough eligible custom items.
Why the “Hare and Pineapple” Passage was Chosen
During test construction it was determined that with the exclusion of the SAT 10 items on the operational form there were not enough custom items developed to assess Strand 2; therefore, “The Hare and the Pineapple” passage and associated items were chosen for the operational form. This was a sound decision in that “The Hare and the Pineapple” and associated items had been field tested in New York State, yielded appropriate statistics for inclusion, and were aligned to the appropriate NYS Standard.
“The Hare and the Pineapple” passage is intended to measure NYS Standard “interpretation of character traits, motivations, and behavior” and “eliciting supporting detail”. The associated six multiple choice items are aligned to the NYS Reading Standards, specifically to Strand 2. The NYS performance indicator assigned to the items is “Interpret characters, plot, setting, theme, and dialogue, using evidence from the text”.
It is important to note that the use of SAT 10 items as operational items will not occur going forward as Pearson is developing an adequate number of custom items aligned to the Common Core Standards.
Concerns with Items Associated with “Hare and Pineapple”
Two items of the set of six, Item 7 and Item 8, were challenged by NY teachers and students while the test was under way April 17-19, 2012. The correct answers and rationales to Item 7 and Item 8 are as follows:
• Item 7: The correct answer is C. The question regarding the animals’ possible motivation for eating the pineapple requires a reader to infer the correct answer from clues conveyed in the text. While all of the options are plausible motivations, the most likely answer is that the animals were annoyed. Paragraph 13 indicates that the animals support the pineapple to win the race because they assume the pineapple has a clever plan. However, the pineapple never moves during the race. From these clues and events, a reader can infer that the animals are annoyed. The text does not support the inference that the animals are motivated by hunger, excitement, or amusement.
• Item 8: The correct answer is D. The question regarding the wisest animal requires the reader to apply close analytic reading skills to determine which of the choices represents the wisest animal based on clues given in the text. The moose and the crow are the two animals that present the incorrect idea that the pineapple has a clever plan to win the race. This idea is proven false when the hare wins the race. The hare is presented as incredulous that a pineapple would challenge him to a race, but overconfidently agrees to race a pineapple. Finally, the owl declares that “Pineapples don’t have sleeves,” which is a factually accurate statement. This statement is also presented as the moral of the story, allowing a careful reader to infer that the owl is the wisest animal.
Previous Use of “Hare and Pineapple” Passage and Items
The Stanford 10 Form B, which contains the passage and the six multiple choice items, is used exclusively as a secure form. This means that this form is available only for state-wide or large district customers who agree to maintain security of the documents at all times. Between 2004 and 2012 the form was previously used in six other states and three large districts. In 2012, the only state-wide use of this form was in NY State. Until the events of this past week, we did not have any prior knowledge that the passage entitled “The Hare and the Pineapple” had any controversy associated with it from any prior use.
State administrations include:
• Alabama 2004-2011
• Arkansas 2008-2010
• Delaware 2005-2010
• Illinois 2006-2007
• New Mexico 2005-2007
• Florida 2006
Large District Administrations:
• Chicago 2006-2007
• Fort Worth
Item statistics are provided for the six items related to the Hare and the Pineapple, based both on the New York State field test in 2011 and on a representative sample at the national level (2002). As can be observed from the statistics on the following page, the items performed reasonably well. Based on the New York State students’ performance, item p values range from 0.32 to 0.86, indicating a good selection of easy and challenging items related to the passage. The discrimination powers (based on point biserial values) of the items are also high, ranging from 0.27 to 0.47. The industry standard requires point biserial values to be higher than 0.20.
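For readers unfamiliar with these two statistics, the following is a minimal sketch of how an item p value (the proportion of students answering correctly) and a point biserial (the correlation between a 0/1 item score and the total test score) are commonly computed. The function names and the sample data are hypothetical and are not taken from the memo; the memo also does not say whether the total score used by Pearson included the item itself, so this sketch simply correlates against whatever total is supplied.

```python
from math import sqrt

def p_value(item_scores):
    """Proportion of students who answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total test score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: eight students, one item, and their total test scores.
item = [1, 1, 0, 1, 0, 0, 1, 1]
total = [38, 35, 20, 30, 22, 18, 33, 25]

p = p_value(item)
r = point_biserial(item, total)

# 0.20 is the threshold the memo describes as the industry standard.
meets_threshold = r > 0.20
print(f"p value = {p:.3f}, point biserial = {r:.3f}, above 0.20: {meets_threshold}")
```

A higher p value means an easier item, and a higher point biserial means students who do well on the whole test are more likely to get this item right.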
The National Research Program for the standardization of Stanford 10 took place during the spring and fall of 2002. The purpose of the National Research Program was to provide the data used to equate the levels and forms of the test series, establish the statistical reliability and validity of the tests, and develop normative information descriptive of achievement in schools nationwide. Testing for the Spring Standardization Program of all levels and forms of Stanford 10 took place from April 1, 2002, to April 26, 2002. Testing for the Equating of Levels Program, Equating of Forms Program, and Equating of Stanford 10 to Stanford 9 took place from April 1, 2002, to May 24, 2002. Approximately 250,000 students from 650 school districts across the nation participated in the Spring Standardization Program, with another 85,000 students from 385 school districts participating in the spring equating programs. Some students participated in more than one program.
Testing for the Fall Standardization Program took place from September 9, 2002, to October 18, 2002. Testing for the Equating of Levels Program, Equating of Forms Program, and Equating of Stanford 10 to Stanford 9 took place from September 9, 2002, to November 1, 2002. Approximately 110,000 students participated in the Fall Standardization and Equating Programs. Some students participated in more than one program.
The majority of individuals who wrote test items for Stanford 10 were practicing teachers from across the country with extensive experience in various content areas. Test item writers were thoroughly trained on the principles of test item development and review procedures. They received detailed specifications for the content area for which they were writing, as well as lists of instructional standards and examples of both properly and improperly constructed test items.
As test items were written and received, each test item was submitted to rigorous internal screening processes that included examinations by:
• content experts, who reviewed each test item for alignment to specified instructional standards, cognitive levels, and processes;
• measurement experts, who reviewed each test item for adequate measurement properties;
• editorial specialists, who screened each test item for grammatical and typographical errors.
The items were then administered in a National Item Tryout Program which provided information about the pool of items from which the final forms of the test were constructed. The information provided by the Stanford 10 National Item Tryout Program included:
• The appropriateness of the item format: How well does the item measure the particular instructional standard for which it was written?
• The difficulty of the question: How many students in the tryout group responded correctly to the item?
• The sensitivity of the item: How well does the item discriminate between students who score high on the test and those who score low?
• The grade-to-grade progression in difficulty: For items tried out in different grades, did more students answer the question correctly at successively higher grades?
• The functioning of the item options: How many students selected each option?
• The suitability of test length: Are the number of items per subtest and recommended administration times satisfactory?
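Two of the checks listed above, the functioning of the item options and the grade-to-grade progression in difficulty, can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names, the response data, and the per-grade p values are all hypothetical and do not come from the Stanford 10 tryout program.

```python
from collections import Counter

def option_proportions(responses):
    """Share of students who selected each answer option."""
    counts = Counter(responses)
    n = len(responses)
    return {option: counts[option] / n for option in sorted(counts)}

def progresses_by_grade(p_by_grade):
    """True if the proportion correct rises at each successively higher grade."""
    grades = sorted(p_by_grade)
    ps = [p_by_grade[g] for g in grades]
    return all(lower < higher for lower, higher in zip(ps, ps[1:]))

# Hypothetical responses from ten students to one four-option item.
responses = list("CCACDBCCAC")
proportions = option_proportions(responses)

# Hypothetical proportion correct for the same item tried out in three grades.
p_by_grade = {7: 0.48, 8: 0.55, 9: 0.63}
progression_ok = progresses_by_grade(p_by_grade)

print(proportions)
print(progression_ok)
```

An option that almost no student selects, or an item that gets harder at higher grades, would be a signal to review the item before placing it on a final form.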
In addition to statistical information about individual items, information was collected from teachers and students concerning the appropriateness of the questions, the clarity of the directions, quality of the artwork, and other relevant information.
We trust this information is helpful to you. Please know that Pearson is ready to assist you and answer any additional questions you may have. As such, don’t hesitate to contact me at
Jon S. Twing, Ph.D.
Executive Vice President & Chief Measurement Officer