by Diann Resnick
Bellaire High School
Student Performance by Question
Suggestions for Teachers
This year, as in past years, the AP Statistics Exam Reading was quite an educational experience. The best part of the 2002 Reading was seeing how well written some of the student papers were. These papers clearly showed that good teaching and learning were happening in the classroom. With the increase in the number of exams (about 50,000 this year), there was wide variation in the students' responses and level of preparation. Listed below are general comments on the exam as a whole, followed by comments on each free-response question.
Student Performance by Question
Overall, many students:
- seemed to have difficulty recognizing the difference between a population and a sample.
- were very sloppy with statistical notation and definitions. It was not uncommon to see students use creative notation (to represent a population proportion), use the word "mean" to refer to a proportion, or use p/n to represent a mean.
- anticipated test questions and often answered not the question asked, but what they thought the question was or wanted it to be. Many students did not read the questions carefully.
- wrote more than was necessary to answer a question. Some students appeared unsure of their answers and so added extraneous material. In doing so, they often wrote incorrect statements and were either penalized for them or had them treated as parallel solutions. When a response contains parallel solutions, the worse of the two is graded, and many students lost credit this way.
- did not proofread their answers. They often left out a critical word in their answer or wrote contradictory statements. If students had taken the time to reread their work, they might have caught these careless mistakes.
- still had difficulty budgeting their time on the test. They failed to leave about 30 minutes for Question 6, which counts for 25 percent of the free-response section.
- had difficulty interpreting graphs. They seemed to think that every graph had to be discussed in terms of "center, shape, and spread," and did not consider the graphs in the context of the problem or what the question was actually asking.
Question 1: Einstein's and Newton's Theories of Gamma
A great deal of interpretation and communication was needed to answer Question 1 successfully. Students generally did a good job of communicating their thoughts, and there were many good, short, concise responses.
However, some students:
- confused statistical terminology in critical places, which often made their responses incorrect or ambiguous.
- used the term "margin of error" incorrectly. They did not seem to understand that the margin of error is a single number, and they used the term to refer to an interval. This interpretation is incorrect and was counted against them in grading.
- were unable to distinguish between an estimate and its margin of error, and did not grasp the idea of an interval as a set of plausible values for the quantity being estimated.
- used the terms "experimental values," "observations," "data," and "estimates" interchangeably in part (a). It was not clear from their responses whether the experimental values were estimates, margins of error, or something else.
- used the terms "observations" and "data" in a generic sense rather than in their statistical sense. This may reflect a lack of practice with precise language in the classroom, or a failure to understand that the graph displayed sets of estimates rather than sets of data.
- looked at the graph and interpreted each interval as some type of boxplot, treating the estimate and margin of error as if they represented data and the variability of the data.
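Several of these comments turn on the difference between an estimate, its margin of error, and the resulting interval. A minimal Python sketch, with hypothetical function names and made-up numbers (not the values from the exam's graph), makes the relationship explicit: the margin of error is one number, and the interval is estimate ± margin of error.

```python
def interval_from_estimate(estimate: float, margin_of_error: float) -> tuple[float, float]:
    """Return the interval (estimate - ME, estimate + ME).

    The margin of error is a single non-negative number; the interval is
    the set of plausible values for the quantity being estimated.
    """
    if margin_of_error < 0:
        raise ValueError("a margin of error is a non-negative number")
    return (estimate - margin_of_error, estimate + margin_of_error)


def is_plausible(value: float, interval: tuple[float, float]) -> bool:
    """True if a hypothesized value lies inside the interval."""
    low, high = interval
    return low <= value <= high


# Hypothetical estimate and margin of error (illustrative only).
estimate, moe = 0.75, 0.25
ci = interval_from_estimate(estimate, moe)
print(ci)                     # (0.5, 1.0)
print(is_plausible(1.0, ci))  # True: 1.0 is a plausible value
print(is_plausible(0.0, ci))  # False: 0.0 is not
```

A value inside the interval is plausible and a value outside it is not well supported, but neither outcome "proves" a theory.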
In parts (b) and (c), many students:
- focused on the point estimates as if they carried more information than the interval estimates and, in some cases, ignored the intervals completely when assessing the evidence for or against a particular theory.
- seemed to feel that the converging behavior of the estimates was enough to support one theory or the other, without considering the need to evaluate the uncertainty in those estimates.
- used the statistical terms "error," "range," and "variability" incorrectly. The usage was frequently ambiguous in the context of the problem, and it was unclear whether students were referring to data or to a set of estimates.
- used the terms "impossible," "certain," and "proved" inappropriately. In scientific and statistical work, such claims of certainty are generally unacceptable; these terms should be avoided in data analysis except when discussing theorems of mathematical statistics. In the grading, students were penalized for statements like "the graph proves...."
- looked at the graphical display and interpreted the question as being about regression and/or the law of large numbers. It was not unusual for a student to think that the graphical display was a residual plot.
Question 2: Design and Randomization -- The Boot Problem
In part (a), many students:
- provided a diagram with no explanation, even though the problem specifically stated, "Include a few sentences on how it (the design) would be implemented."
- chose a design that was not incorrect but was not as good as the paired or crossover design.
- described a Completely Randomized Design with two treatment groups as their design method. Although this is a correct design, it is not as good as the paired (both treatments on one subject) or crossover designs.
- suggested a paired design in which pairs of "similar" subjects would be grouped. Students did not receive full credit for this approach.
- identified potential blocking variables such as gender, occupation, climate, etc. Then they randomly assigned treatments to subjects within the blocks. While this indicates a high level of statistical thinking, it is not quite as good as a paired or crossover design.
- described one of the two designs that would constitute a complete answer (paired design or crossover design) but failed to discuss randomization at all.
- used the language of sampling in their descriptions (e.g., stratified samples and SRS), and did not understand the difference between selecting a random sample and random allocation of subjects to treatments.
- incorrectly used the terminology or vocabulary of experiments; e.g., "allocate volunteers into two blocks...."
Also in part (a), many students:
- described a random assignment into two groups but either did not identify the treatments or misidentified the treatments to be compared.
- did not understand that the design was to use the 100 volunteers given, and instead concentrated on randomly selecting volunteers from the population.
- failed to describe an appropriate randomized experiment to compare the current and new treatments.
- used a "coin tossing" randomization scheme to assign subjects to treatments. This was accepted, but students did not recognize that this scheme is not as good as one that assigns an equal number of subjects to each treatment group.
- alphabetized the list of volunteers, numbered the names from 1 to 100, and then assigned the even-numbered names to Group 1 and the odd-numbered names to Group 2. These students did not recognize that this is not a method of randomization.
- described incomplete randomization schemes, for example, "randomly allocate volunteers into two groups" or "randomly assign volunteers into two groups using an SRS," without any description of the randomization process.
- assigned numbers to boots or subjects and mentioned a random digit table, but failed to explain how the treatment groups would be randomly formed.
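The balanced random assignment described above can be sketched with Python's standard random module. The function names and the seed are hypothetical. Shuffling the 100 labels and splitting the list in half is equivalent to drawing 50 distinct labels for one group (for example, with a random digit table, skipping repeats) and giving the remaining volunteers the other treatment. The coin-toss version is included for contrast: it is random, but it does not guarantee equal group sizes.

```python
import random


def randomize_equal_groups(subject_ids, seed=None):
    """Randomly split subjects into two treatment groups of equal size.

    Shuffling the whole list and cutting it in half gives every possible
    split into equal halves the same chance of being chosen.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


def coin_toss_groups(subject_ids, seed=None):
    """Assign each subject by a fair coin flip; group sizes can be unequal."""
    rng = random.Random(seed)
    group_1, group_2 = [], []
    for sid in subject_ids:
        (group_1 if rng.random() < 0.5 else group_2).append(sid)
    return group_1, group_2


volunteers = range(1, 101)  # the 100 volunteers, labeled 1 to 100
new_boot, current_boot = randomize_equal_groups(volunteers, seed=2002)
print(len(new_boot), len(current_boot))  # 50 50

heads, tails = coin_toss_groups(volunteers, seed=2002)
print(len(heads), len(tails))  # sizes depend on the flips and need not be equal
```

Alphabetizing the list and assigning even and odd positions, by contrast, involves no chance mechanism at all.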
In part (b), on double blinding, many students:
- indicated that they understood that double blinding involves keeping two parties unaware of the treatment assignments; however, some of them:
  - identified the volunteers as one party and someone other than the evaluator as the second party. Many used words like "administrator," "conductor," or "manufacturer" for the second party. These words did not adequately convey the idea that the evaluator was the required second party.
  - identified the evaluator as the second party but stated that blinding the evaluator was not possible, when in fact it was possible.
  - identified the "distributor" of the boots as the second party, not understanding that the "distributor" and the "evaluator" are not the same.
- failed to identify the volunteers (subjects) as needing to be kept unaware of the treatment assignments.
- stated that there was no need for blinding because the subjects were randomly assigned to treatment groups.
Question 3: Probability -- New High School Runners
In part (a), students often:
- used a two-sided analysis for the "2 standard deviation" argument (and so claimed a probability below 5% rather than below 2.5%).
- claimed the event was "unlikely" because it was more than 2 standard deviations from the mean, but failed to invoke normality.
- claimed that random variables cannot be more than one standard deviation below the mean.
- tried to turn this problem into an inference problem. Most often, they believed (at problem's end) that they had done a test; occasionally, they believed that they had constructed a confidence interval for an unknown mean.
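The tail areas behind the "2 standard deviation" argument can be checked with `statistics.NormalDist` from Python's standard library (Python 3.8+). The normal model is an assumption, and as noted above it has to be stated explicitly.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution (mean 0, SD 1)

# Probability of falling MORE than 2 SDs below the mean (one tail only).
one_tail = z.cdf(-2)
# Probability of falling more than 2 SDs from the mean in EITHER direction.
two_tails = 2 * z.cdf(-2)

print(round(one_tail, 4))   # 0.0228
print(round(two_tails, 4))  # 0.0455
```

Only one tail is relevant when the question asks about times below a value, so the supporting claim is "less than 2.5%," not "less than 5%."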
In part (b), students often:
- did not know how to compute σ for the team.
- confused the team time with the average runner's time, for example, dividing by 4 to get 4.725.
- failed to carry results from part (b) correctly into part (c).
- calculated the probability from the wrong tail of the distribution in parts (a) and/or (c).
For part (c), students often interpreted "team time < 18.4" to mean "< 18.3" or "< 18.39."
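The team-time calculation in parts (b) and (c) rests on two facts: means add, and for independent times the variances (not the standard deviations) add. The sketch below assumes independent, normally distributed runner times; the four means and standard deviations are hypothetical placeholders, not the exam's values.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical per-runner mean times and SDs (NOT the exam's values).
means = [4.6, 4.7, 4.8, 4.9]
sds = [0.15, 0.16, 0.14, 0.15]

# Team time T is the SUM of the four times. Means always add; variances
# (not SDs) add when the runners' times are independent.
team_mean = sum(means)
team_sd = sqrt(sum(s ** 2 for s in sds))

# A sum of independent normal variables is itself normal.
team = NormalDist(team_mean, team_sd)

# For a continuous distribution P(T < 18.4) equals P(T <= 18.4), so there
# is no reason to replace 18.4 with 18.3 or 18.39.
p_under = team.cdf(18.4)
print(round(team_mean, 2), round(team_sd, 4), round(p_under, 4))
```

Adding the four standard deviations instead would roughly double the spread here, and dividing by 4 answers a question about the average runner rather than the team.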
Question 4: Regression -- Airplane Operating Costs and Passenger Seats
Overall, students demonstrated a satisfactory understanding of scatterplots, correlation, and computer output for linear regression. They were able to write the equation of the least squares regression line and to determine the correlation coefficient from the information provided in the computer output. Many students, however, seemed unsure about how to interpret correlation. Some tried to explain correlation using the coefficient of determination, r², and few did so successfully. In part (c), most students observed that the given regression line would be a poor fit for the restricted data. The vast majority of them cited the negative association among these points as their justification. A few commented on the pattern in the residuals over the 250 to 350 passenger seat range.
Part (a): Determining the equation of the least squares regression line from computer output.
Many students:
- could not interpret the computer output. Students often mistook the value of s (the standard deviation of the residuals about the line) for the slope of the line.
- did not define their variables carefully. For example, some used x = number of passengers instead of number of passenger seats, or y = operating cost per plane instead of operating cost per hour.
- did not include ŷ in the regression equation. Of those who did, most did not define it correctly as the predicted operating cost per hour.
- treated the slope and the y-intercept as variables.
- wrote the equation of the least squares regression line as y = a + bx and did not recognize that the question was asking them to write the equation for the given data.
Part (b): Calculating and interpreting the correlation coefficient, r.
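The link between r and r² can be checked in a few lines. This is a sketch with a hypothetical helper name; it recovers r from r² using the sign of the slope and rejects impossible inputs. The value 0.755 is the correlation quoted in this report; the slope argument is only a placeholder for its sign, which is positive here because r is positive.

```python
from math import copysign, sqrt


def r_from_r_squared(r_squared: float, slope: float) -> float:
    """Recover the correlation r from r-squared and the sign of the slope.

    r^2 is the fraction of the variation in y accounted for by the
    least-squares line; r itself lies between -1 and 1 and has the same
    sign as the slope.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r-squared must be between 0 and 1")
    return copysign(sqrt(r_squared), slope)


r = r_from_r_squared(0.755 ** 2, slope=1.0)  # slope value is a sign placeholder
print(round(r, 3))           # 0.755
print(round(0.755 ** 2, 3))  # 0.57, about 57% of the variation in y
```

An r of 0.755 corresponds to r² of about 0.57, which is why it is better described as moderately strong than as weak; values such as 7.55 or 4.02 cannot be correlations.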
Many students:
- thought that r² was the correlation coefficient.
- attempted to use the adjusted r² from the computer output instead of r².
- included all four components of the correlation interpretation (strength, direction, form, and context) in their responses.
- described r = 0.755 as "weak," "fairly weak," or even "extremely weak." This suggests that students have not encountered enough real data sets to recognize that this is a moderately strong value of r.
- wrote the value of r as a percent.
- wrote numbers for r such as 7.55 or 4.02 and did not seem to recognize that r must be between -1 and 1.
- attempted to explain r² but did not do so correctly. Incorrect interpretations, such as "r² is the percent of data explained by the line," were common.
- correctly explained the meaning of r but then gave an incorrect interpretation of r². This was treated as a parallel solution and scored as incorrect.
- were careless in writing answers and made transcription errors, such as writing the correct value of r, 0.755, as .0755.
Part (c): Evaluating the quality of the given linear regression line over a restricted range.
Many students:
- made generic comments such as "Any time you remove points, you will have to calculate a new regression line" rather than focusing on the specific scatterplot provided.
- mistook the question as asking about the difference between prediction and extrapolation.
- did a very nice job of constructing a residual plot of the restricted data and then indicated that a negative correlation existed.
- talked about influential points being removed from the graph but did not describe what would happen to the relationship among the data in the restricted domain.
- removed only the three points in the upper right-hand corner and not the lower two points.
In general, many students wrote rambling explanations and misused statistical terminology.
Question 5: Inference -- Early Birds and Night Owls
Many students:
- failed to state the conditions, or gave an incomplete set of conditions, for using the selected statistical test.
- listed the conditions for using the selected test but did not check them.
- did not link their computation to their conclusion.
- failed to state their conclusion in the context of the problem.
- did not read the question in part (b) carefully and tested their new hypotheses rather than the ones given in the statement of part (a).
- did not seem to understand the question in part (a). Rather than giving a genuinely distinct set of hypotheses, they often repeated the hypotheses listed in the original statement or restated the same hypotheses with the parameters rearranged.
- stated their hypotheses using improper notation. It was not unusual to see students use p as the notation for a parameter without clearly indicating that it referred to a population.
- failed to identify the parameter (e.g., mean, median, or proportion) used in part (a). They gave statements such as "E is the early birds who recall no dreams."
- described their conclusion incorrectly, using phrases such as "at the 95% confidence level, we reject the null hypothesis." These students did not seem to understand the difference between a confidence level and an alpha (significance) level.
- reversed the direction of the inequality in the alternative hypothesis or wrote a two-sided alternative. Students who reversed the inequality did not seem to recognize the error, even when it produced a large p-value.
Question 6: Investigative Task -- Comedy Shows: S or F?
Part (a) asked students to create and interpret a 95% confidence interval for a proportion.
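A minimal sketch of the standard large-sample interval p̂ ± z*·sqrt(p̂(1 − p̂)/n), with a hypothetical function name and made-up counts (not the exam's data). The check n·p̂ ≥ 10 and n(1 − p̂) ≥ 10 is one common rule of thumb (some texts use 5); the random-sampling condition cannot be checked by code and still has to be addressed in words.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Large-sample z interval for a population proportion.

    Returns ((low, high), conditions_ok), where conditions_ok reports the
    rule-of-thumb counts n*p_hat >= 10 and n*(1 - p_hat) >= 10. Random
    sampling is assumed, not checked.
    """
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    conditions_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10
    return (p_hat - margin, p_hat + margin), conditions_ok


# Hypothetical sample (NOT the exam's data): 171 successes out of 300.
(low, high), ok = proportion_ci(171, 300)
print(f"({low:.3f}, {high:.3f})", ok)
```

The printed interval is an estimate of the population proportion; it says nothing about where 95% of individual values lie.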
Many students:
- failed to check the appropriate assumptions for this confidence interval.
- did not appear to understand that the interpretation of a confidence interval is meaningless unless the appropriate conditions have been satisfied.
- omitted the interpretation of the confidence interval, even though part (a) specifically asked for it.
- gave the interpretation of the confidence interval in part (b) instead. Students frequently misinterpreted the interval, writing, for example, "95% of the population is between (0.517, 0.625)" or "95% of the time the proportion is in the interval (0.517, 0.625)."
- did not state the meaning of the confidence interval in the context of the problem.
Part (b) asked students to interpret the level of confidence.
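What "95% confidence" describes can be demonstrated by simulation: draw many samples from a population whose proportion is known, build an interval from each sample, and count how many intervals capture the true value. The true proportion, sample size, number of repetitions, and seed below are hypothetical choices for illustration.

```python
import random
from math import sqrt


def wald_interval(successes: int, n: int, z_star: float = 1.96):
    """Large-sample 95% interval for a proportion from one sample."""
    p_hat = successes / n
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p: float, n: int, reps: int, seed: int = 0):
    """Draw `reps` random samples of size n, build an interval from each,
    and return the fraction of intervals that capture true_p."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(reps):
        successes = sum(rng.random() < true_p for _ in range(n))
        intervals.append(wald_interval(successes, n))
    hits = sum(low <= true_p <= high for low, high in intervals)
    return hits / reps, intervals


# Hypothetical population proportion and sample size (illustrative only).
rate, intervals = coverage(true_p=0.57, n=300, reps=2000, seed=1)
print(rate)            # close to 0.95
print(intervals[:3])   # each sample produces its own interval
```

The 95% describes the long-run success rate of the method across repeated samples; any single interval either captures the true proportion or it does not.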
Many students:
- gave the interpretation of the interval (0.517, 0.625) requested in part (a) rather than an interpretation of the 95% confidence level.
- interpreted the level of confidence incorrectly in terms of the specific interval from part (a), (0.517, 0.625). This often took the form "95% of confidence intervals from repeated sampling would have a proportion in the interval (0.517, 0.625)." They did not seem to understand that repeated sampling produces different intervals.
Part (c) asked students to perform a hypothesis test to compare two proportions.
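A minimal sketch of the pooled two-proportion z test, with a hypothetical function name and made-up counts (not the exam's data). Pooling the counts weights each sample proportion by its sample size, so with unequal samples the pooled estimate is not the simple average of the two sample proportions. The sketch also reports both one-sided p-values: reversing the direction of the alternative, an error noted under Question 5, turns a small p-value into its complement. It does not check the conditions (random samples, independent groups, large enough counts), which still need to be stated and checked.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sample z test of H0: p1 = p2 using the pooled estimate.

    Returns (pooled, z, p_greater, p_less), where p_greater is the p-value
    for Ha: p1 > p2 and p_less is the p-value for Ha: p1 < p2.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pool the COUNTS: this weights each sample proportion by its sample size.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_greater = 1 - NormalDist().cdf(z)
    p_less = NormalDist().cdf(z)
    return pooled, z, p_greater, p_less


# Hypothetical counts with unequal sample sizes (NOT the exam's data).
pooled, z, p_greater, p_less = two_proportion_z_test(60, 100, 90, 200)
print(pooled)                                  # 0.5, not (0.6 + 0.45) / 2 = 0.525
print(round(z, 2))                             # 2.45
print(round(p_greater, 4), round(p_less, 4))   # small vs. nearly 1
```

A p-value near 1 is usually a signal to recheck the direction of the alternative hypothesis.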
Many students:
- had difficulty with notation. They often stated their hypotheses in terms of sample statistics rather than population parameters.
- forgot to check all appropriate conditions.
- did a good job with computations and interpretation in context.
- failed to recognize that the difference in sample sizes affected the pooled estimate. Students who took the unequal sample sizes into account were generally successful.
Suggestions for Teachers
- Tests of inference involve more than finding a number; the computation is only one part of a complete answer.
- Along with demonstrating sound analysis techniques and correct computation, students are expected to communicate clearly in writing. Although the quality of student writing is not usually emphasized in a mathematics class, it is critical to a complete response to a statistics problem.
- Have students say what they need to say and then stop. Writing more is not necessarily better.
- Students should practice in class with tests similar in format to an AP Exam.
- Students should be given data in many different forms and formats. This might best be accomplished by giving students problems and examples from different books.
- Students need practice in reading computer output.
- Students often see only examples of bivariate data with r greater than 0.9 or less than -0.9. They should also be exposed to data with weaker correlation.
- It would be helpful for teachers to compile a list of statistical words (range, mean, data, variance, etc.) that students often use casually. When these words are used in the context of a statistics problem, they must be used correctly.
- If a paper has two (parallel) solutions to a problem and one is incorrect, the incorrect one is the one that is scored.
- It is helpful if students do statistics (design experiments, collect data, work on computer labs, etc.) throughout the year, not just learn about statistics.
- Write tests with questions taken from many different sources. This forces students to think.
- When students design experiments, their designs should make sense and be practical.
- It is important that teachers and students have practice in grading papers holistically. One way to do this is by using old AP Exams.
- In probability problems, a picture often helps the student.
- Students often did not use their calculators efficiently. They did not seem to know how to use the calculator to perform statistical tests, and they made errors in computing a confidence interval or a z value.
Diann Resnick has been teaching AP Statistics at Bellaire High School in Bellaire, Texas, since 1995. She is a Faculty Consultant for the College Board and has conducted workshops and summer institutes on the teaching of statistics since 1985. She is a table leader for AP Statistics and was a member of the AP Statistics Task Force and Development Committee.