College Board

AP Central

Common Errors on the 2003 AP Statistics Exam

by Diann Resnick
Bellaire High School
Houston, Texas

General Comments
Question 1: Students' and Teachers' Watches -- Which Has the Correct Time?
Question 2: Type I, Type II Error -- The Auto Defect Problem
Question 3: Probability -- Men's Shirt and Neck Sizes
Question 4: Experimental Design -- Tai Chi or Yoga
Question 5: Inference -- Knowledge of Foreign Affairs
Question 6: Investigative Task: The Shuttle Service -- Coach Versus Vans
Suggestions for Teachers


This year, as in past years, the AP Statistics Exam Reading was quite an educational experience. The best part of the Reading was seeing how well written some of the students' papers were. These papers clearly showed that good teaching and learning were happening in the classroom. With the increase in exams (about 58,000 exams this year), there were wide variations in the responses and level of preparedness of the students. Listed below are comments regarding the exam in general, followed by comments listed by question.

General Comments
Many students:
  • Seemed to have difficulty recognizing the difference between a population and a sample. They did not seem to know the difference between a parameter and statistic.
  • Were very sloppy with statistical notation and definitions. It was not uncommon to see students use creative notation to represent a population proportion, use the word mean to represent proportions, or use p/π to represent a mean.
  • Anticipated test questions and often answered not the question asked, but what they thought the question was or what they wanted it to be. Many students did not read the questions carefully.
  • Seemed to think that the questions were meant to be tricky and therefore tried to be creative when a straightforward answer was best.
  • Wrote more than was necessary to answer a question. It often appeared that some students were not sure of their answer so they added extraneous material. In doing so, they often wrote incorrect statements and were either penalized for the extraneous incorrect statements, or the statements were considered parallel solutions. In the case of parallel solutions, the worse of the two answers is graded, and many students lost credit for a problem.
  • Did not proofread their answers. They often left out a critical word in their answer or wrote contradictory statements. If students had taken the time to reread their work, they might have caught these careless mistakes.
  • Still had difficulty in budgeting their time on the exam. They failed to leave about 30 minutes for Question 6 -- the investigative task that counts as 25 percent of the free-response section.
Question 1: Students' and Teachers' Watches -- Which Has the Correct Time?
Question 1 required students to put together three different aspects of the AP syllabus.

The first section, part (a), instructed students to construct parallel boxplots. Some students made these errors:
  • Forgot to label the two graphs. It was not clear which boxplot went with each set of data.
  • Did not include a scale or did not use a common scale. This made it difficult for students to make accurate comparisons.
  • Did not follow instructions and drew separate boxplots (using two different scales) that were not parallel.
  • Were careless in plotting the five-number summary points.
  • Drew graphs other than boxplots, such as stemplots and histograms.
  • Indicated outliers, when none existed, or applied an outlier rule (of their own manufacture) to identify some outliers but not all.
  • Miscalculated the quartiles.
  • Confused the mean with the median.
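The five-number summary and outlier check that part (a) depends on can be sketched in Python. This is only an illustration under stated assumptions: the `watch_errors` values are hypothetical (not the exam data), the quartiles use the median-of-halves convention (other textbooks and software use slightly different rules), and the 1.5 × IQR fence is the usual classroom outlier rule, assumed here rather than taken from the scoring guidelines.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # observations below the median position
    upper = data[len(data) - half:]   # observations above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


def outliers(values):
    """Return every value outside the 1.5 * IQR fences (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # The same rule is applied to all values, so outliers are never
    # identified selectively.
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical data, not taken from the exam.
watch_errors = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = five_number_summary(watch_errors)
flagged = outliers(watch_errors)
```

For this hypothetical data the summary is (1, 2.5, 5, 7.5, 30) and only the value 30 is flagged. Note that the median (5) differs from the mean, which the outlier pulls upward.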
The second section, part (b), instructed students to decide which of the two groups had watches that were closer to the correct time and to justify their answer. Some students:
  • Did not answer the question, e.g., did not make a choice between the students' and teachers' watch times. The students only made reference to statistical values.
  • Focused on the centers of the data (which are fairly similar) instead of the dispersion of the data (which are quite different).
  • Talked about dispersion but did not tell which measure of dispersion was being discussed.
  • Incorrectly interpreted the boxplots. This often happened because the boxplots were not labeled or were labeled incorrectly.
  • Talked about "late" versus "early-ness" of students and teachers based on their thoughts or personal experiences and did not use the graphs for the justification of their answers.
The third section, part (c), asked students to determine if the given set of hypotheses were appropriate for the problem. Some students:
  • Checked assumptions/conditions to see if a known procedure could be used to carry out the hypothesis test (e.g., sample size too small) rather than focusing on whether the hypotheses addressed the issues posed in the teacher's question.
  • Modified the teacher's question, usually in an effort to make it fit the proposed hypotheses.
  • Incorrectly claimed that the hypothesis test would measure the closeness of all students' watches.
  • Thought that the two-sided alternative properly addressed the positive/negative issue.
  • Talked about watches and "late" versus "early-ness" based on their thoughts or experiences, without addressing the question.
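A small Python sketch may help show why a test about a mean does not measure closeness to the correct time. It assumes that the proposed hypotheses concerned a mean watch error of zero (they are not reproduced here), and both data sets are hypothetical. Positive and negative errors cancel in the mean, so two groups can share a mean error of zero while differing greatly in how close their watches are.

```python
from statistics import mean

# Hypothetical watch errors in minutes: positive = fast, negative = slow.
group_a = [-5.0, 5.0, -4.0, 4.0]
group_b = [-0.5, 0.5, -0.25, 0.25]


def mean_error(errors):
    """Signed average: fast and slow watches cancel each other out."""
    return mean(errors)


def mean_absolute_error(errors):
    """Average distance from the correct time, ignoring direction."""
    return mean([abs(e) for e in errors])


mean_a, mean_b = mean_error(group_a), mean_error(group_b)
closeness_a = mean_absolute_error(group_a)
closeness_b = mean_absolute_error(group_b)
```

Both hypothetical groups have a mean error of 0, yet the mean absolute errors are 4.5 and 0.375 minutes, so only the second group's watches are close to the correct time.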
Question 2: Type I, Type II Error -- The Auto Defect Problem
This question asked students to state the null and alternative hypotheses from a stated scenario and then define Type I and Type II errors in context of the scenario.

In part (a), many students:
  • Did not understand the meaning of the term "parameter of interest." Instead of defining the population under study, they often:
    (a) Gave a definition of a variable.
    (b) Gave a definition referring to the sample.
    (c) Used symbols that were not recognized as the parameters they were "defining."
  • Wrote the wrong direction in the alternative hypothesis.
  • Used an " = " sign in both the Ho and Ha.
  • Left out an " = " sign in the null hypothesis.
  • Defined the hypothesized parameter as 5 percent of the sample size and referred to the average number of defects as opposed to the proportion of defects.
  • Wrote 0.5 for 0.05 or 5 percent.
In part (b), many students:
  • Thought "consequences" were the "taking" or "not taking" of the class action lawsuit. Students did not mention the financial consequences as a result of a correct or incorrect decision to take or not take the class action lawsuit.
  • Mistakenly thought that consequences were winning or not winning the lawsuit.
  • Reversed the definitions of Type I and Type II errors.
  • Wrote a template answer for the meaning of Type I and Type II errors and did not write the error definitions in the context of the problem.
  • Correctly defined the Type I and Type II errors and then reversed the consequences of these errors.
  • Mistakenly thought that the test was the ability to detect whether a car was defective or not, not the percentage of cars that were defective.
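The distinction between the two error types can be made concrete with a short Python sketch. Everything except the 5 percent null value is an assumption made for illustration: a one-sided alternative that the defect proportion exceeds 5 percent, a hypothetical sample of 100 cars, a hypothetical decision rule, and a hypothetical true proportion of 10 percent. It is not the exam's solution.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(k, n, p) for k in range(c, n + 1))


P_NULL = 0.05   # hypothesized proportion of defective cars (from the text)
N_CARS = 100    # hypothetical sample size
CUTOFF = 10     # hypothetical rule: reject H0 if at least 10 cars are defective
P_TRUE = 0.10   # hypothetical true proportion used to illustrate a Type II error

# Type I error: H0 is true (p = 0.05) but the sample leads to rejecting it.
type_i_error = prob_at_least(CUTOFF, N_CARS, P_NULL)

# Type II error: p is really 0.10 but the sample fails to reject H0.
type_ii_error = 1 - prob_at_least(CUTOFF, N_CARS, P_TRUE)
```

Both quantities are probabilities about the proportion of defective cars in the population, not about whether any single car is defective, which is the confusion noted in the last item above.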
Question 3: Probability -- Men's Shirt and Neck Sizes
On this question, students often:
  • Rounded intermediate calculations.
  • Failed to show work -- even when the problem said, "Show your work."
  • Used the Empirical Rule instead of tables or calculators in evaluating probabilities for non-integer z-scores.
  • Reversed the subtraction in the numerator when calculating their z-scores, and thus made a sign error.
  • Treated the number line as discrete, e.g., using an endpoint of 13.9 when evaluating P(x < 14).
  • Treated only one "no shirt" region in part (a), i.e., using only one tail of the distribution and not recognizing the need to look at both ends of the distribution.
  • Noted that most shirt sizes fall within three standard deviations of the mean and thus decided that P(of not having a shirt) = 0.
  • Manipulated tail probabilities incorrectly, e.g., calculating P(15 < x < 16) as P(x > 15) - P(x < 16).
  • Failed to put scales or variable identifications on their graphs.
  • Failed to distinguish between normal and binomial distributions in part (c).
  • Did not recognize that the probability calculated in (b) was the probability of "success" for the binomial setting in part (c).
  • Wrote the probability of success in part (c) as any convenient fraction, e.g., 4 of 12 customers, 1 of 4 sizes, medium or not medium, etc.
  • Used the calculator built-in function cdf instead of pdf in calculating the binomial probability P(x = 4).
  • Failed to include the binomial coefficient in the calculation of P(x = 4).
  • Wrote the binomial coefficient as a common fraction, such as 4/12 or 12/4.
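Several of the calculations above can be sketched in Python with the standard library. The mean of 15.7 and standard deviation of 0.7 are hypothetical neck-size parameters, and the stocked range of 14 to 18 is likewise assumed; only the interval 15 to 16, the endpoint 14, and the count of 4 out of 12 customers come from the items above.

```python
from math import comb
from statistics import NormalDist

# Hypothetical normal model for neck size, in inches.
neck = NormalDist(mu=15.7, sigma=0.7)

# Continuous variable: P(x < 14) uses 14 itself, not 13.9.
p_below_14 = neck.cdf(14)

# "No shirt" needs both tails (hypothetical stocked range of 14 to 18).
p_no_shirt = neck.cdf(14) + (1 - neck.cdf(18))

# Interval probability: subtract two left-tail areas.
p_between = neck.cdf(16) - neck.cdf(15)


def binomial_pmf(k, n, p):
    """P(X = k): binomial coefficient times the success/failure powers."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): the cumulative probability, not P(X = k)."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))


# The interval probability becomes the "success" probability for 12 customers.
p_exactly_4 = binomial_pmf(4, 12, p_between)
p_at_most_4 = binomial_cdf(4, 12, p_between)
```

Comparing `p_exactly_4` with `p_at_most_4` shows the size of the mistake made by using a cumulative function when P(x = 4) is requested. Note also that `comb(12, 4)` is 495, not a fraction such as 4/12.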
Question 4: Experimental Design -- Tai Chi or Yoga
On this question, many students:
  • Did not seem to understand the difference between random allocation of subjects and random sampling.
  • Did not understand that the variable of interest was the change in stress levels and not the "before" stress levels versus the "after" stress levels.
  • Often used the word "confounding" in part (a), but did not explain how the treatment results were mixed up with some other variable.
  • Seemed to think in part (c) that a larger sample size would fix any problem in the experiment. The students did not seem to understand that the major problem of the experiment was that there was no random sampling of employees.
  • Incorrectly stated that random allocation "eliminates" bias.
  • Used pronouns without clear reference to what they represented. It was a guess as to what the students meant.
  • Used statistical words incorrectly. It appeared as if the students had no clear understanding as to the words' meanings (bias, confounding, blocking, lurking variables, placebo, stratification, etc.).
  • Did not focus in on the aspect of the problem that randomization would address.
  • Seemed to have memorized a template answer without a clear understanding of meaning.
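The difference between random allocation and random sampling can be illustrated with a brief Python sketch. The employee labels, group sizes, stress scores, and seed are all hypothetical. The sketch randomly allocates the available subjects to the two treatments (it does not sample them from a larger population) and treats the change in stress level as the response.

```python
import random


def randomly_allocate(subjects, seed=None):
    """Shuffle the available subjects and split them into two treatment groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def stress_changes(before, after):
    """Response variable: reduction in stress for each subject (before - after)."""
    return [b - a for b, a in zip(before, after)]


# Hypothetical subjects; they are allocated at random, not sampled at random.
employees = [f"employee_{i}" for i in range(1, 21)]
tai_chi_group, yoga_group = randomly_allocate(employees, seed=2003)

# Hypothetical before/after stress scores for three subjects.
changes = stress_changes([8, 7, 6], [5, 7, 4])
```

Random allocation tends to balance lurking variables across the two groups, so it supports cause-and-effect comparisons between treatments. It does not make the subjects representative of all employees, which would require random sampling.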
Question 5: Inference -- Knowledge of Foreign Affairs
On this question, students often:
  • Used boxplots or bar graphs to answer this question. Many students did not seem to know that they were supposed to use inference procedures for the solution to this question.
  • Used only probability concepts in their solution.
  • Drew conclusions by merely inspecting the data or by inspecting the percentages of males and of females in each response category. The students did not use a formal inference test.
  • Stated that the study was invalid because the numbers of females and males were not the same.
  • Used phrases such as "affected by," "based on," "influenced by," and "connected to," even though the stem of the question asked if response was dependent on gender. These were graded as correct answers, although they were not the preferred answer. Students need to remember to answer the question in terms of the way the problem is stated.
  • Wrote, "There is no correlation between response and gender," as their Ho. This was scored as incorrect.
  • Stated the hypotheses as if the study was a two-sample design. Many of these papers gave hypotheses about homogeneity of proportions.
  • Gave incomplete hypotheses and failed to name the row and column variables in the hypotheses.
  • Reversed the hypotheses by writing the statement of independence as the alternate hypothesis.
  • Wrote hypotheses about means, proportions, or differences in counts.
  • Included symbols, equations, or inequalities in addition to written hypotheses.
  • Stated hypotheses about evidence for independence, e.g., Ho: there is not sufficient evidence to support the claim that response is dependent on gender.
  • Omitted the conditions/assumptions for the validity of the test.
  • Failed to display expected counts or give evidence that they had checked the expected counts in the checking of conditions for the validity of the test.
  • Believed expected counts had to be whole numbers (rounding up).
  • Did not write the condition on expected counts, but rather stated "observed counts," "all counts," and "the counts." These three terms are incorrect.
  • Performed paired or two-sample t-tests on means, or z-tests on proportions, or performed linear regression on the pairs of counts for each response category.
  • Failed to report the value of the chi-square test statistic.
  • Described calculator procedures, for example, "I entered these 10 numbers in matrix A," but did not describe the test procedure, give the name of the test, or give the formula for the test statistic.
  • Failed to link the P-value to a conclusion.
  • Compared the P-value to a significance level, but then gave an incorrect conclusion.
  • Incorrectly compared the P-value, e.g., 0.063 < 0.05.
  • Failed to write a conclusion in context.
  • Wrote a conclusion that was incompatible with the stated hypotheses.
  • Interpreted P-values as six chances in 100 of observing a result this extreme, but omitted the conditional statement, "if the null hypothesis were true."
  • Interpreted P-values as "these results," rather than "results this extreme or more extreme," or as "results at least this extreme."
  • Computed P-values as 2P(χ² > 8.923) or drew a graph of a symmetric curve and shaded in both tails outside of 0.06 and -0.06.
  • Computed degrees of freedom as (rows × columns) - 1, or as (rows - 1) + (columns - 1).
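The mechanics of a chi-square test of independence can be sketched in plain Python. The counts in `observed` are hypothetical, and the 2 × 5 layout is an assumption based on the "10 numbers" mentioned above; only the statistic 8.923 comes from the text. The P-value helper uses a closed form that is valid only for an even number of degrees of freedom, which is a simplification chosen to avoid external libraries.

```python
from math import exp, factorial


def expected_counts(observed):
    """Expected count = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def upper_tail_p_value(statistic, df):
    """P(chi-square > statistic); closed form for even df only (upper tail, no doubling)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = statistic / 2
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))


# Hypothetical 2 x 5 table of counts (rows: gender, columns: response category).
observed = [[25, 30, 45, 50, 20], [35, 40, 40, 30, 15]]
expected = expected_counts(observed)    # need not be whole numbers
condition_met = all(e >= 5 for row in expected for e in row)
df = degrees_of_freedom(observed)
p_value = upper_tail_p_value(8.923, df)  # statistic quoted in the text
```

With 4 degrees of freedom, the statistic 8.923 gives an upper-tail P-value of about 0.063, which is greater than 0.05. The condition is checked on the expected counts, not on the observed counts.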
Question 6: Investigative Task: The Shuttle Service -- Coach Versus Vans
On this question in part (a), many students:
  • Substituted 0.76 into each equation instead of using a value less than 0.76.
  • Gave the correct answer but did not give a reason based either on the graph or using an algebraic method. Communication skills were lacking.
On this question in part (b), many students:
  • Used a z* that was associated with a 90 percent confidence interval. Students seemed to have difficulty using the tables.
  • Neglected to state the conditions for the confidence interval to be valid.
  • Stated conditions without checking them.
  • Gave a template answer, with no context, for the confidence interval statement.
  • Gave an interpretation of the confidence level instead of the confidence interval. Students did not seem to understand the difference between the two.
  • Misinterpreted the confidence interval to imply that the population proportion of similar markets with strong demand was a variable and not a fixed, but unknown, parameter. For example, "95 percent of the time, the true proportion of similar markets with strong demand will be between 0.56 and 0.74." This was a very common mistake.
  • In interpreting the confidence interval, frequently omitted or incorrectly identified the population of interest (markets similar to Lonestar's).
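The confidence interval calculation in part (b) can be sketched in Python. The point estimate 0.65 comes from the discussion of parts (c) and (d); the sample size of 100 is a hypothetical value, chosen only because it reproduces the interval 0.56 to 0.74 quoted above, and the "at least 10 successes and 10 failures" condition is one common rule of thumb (some texts use 5).

```python
from math import sqrt
from statistics import NormalDist


def critical_value(confidence):
    """z* that leaves (1 - confidence) / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def one_proportion_interval(p_hat, n, confidence=0.95):
    """Large-sample z interval for a population proportion."""
    margin = critical_value(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def large_sample_condition(p_hat, n, minimum=10):
    """Check that expected successes and failures are both large enough."""
    return n * p_hat >= minimum and n * (1 - p_hat) >= minimum


P_HAT = 0.65   # sample proportion of markets with strong demand (from the text)
N = 100        # hypothetical sample size; not given in the text

condition_ok = large_sample_condition(P_HAT, N)
low, high = one_proportion_interval(P_HAT, N)

z_95 = critical_value(0.95)   # about 1.96
z_90 = critical_value(0.90)   # about 1.645, the value some students used by mistake
```

The resulting interval describes plausible values for a fixed but unknown population proportion. The 95 percent refers to the method's long-run success rate, not to a parameter that varies from one occasion to the next.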
On this question in part (c), many students:
  • Used the point estimate of 0.65 to make their decision and not the confidence interval.
  • Talked about the fact that all plausible values are below 0.76, but again did not make reference to the confidence interval in part (b).
On this question in part (d), many students:
  • Did not realize that, for a particular market, demand will be either strong or weak.
  • Argued that the statistical analysis was invalid or had the potential for errors. Examples included responses indicating that the sample size was too small, Lonestar's market was different -- even though the problem stated similar markets -- and that the true proportion could be outside the interval.
  • Recalculated the confidence interval using a 99 percent confidence level instead of 95 percent.
  • Based their response on the point estimate of 0.65 and not the fact that the confidence interval contained only values above 0.50, which meant that there was a very strong likelihood of strong demand.
Suggestions for Teachers
  • Have students carry out all steps necessary for the different types of inference tests. Students did not seem to understand all the necessary parts for an inference test that must be written to have their answer scored as "Essentially correct."
  • Inference tests are not just about finding a number. The number is a very small part of a complete answer.
  • Written communication is often more important than numerical values. When a student score is assessed, often the written communication will determine whether a score is rounded up or down.
  • Have students say what they need to say and then stop. Writing more is not necessarily better.
  • Students should practice in class with tests that are similar in format to an AP Exam.
  • Students should be given data in many different forms and formats. This might best be accomplished by giving students problems, examples, and test questions from different books.
  • Students need practice in reading computer output.
  • Often students only see examples of bivariate data that has an r-value greater than 0.9 or smaller than -0.9. They should be exposed to data that has weaker correlation.
  • It would be helpful for teachers to write a list of statistical words (range, mean, data, variance, etc.) that are often used casually by students. When used in context of a statistics problem, they should be used correctly.
  • If a paper has two (parallel) solutions to a problem and one is not correct, the incorrect one is the one that is scored.
  • It is helpful if students do statistics (design experiments, collect data, work on computer labs, etc.) throughout the year, not just learn about statistics.
  • Write tests with questions taken from many different sources. It forces students to think.
  • When students design experiments, the design should make sense and be practical.
  • It is important that teachers and students have practice in grading papers holistically. This can be accomplished by using old AP Exams.
  • In statistics problems, pictures often help students.
  • Students must use labels and scales on all graphs.
  • Students often do not use their calculators in the most efficient manner. They do not seem to know how to use the calculator to perform statistical tests. Because of this, students often made errors in the computation of a confidence interval or in the computation of a test statistic.
  • In taking the actual exam, it is often very helpful for students to underline key ideas and to draw on the graphs that are provided in the test booklet.

Diann Resnick taught AP Statistics at Bellaire High School in Houston, Texas, for seven years. She was a member of the College Board Advanced Placement Statistics Development Committee and has been a Table Leader at the AP Statistics summer Readings for six years. Resnick, a Presidential and Tandy Scholar, is on the writing team of Laying the Foundation to Advanced Placement Mathematics and has been active in statistics outreach programs since 1986. She has conducted statistics workshops for the Woodrow Wilson National Fellowship Foundation, the American Statistical Association, and the College Board.

