
College Board

AP Central


Random Variables vs. Algebraic Variables

by Peter Flanagan-Hyde
Phoenix Country Day School
Paradise Valley, Arizona

Why x + x = 2x, but X + X ≠ 2X
It is typical for students in an introductory statistics course to be confronted with the idea of a random variable for the first time. Through the good training in their previous work in mathematics courses, most have grown reasonably well acquainted with variables as they are used in algebra. Random variables, however, differ from these algebraic variables in important ways that often bewilder students. In this article, I will look at some of these differences and examine why they present difficulties for many students.

A random variable is often introduced to students as a value that is created by some random process. Most students have enough experience with rolling dice, flipping coins, or drawing cards to get off to a good start in the process of understanding the idea of a random variable. However, the conceptual pool of random variables quickly deepens, as the term is generally used in both a more abstract way and a more varied way in most statistics textbooks. At times, it is used to refer to the outcome of a single event, as in P(X = k); at other times, it is used to refer to the entire pattern of possible outcomes rather than a single event, as in µX = 10. This is a long step of understanding for many students to take.

This contrasts with their experiences in algebra, where variables typically have a single, though hidden, value. Finding this hidden value may stump students, to be sure, but they know that there is a value to be found. There are times in algebra when variables refer to a pattern of many values, as in the equation of a parabola such as y = x^2 + 3x - 4. However, the important idea of a function means that for a given x value, y has only one possible value. Indeed, many of the problems given to students in which variables are used in functions again ask them to find a particular value, or perhaps several values, such as the coordinates of the vertex or the intercepts.

Let's look at some examples to illustrate these differences. First, let x be an algebraic variable. Perhaps x = the number of students in a given classroom. This is an unknown value, perhaps, but students can picture a classroom and imagine counting the students; or perhaps there is a puzzle for them to solve to determine the value of x. Students' experience also includes manipulating algebraic variables to create expressions that they use in developing solutions to new problems. For instance, let us imagine in our classroom of x students that each student has a textbook and a workbook on his or her desk. Then the number of books out in the room is x textbooks and x workbooks. Algebra students are drilled to simplify this, with x + x = 2x representing the total number of books on student desks. This is given even more weight as an example of the distributive property, one of the fundamental principles of mathematics that students encounter repeatedly.

A random variable, on the other hand, represents a completely different concept. Here is an example that will illustrate this difference. Imagine that the teacher in our hypothetical algebra class has an extra credit scheme in which each student who completes all of a week's assignments on time gets to roll a die, with bonus points equal to the result of the die added to his or her score. At the beginning of the term, students enthusiastically respond by doing all their assignments, earning between 1 and 6 extra credit points. This point value, call it X, is a random variable, since its value is determined by the outcome of a random process. There are six different possible values for X, the integers from 1 to 6. Each is equally likely to occur, so students often write expressions like P(X = 3) = 1/6. In this case, students do think of the random variable X as representing a single, unknown value, in the same way that they think about algebraic variables. But X really refers to the distribution of possible values and the associated probabilities. These are shown below:

X       1     2     3     4     5     6
P(X)    1/6   1/6   1/6   1/6   1/6   1/6


Using standard formulas, students encounter expressions like µX = 3.5 and σX ≈ 1.708, which use X in this way. Without much experience with these types of variables, it often takes some time before students' intuition about what the various symbols actually represent catches up.
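The standard formulas for the mean and standard deviation of a discrete random variable can be checked with a few lines of Python. This is only a sketch: the helper name `mean_and_sd` and the dictionary `die` are illustrative choices, not part of the article.

```python
from fractions import Fraction
from math import sqrt


def mean_and_sd(dist):
    """Return (mean, standard deviation) of a discrete distribution.

    `dist` maps each possible value to its probability.
    """
    mean = sum(value * p for value, p in dist.items())
    variance = sum((value - mean) ** 2 * p for value, p in dist.items())
    return mean, sqrt(float(variance))


# A fair six-sided die: each face from 1 to 6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mu_x, sigma_x = mean_and_sd(die)
print(float(mu_x))        # 3.5
print(round(sigma_x, 3))  # 1.708
```

Note that the function takes the whole table of values and probabilities as its input, which mirrors the point that µX and σX describe the distribution rather than any single roll.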

Back in our algebra class, as the semester wears on, some students are not completing all the work. So, as an extra incentive late in the term, the teacher decides to offer a double bonus, between 2 and 12 points. There are now two different ways that the teacher can proceed: either have the student roll one die and double the value, or have the student roll two dice and add the sum as bonus points. Are these procedures any different from each other? They have the same maximum and minimum values (2 and 12) and the same mean (7). However, they have neither the same probability distribution nor the same variability. In the first case, there are only six possible outcomes when doubling the value of one die: the even numbers from 2 to 12. In the second case, all the values from 2 to 12 might result from the random process.

To examine the variability, we need to compare the probability distributions of each process. If X refers to the random variable of a single toss of a die, then the value of the random bonus in the first process, doubling the value of the roll, is 2X. Let us call this new random variable D for "doubling." We could write, correctly, D = 2X. Here is the probability distribution of D:

D       2     4     6     8     10    12
P(D)    1/6   1/6   1/6   1/6   1/6   1/6


The outcomes are uniformly distributed, since all outcomes have the same probability of occurring. Again employing the standard formulas, µD = 7 and σD ≈ 3.416. How do these compare to the values for rolling a single die, X? It's easy enough to see from the numerical values that µD = 2µX and σD = 2σX, and these values are predicted by the rules for multiplying a random variable by a constant.
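As a rough check of the rule for multiplying a random variable by a constant, the sketch below builds the distribution of D by doubling each value of X while leaving the probabilities unchanged. The names `mean_and_sd`, `die`, and `doubled` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import isclose, sqrt


def mean_and_sd(dist):
    """Return (mean, standard deviation) of a {value: probability} table."""
    mean = sum(value * p for value, p in dist.items())
    variance = sum((value - mean) ** 2 * p for value, p in dist.items())
    return mean, sqrt(float(variance))


die = {face: Fraction(1, 6) for face in range(1, 7)}

# D = 2X: every outcome is doubled, but each keeps its probability of 1/6.
doubled = {2 * face: p for face, p in die.items()}

mu_x, sigma_x = mean_and_sd(die)
mu_d, sigma_d = mean_and_sd(doubled)

print(float(mu_d), round(sigma_d, 3))  # 7.0 3.416
print(mu_d == 2 * mu_x)                # True
print(isclose(sigma_d, 2 * sigma_x))   # True
```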

When two dice are rolled, though, the results are different. Call the random variable that represents the outcomes of the two-dice process T (for "two"). We could write T = X + X. This equation represents the fact that T is the result of two independent instances of the random variable X. Each time you write the symbol X, you imply a random draw from the specified population. Here is the probability distribution of T:

T       2      3      4      5      6      7      8      9      10     11     12
P(T)    1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36


Again employing the standard formulas, µT = 7 and σT ≈ 2.415.
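The distribution of T can be rebuilt by listing all 36 equally likely ordered pairs of faces and counting how often each total appears. The following is a minimal sketch; the variable names (`counts`, `two_dice`, and so on) are illustrative and not taken from the article.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

faces = range(1, 7)

# Count how many of the 36 ordered pairs (first die, second die) give each total.
counts = Counter(a + b for a, b in product(faces, repeat=2))
two_dice = {total: Fraction(n, 36) for total, n in counts.items()}

for total in sorted(counts):
    print(total, f"{counts[total]}/36")

mu_t = sum(t * p for t, p in two_dice.items())
var_t = sum((t - mu_t) ** 2 * p for t, p in two_dice.items())
sigma_t = sqrt(float(var_t))

print(float(mu_t), round(sigma_t, 3))  # 7.0 2.415
```

Enumerating pairs treats the two dice as separate draws, which is exactly what distinguishes T from doubling a single roll.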

Since T results from two independent instances of X, the formulas for calculating the mean and variance of independent random variables can be used to confirm the values for the mean and standard deviation of T. So we have µT = µX + µX = 2µX = 7 and σT² = σX² + σX² = 2σX² ≈ 5.833, and σT = √2 · σX ≈ 2.415. It is clear from these values and the discussion above that T ≠ D, or X + X ≠ 2X. This is a little hard for many students to swallow and is part of the difficulty they face sorting out the differences.
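Written out side by side, the two rules at work give different variances even though the means agree. In the sketch below, the subscripts in X_1 and X_2 are notation added here (not used in the article) to mark the two independent rolls, and 35/12 is the variance of a single fair die.

```latex
\begin{align*}
\operatorname{Var}(D) &= \operatorname{Var}(2X) = 2^2\,\operatorname{Var}(X)
  = 4 \cdot \tfrac{35}{12} = \tfrac{35}{3},
  & \sigma_D &= 2\,\sigma_X \approx 3.416, \\
\operatorname{Var}(T) &= \operatorname{Var}(X_1 + X_2)
  = \operatorname{Var}(X_1) + \operatorname{Var}(X_2)
  = 2 \cdot \tfrac{35}{12} = \tfrac{35}{6},
  & \sigma_T &= \sqrt{2}\,\sigma_X \approx 2.415.
\end{align*}
```

The second line relies on the independence of the two rolls; the first does not, because D reuses a single roll.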

It is also true that students do not have an immediate understanding of terms such as "variability," which are used as they work with random variables. At first glance, many students would describe the second process as more variable. It can result in 11 different outcomes, as opposed to only six in the first case. Yet the standard deviation of the two-dice total (about 2.415) is smaller than that of the doubled single roll (about 3.416), because the totals cluster around 7. These students mistake the idea of variety for the more formally defined measures of variability.
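A quick simulation can make the distinction between variety and variability concrete. The sketch below assumes a fixed seed and 100,000 trials per process, both arbitrary choices made for this illustration; the sample standard deviations should land near the theoretical values of roughly 3.42 and 2.42.

```python
import random
from statistics import pstdev

rng = random.Random(2024)  # fixed seed so the run is repeatable
n = 100_000                # number of simulated bonuses per process

# Process 1: roll one die and double it.
doubled = [2 * rng.randint(1, 6) for _ in range(n)]

# Process 2: roll two dice and add them.
two_dice = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n)]

# Variety: how many distinct outcomes appeared in each process.
print(len(set(doubled)), len(set(two_dice)))

# Variability: the spread of each process, measured by standard deviation.
print(round(pstdev(doubled), 2), round(pstdev(two_dice), 2))
```

The two-dice process shows more distinct outcomes but a smaller standard deviation, which is the contrast students tend to miss.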

As you introduce your students to random variables, be prepared to address their confusion between the ideas they are accustomed to (algebraic variables) and the new ones. Provide your students with settings in which they can work with random variables, write expressions using random variables, and gain the intuition that they will need to use these ideas effectively in their work in statistics.


Peter Flanagan-Hyde has been a math teacher for 27 years, teaching in Phoenix, Arizona, for the past 15 years. With a BA from Williams College and an MA from Teachers College, Columbia University, he has pursued a variety of interests, including geometry, calculus, physics, and the use of technology in education. Peter has taught AP Statistics since its inception in the 1996-1997 school year. He became an Exam Reader in 1999 and a Table Leader in 2004. He has conducted numerous workshops and summer institutes in statistics, has presented at a variety of conferences, and has authored several sets of student activities.




