The following is a joint post-exam project for AP Calculus and AP Statistics students.
I recommend a project based on the Gini Index. The Gini Index is a measure of the inequity of a distribution (of income, land ownership, energy use, etc.). This is a project I have my students do in calculus each year, and I have also had my statistics students work on part of it. I think it would be great as a joint project.
Daniel J. Teague, NC School of Science and Mathematics, Durham, North Carolina
Imagine lining up all the households in the U.S., from the household with the smallest income to the household with the largest income (that's the Gates household). Now, divide this linear ordering into five equally sized groups. Each group contains 20 percent of U.S. households. What fraction of the total income of the U.S. does each fifth have?
In 2000, the distribution of income reported by the Census Bureau was the following:
- The first fifth of the households held 3.6 percent of the total income.
- The second fifth of the households held 8.9 percent of the total income.
- The third fifth of the households held 14.9 percent of the total income.
- The fourth fifth of the households held 23.0 percent of the total income.
- The highest fifth of the households held 49.6 percent of the total income.
(A list of historical income distributions since 1967 can be found in "See also," below.)
We would like to develop a method of "measuring" how inequitable this distribution of income is (calculus) and use this measure to compare the inequity for other years, other countries, or Democratic versus Republican presidencies (statistics).
First we need a way to measure inequity. Imagine a country in which the distribution of income is perfectly equitable. In this case, the bottom 20 percent of the households will have 20 percent of the income. The bottom 40 percent will have 40 percent of the income, the bottom 60 percent of the households will have 60 percent of the income, and the bottom 80 percent will have 80 percent of the income. Of course, 100 percent of the households have 100 percent of the income. The cumulative distribution will be represented by the line y = x. The area under y = x from 0 to 1 is 1/2.
Now consider a country in which the distribution is perfectly inequitable. In this case, the bottom 20 percent of the households will have 0 percent of the income. The bottom 40 percent will have 0 percent of the income, the bottom 60 percent of the households will have 0 percent of the income, and the bottom 80 percent will have 0 percent of the income. Only one person has all the income and everyone else has nothing. The cumulative distribution will be represented by the line y = 0. The area under y = 0 from 0 to 1 is 0.
In 2000, we have:
- The lowest fifth of the households held 3.6 percent of the total income.
- The lowest two-fifths of the households held 3.6 percent + 8.9 percent = 12.5 percent of the total income.
- The lowest three-fifths of the households held 12.5 percent + 14.9 percent = 27.4 percent of the total income.
- The lowest four-fifths of the households held 27.4 percent + 23.0 percent = 50.4 percent of the total income.
- Of course, the lowest five-fifths of the households (all of them) held 50.4 percent + 49.6 percent = 100 percent of the total income.
This gives the ordered pairs (0.2, 0.036), (0.4, 0.125), (0.6, 0.274), (0.8, 0.504), and (1.0, 1.0). Of course, (0, 0) is also a point on this curve. These points define a function that lies between y = 0 and y = x. This function is known as the Lorenz curve, L(x). The more inequitable the distribution, the greater the area between y = x and y = L(x).
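The running totals above can be reproduced with a few lines of Python. This is only a sketch: the variable names (`quintile_shares`, `lorenz_points`) are illustrative choices, and the shares are the 2000 figures quoted earlier.

```python
from itertools import accumulate

# Share of total income held by each fifth of households in 2000, in percent.
quintile_shares = [3.6, 8.9, 14.9, 23.0, 49.6]

# Running totals, converted from percent to a fraction of total income.
cumulative = [round(total / 100, 3) for total in accumulate(quintile_shares)]

# Pair each cumulative share with the fraction of households it covers,
# and include the origin, which every Lorenz curve passes through.
lorenz_points = [(0.0, 0.0)] + [((i + 1) / 5, c) for i, c in enumerate(cumulative)]

for x, y in lorenz_points:
    print(f"lowest {x:.0%} of households hold {y:.1%} of income")
```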
Since the largest possible area is 1/2, perfect inequity, we will define the Gini Index as the ratio of the area between y = x and y = L(x) to 1/2. (This also answers the student question, "When will we ever be interested in knowing the area between two curves?")
Now, how do we find the Lorenz curve? We know that the function must pass through (0,0) and (1,1), so we choose a power function y = x^n as our model. We cannot just use our calculators to find the power function y = ax^n, since this does not necessarily pass through (1,1). (While Lorenz curves can have many different shapes, depending on what is being modeled, in this setting models of the form y = x^n are typically used.)
Calculus students must develop a method for finding L(x) by minimizing the sum of squared residuals for the model, S(n) = (y1 - 0.2^n)^2 + (y2 - 0.4^n)^2 + (y3 - 0.6^n)^2 + (y4 - 0.8^n)^2, where y1 through y4 are the cumulative income shares at x = 0.2, 0.4, 0.6, and 0.8.
Notice that (0,0) and (1,1) give you no information for this function, but the form y = x^n guarantees these points will be on the curve. They will need to use numerical techniques to find the zero of the derivative.
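One way students might carry out the numerical step is bisection on S'(n). The sketch below assumes the 2000 data and assumes that the minimizing exponent lies between 1 and 5; the helper names (`S`, `dS`, `bisect_root`) are made up for illustration, and any root finder (Newton's method, a calculator solver) would serve equally well.

```python
import math

# Cumulative household fractions and income shares for 2000.
xs = [0.2, 0.4, 0.6, 0.8]
ys = [0.036, 0.125, 0.274, 0.504]

def S(n):
    """Sum of squared residuals for the model y = x**n."""
    return sum((y - x**n) ** 2 for x, y in zip(xs, ys))

def dS(n):
    """Derivative of S with respect to n, using d/dn x**n = x**n * ln(x)."""
    return sum(-2 * (y - x**n) * x**n * math.log(x) for x, y in zip(xs, ys))

def bisect_root(f, lo, hi, tol=1e-10):
    """Find a zero of f on [lo, hi], assuming f changes sign on the interval."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Assumed bracket: S'(1) < 0 and S'(5) > 0 for this data set.
n_hat = bisect_root(dS, 1.0, 5.0)
print(f"n = {n_hat:.4f}, S(n) = {S(n_hat):.6f}")
```

Plotting S(n) over the bracket first is a useful check that the zero found is a minimum rather than some other critical point.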
They could also re-express the data and find the least squares solution to S(n) = [ln(y1) - n ln(0.2)]^2 + [ln(y2) - n ln(0.4)]^2 + [ln(y3) - n ln(0.6)]^2 + [ln(y4) - n ln(0.8)]^2. This version can be minimized analytically.
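For the re-expressed problem, setting S'(n) = 0 gives a closed form: n = [sum of ln(xi) ln(yi)] / [sum of ln(xi)^2], which is the slope of a least squares line through the origin in the (ln x, ln y) plane. A short sketch with the 2000 data follows; the name `n_log` is illustrative.

```python
import math

# Cumulative household fractions and income shares for 2000.
xs = [0.2, 0.4, 0.6, 0.8]
ys = [0.036, 0.125, 0.274, 0.504]

log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]

# Least squares slope for ln(y) = n * ln(x), a line forced through the origin.
n_log = sum(lx * ly for lx, ly in zip(log_x, log_y)) / sum(lx * lx for lx in log_x)
print(f"n from the log-transformed fit: {n_log:.4f}")
```

The exponent obtained this way will generally differ from the one that minimizes the untransformed sum of squares, because taking logarithms changes how heavily each data point is weighted. Comparing the two fits makes a good discussion point.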
Once they have found a method for generating a Lorenz curve from the data, they need to find the Gini Index.
This is a simple computation: GI = defint(x - L(x), x, 0, 1)/0.5, that is, the integral of x - L(x) from 0 to 1, divided by 1/2.
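For the power model L(x) = x^n, the integral of x - x^n from 0 to 1 is 1/2 - 1/(n + 1), so the index reduces to GI = (n - 1)/(n + 1). The sketch below checks that closed form against a midpoint-rule approximation of the integral. The function names and the sample exponent 2.5 are illustrative only, not values taken from the data.

```python
def gini_from_power(n):
    """Gini Index for L(x) = x**n, using the closed-form integral."""
    return (n - 1) / (n + 1)

def gini_numeric(n, steps=100_000):
    """Midpoint-rule approximation of the area between y = x and y = x**n, divided by 1/2."""
    h = 1 / steps
    area = sum(((k + 0.5) * h - ((k + 0.5) * h) ** n) * h for k in range(steps))
    return area / 0.5

n = 2.5  # hypothetical exponent, for illustration only
print(gini_from_power(n), gini_numeric(n))
```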
(A Gini Index can also be computed directly from the data using a trapezoid rule, without first finding the Lorenz curve, but that removes some of the calculus from the problem.)
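As a point of comparison, the trapezoid-rule version might look like the following sketch, which treats the Lorenz curve as piecewise linear between the six data points for 2000. The names `points` and `gini_trapezoid` are illustrative.

```python
# Lorenz curve points for 2000: (fraction of households, fraction of income).
points = [(0.0, 0.0), (0.2, 0.036), (0.4, 0.125), (0.6, 0.274), (0.8, 0.504), (1.0, 1.0)]

def gini_trapezoid(pts):
    """Gini Index from Lorenz points, using trapezoids for the area under the curve."""
    area_under_lorenz = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )
    # Area between y = x and the Lorenz curve, as a ratio to the maximum 1/2.
    return (0.5 - area_under_lorenz) / 0.5

print(round(gini_trapezoid(points), 4))  # 0.4244
```

Because the piecewise-linear curve lies on or above any convex curve through the same points, the trapezoid value tends to understate the area between y = x and a smooth Lorenz curve.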
Once the Gini Indices have been computed, we can proceed to the statistical questions, such as:
- Are the Gini Indices lower for Republican administrations than for Democratic administrations? Interpret the result of this investigation.
- Are the Gini Indices higher for industrialized countries than for agrarian countries? Interpret the result of this investigation.
- Is there a relationship between the Gini Index and unemployment levels?
- Is there a relationship between the Gini Index and race, etc.?
A more detailed explanation of the Gini Index can be found in "See also," below.