Statistical Specifications
One of the most important steps in making sure that AP Exams are both comparable to college-level exams and reasonably parallel from year to year is the creation of "statistical specifications" that describe the desired distribution of item difficulties on an exam. The Development Committees and content experts work closely with ETS statisticians in applying these specifications when they develop AP Exams.
The delta is an index of item difficulty used at ETS. For the multiple-choice sections of AP Exams, the statistical specifications consist of a desired distribution of deltas with a particular mean and standard deviation. The calculation of deltas, the process of delta equating, and the differences between observed and equated deltas are described in detail in the analyzing section of this Technical Corner.
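As a rough illustration of what a delta distribution looks like, the sketch below converts proportions correct into deltas and summarizes their mean and standard deviation. It assumes the conventional ETS definition, delta = 13 - 4z, where z is the standard normal deviate corresponding to the proportion of examinees answering the item correctly; this section does not state the formula, so treat it as an assumption. The function name and the proportions are hypothetical, and the choice of the population standard deviation is also an assumption.

```python
from statistics import NormalDist, mean, pstdev


def delta_from_p(p_correct: float) -> float:
    """Convert a proportion correct to a delta.

    Assumes the conventional scale delta = 13 - 4 * z, so harder items
    (lower proportion correct) receive higher deltas.
    """
    if not 0.0 < p_correct < 1.0:
        raise ValueError("proportion correct must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(p_correct)  # standard normal deviate for p
    return 13.0 - 4.0 * z


# Hypothetical proportions correct for five items (not real AP data).
p_values = [0.85, 0.70, 0.50, 0.35, 0.20]
deltas = [delta_from_p(p) for p in p_values]

# The two summary statistics that a statistical specification targets.
delta_mean = mean(deltas)
delta_sd = pstdev(deltas)
```

Under this assumed scale, an item answered correctly by half of the examinees has a delta of 13, and deltas increase as items become harder.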
There are two different models for constructing exams:
- Model I. This model, illustrated in Table 2.2, makes use of item response theory (IRT); detailed information on using IRT in the development of statistical specifications can be found in Marco (1977). Results of IRT analyses of multiple-choice questions can be directly translated into distributions of deltas that are suitable for use as statistical specifications. The AP Program recommends granting credit and/or advanced placement to students receiving grades of 3, 4, or 5, and these delta distributions were specifically chosen because they provided excellent discrimination of students at the 2 to 3 cut-off point (i.e., the definition of "qualified students"), and more than adequate discrimination at the 3 to 4 cut-off point. A detailed discussion of the assignment of AP grades is presented in the grading section of this Technical Corner.
The Model I distributions were developed using one specific exam administered to a group of AP students in a particular year. As a result, the specifications are reexamined on a periodic basis and adjusted as necessary to be kept relevant to the ability levels of current groups of AP students.
- Model II. For the smaller-volume exams (that is, those that are taken by fewer students), IRT is not used to develop statistical specifications. Instead, ETS statisticians develop separate distributions of observed deltas for those exams that have four-choice items and for those that have five-choice items (see Table 2.4). These distributions are centered on middle difficulty and have approximately a normal distribution around the mean. Slightly more items are specified for delta intervals below the mean than above in order to maximize, to the extent possible, discrimination around the 2 to 3 cut-off point.
Subjects that use Model II are: AP Art History, Computer Science, Economics (mean 11.9), Environmental Science (mean 12.0), French Literature, German Language, International English, Latin, Music Theory, Spanish Literature, Statistics, and World History.
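The Model II shape described above (centered on a target mean, with slightly more items below the mean than above) can be checked for a draft form with a few lines of code. This is a minimal sketch, not the procedure ETS uses: the function name and the item deltas are hypothetical, and only the target mean of 11.9 is taken from the list above.

```python
from statistics import mean, pstdev


def summarize_form(deltas, target_mean):
    """Summarize a draft form's deltas against a target mean.

    Returns the observed mean and standard deviation, plus counts of
    items easier (below) and harder (above) than the target mean.
    """
    below = sum(1 for d in deltas if d < target_mean)
    above = sum(1 for d in deltas if d > target_mean)
    return {
        "mean": mean(deltas),
        "sd": pstdev(deltas),  # population SD; an assumption of this sketch
        "below": below,
        "above": above,
    }


# Hypothetical observed deltas for a ten-item draft form (not real AP data).
form = [9.5, 10.4, 11.0, 11.3, 11.6, 11.8, 12.4, 12.9, 13.6, 14.5]
report = summarize_form(form, target_mean=11.9)
```

For this made-up form, six items fall below the target mean and four above, which matches the slight tilt toward easier items that the specification asks for.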