Regression Toward the Mean (3 of 6)

There are just not many people who can afford to be unlucky and still score as high as 750. A person scoring 750 was, more likely than not, luckier than average. Since, by definition, luck does not hold from one administration of the test to another, a person scoring 750 on one test is expected to score below 750 on a second test. This does not mean that the person will necessarily score lower than 750, just that it is likely. The same logic can be applied to someone scoring 250. Since there are more people with "true" scores between 250 and 300 than between 200 and 250, a person scoring 250 is more likely to have a "true" score above 250 and be unlucky than a "true" score below 250 and be lucky. This means that a person scoring 250 would be expected to score higher on the second test. For both the person scoring 750 and the person scoring 250, the expected score on the second test is between the score received on the first test and the mean.

This is the phenomenon called "regression toward the mean." Regression toward the mean occurs any time people are chosen based on observed scores that are determined in part or entirely by chance. On any task that involves both luck and skill, people who score above the mean are likely to have been luckier than people who score below the mean. Since luck does not hold from trial to trial, people who score above the mean can be expected to do worse on a subsequent trial, and people who score below the mean can be expected to do better. This counterintuitive phenomenon can be illustrated concretely with a simulation.
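The argument can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: each observed score is a normally distributed "true" score plus independent, normally distributed luck on each test, scores are not clipped to the 200-800 range, and the two standard deviations are chosen so that the test-retest correlation is roughly 0.90. The function names and parameters are hypothetical, not taken from the text.

```python
import random
import statistics


def simulate_two_tests(n_people=100_000, mean=500.0,
                       sd_true=94.87, sd_luck=31.62, seed=0):
    """Return a list of (first_score, second_score) pairs.

    Assumption: observed score = true score + luck, with fresh luck
    drawn independently for each test administration.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_people):
        true_score = rng.gauss(mean, sd_true)
        first = true_score + rng.gauss(0.0, sd_luck)
        second = true_score + rng.gauss(0.0, sd_luck)
        pairs.append((first, second))
    return pairs


def group_means(pairs, low, high):
    """Mean first and second scores of people whose first score is in [low, high)."""
    firsts = [a for a, b in pairs if low <= a < high]
    seconds = [b for a, b in pairs if low <= a < high]
    return statistics.mean(firsts), statistics.mean(seconds)


if __name__ == "__main__":
    pairs = simulate_two_tests()
    # People selected for scoring near 750 and near 250 on the first test.
    for low, high in [(725, 775), (225, 275)]:
        first_mean, second_mean = group_means(pairs, low, high)
        print(f"first test in [{low}, {high}): "
              f"mean first = {first_mean:.1f}, mean second = {second_mean:.1f}")
```

Under these assumptions, the group selected for high first scores should average lower on the second test while staying above 500, and the group selected for low first scores should average higher while staying below 500.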

In regression with standardized variables, the regression equation is:

Zy' = (r)Zx

where Zy' is the predicted standardized score, Zx is the standardized score on the predictor, and r is Pearson's correlation. This means that the predicted standardized score will be closer to the mean of zero than Zx is whenever the correlation is not perfect (r is neither -1 nor 1) and Zx is not already zero.

For example, if the SAT had a mean of 500 and a standard deviation of 100, then a score of 750 would have a standard score of 2.5, since 750 is two and a half standard deviations above the mean. If the test-retest correlation were 0.90, then the predicted standard score for someone with a standard score of 2.5 would be (0.90)(2.5) = 2.25. Therefore, that person would be predicted to be 2.25 standard deviations above the mean on the retest, which is 500 + (2.25)(100) = 725.
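The same arithmetic can be written as a short function. This is a minimal sketch that applies Zy' = (r)Zx and converts back to the original scale; the function name and argument names are hypothetical, and it assumes the retest has the same mean and standard deviation as the first test.

```python
def predicted_retest_score(score, mean, sd, r):
    """Predict a retest score from a first score using Zy' = r * Zx.

    Assumes the retest has the same mean and standard deviation
    as the first test.
    """
    z_x = (score - mean) / sd      # standardize the observed score
    z_y_pred = r * z_x             # predicted standardized retest score
    return mean + z_y_pred * sd    # convert back to the original scale


if __name__ == "__main__":
    # Values from the example: mean 500, SD 100, test-retest r = 0.90.
    print(predicted_retest_score(750, 500, 100, 0.90))
    print(predicted_retest_score(250, 500, 100, 0.90))
```

With the values from the example, the first call reproduces the predicted score of 725, and the second shows the mirror-image case: a first score of 250 gives a prediction above 250 but still below the mean.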