The argument goes that since Σ(X - μ)²/N is an unbiased estimate of σ² and since Σ(X - M)²/N is never larger than Σ(X - μ)²/N (and is smaller whenever M differs from μ), Σ(X - M)²/N must be biased and will tend to underestimate σ². It turns out that dividing by N-1 rather than by N increases the estimate just enough to eliminate the bias exactly.
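The bias can be checked numerically. The following is a minimal simulation sketch, not part of the original argument: it assumes normally distributed scores with μ = 0 and σ = 2 (so σ² = 4) and samples of N = 5, and the function names, seed, and number of trials are illustrative choices.

```python
import random


def variance_estimates(sample, mu):
    """Return the three estimates of sigma^2 discussed in the text."""
    n = len(sample)
    m = sum(sample) / n                           # sample mean M
    ss_mu = sum((x - mu) ** 2 for x in sample)    # squared deviations from mu
    ss_m = sum((x - m) ** 2 for x in sample)      # squared deviations from M
    return ss_mu / n, ss_m / n, ss_m / (n - 1)


def simulate(n=5, trials=50_000, mu=0.0, sigma=2.0, seed=0):
    """Average each estimate over many random samples (assumed normal)."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        for i, estimate in enumerate(variance_estimates(sample, mu)):
            totals[i] += estimate
    return tuple(total / trials for total in totals)


known_mu, divide_by_n, divide_by_n_minus_1 = simulate()
print("mu known, divide by N:    ", round(known_mu, 2))
print("M used, divide by N:      ", round(divide_by_n, 2))
print("M used, divide by N - 1:  ", round(divide_by_n_minus_1, 2))
```

Under these assumptions the first and third averages should land close to σ² = 4, while the second should land close to (N-1)/N × σ² = 3.2, which is the underestimate described above.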

Another way to think about why you divide by N-1 rather than by N has to do with the concept of degrees of freedom. When μ is known, each value of X provides an independent estimate of σ², namely (X - μ)². The estimate of σ² based on N values of X is simply the average of these N independent estimates, so it can be written as:

estimate of σ² = Σ(X - μ)²/N = Σ(X - μ)²/df

where there are N degrees of freedom and therefore df = N. When μ is not known and has to be estimated with M, the N values of (X - M)² are not independent: if you know the value of M and the values of N-1 of the X's, you can compute the value of the Nth X exactly.
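A small worked example may make the dependence concrete. This sketch uses four made-up scores (4, 7, 1, 8) and a hypothetical helper, recover_last_value; it relies only on the fact that the sum of the X's equals N times M.

```python
def recover_last_value(mean, known_values):
    """Recover the missing X from the sample mean and the other N - 1 values."""
    n = len(known_values) + 1          # total sample size N
    return n * mean - sum(known_values)


xs = [4.0, 7.0, 1.0, 8.0]              # illustrative scores, N = 4
m = sum(xs) / len(xs)                  # sample mean M = 5.0

# Pretend the last score is unknown and rebuild it from M and the first three.
recovered = recover_last_value(m, xs[:-1])
print(recovered)                       # 4 * 5.0 - (4 + 7 + 1) = 8.0

# Equivalently, the deviations from M always sum to zero, so only
# N - 1 of them are free to vary.
deviations = [x - m for x in xs]
print(sum(deviations))
```

Because the last deviation is fixed once the other N-1 are known, only N-1 of the values of (X - M)² carry independent information about σ².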