Algorithms for calculating variance
The formula for calculating the population variance:
The formula for calculating unbiased estimation of the population variance from finite samples is:
The method of calculation may be more easily understood from the table below
where the mean is 8.
| i |
xi |
xi-mean |
(xi-mean)2 |
| (index) |
(datum) |
(deviation) |
(squared deviation) |
| 1 |
5 |
-3 |
9 |
| 2 |
7 |
-1 |
1 |
| 3 |
8 |
0 |
0 |
| 4 |
10 |
2 |
4 |
| 5 |
10 |
2 |
4 |
| n=5 |
sum=40 |
0 |
18 |
- mean = 40/5 = 8
- variance = (5*338 - 402)/(25-5) = 4.5
- standard deviation =
= 2.12
Note: Details of the variance calculation:
338 = [52 + 72 + 82 + 102 + 102]
40 = [5 + 7 + 8 + 10 + 10]
Algorithm
Therefore a simple algorithm to calculate variance can be:
double sum;
double sum_sqr;
double variance;
long n = data.length; // the number of elements in the data array (the actual syntax is language-specific)
for i = 0 to n
sum += data[i];
sum_sqr += ( data[i] * data[i] );
end for
variance = ((n * sum_sqr) - (sum * sum))/(n*(n-1));
Algorithm
Annother algorithm which avoids large numbers in sum_sqr while summing up
double avg;
double var;
long n = data.length; // number of elements
for i = 0 to n
avg = (avg*i + data[i]) / (i + 1);
if (i > 0) var += (var * (i - 1) + (x - avg)*(x - avg)) / i;
end for
return var; // resulting variance
Referenced By
List of mathematical topics | List of mathematical topics (A-C) | List of mathematics topics
|