Quick random question: I don't expect anyone to have come accross a solution to this problem, but just in case they have...

In doing numerical floating point work is full of pitfalls where you can lose almost all accuracy, one of which is when summing/sum-squaring/etc with a large set of small values (so that each individual value is likely to be small relative to the aggregate statistics). I'm thinking about how to compute a covariance matrix for a large set of items subject to the following requirements:

1. Anything other than a "look a each data item once" approach is unlikely to be viable (ie, not a two-pass algorithm).

2. There will be covariances for several different data streams being calculated interleaved (ie, it's not "here's a stream for one dataset, get it's covariance then move onto the next data stream").

However, I don't care about the covariance until everything has finished: it doesn't have to be in available form as the data is being processed.

[This wikipedia page](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance) contains an algorithm, but it's maintaining a finished covariance matrix at all times. I'm just wondering if there's anything that gives higher accuracy through needing final post-processing at the end.

In doing numerical floating point work is full of pitfalls where you can lose almost all accuracy, one of which is when summing/sum-squaring/etc with a large set of small values (so that each individual value is likely to be small relative to the aggregate statistics). I'm thinking about how to compute a covariance matrix for a large set of items subject to the following requirements:

1. Anything other than a "look a each data item once" approach is unlikely to be viable (ie, not a two-pass algorithm).

2. There will be covariances for several different data streams being calculated interleaved (ie, it's not "here's a stream for one dataset, get it's covariance then move onto the next data stream").

However, I don't care about the covariance until everything has finished: it doesn't have to be in available form as the data is being processed.

[This wikipedia page](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance) contains an algorithm, but it's maintaining a finished covariance matrix at all times. I'm just wondering if there's anything that gives higher accuracy through needing final post-processing at the end.