Removing ensemble mean introduces correlation


Ensemble mean removal is often performed in multivariate time series analysis when one suspects that a global, instantaneous fluctuation has been added to the signal and wants to remove it. For example, if the signal is a time series of images, an uncontrolled light source may be mixed with the intended signal, and it is desirable to remove this effect. In the simplest case, ensemble mean removal is done by taking the instantaneous mean across channels and subtracting it from each channel. When the gain of each channel is assumed to be heterogeneous (but with the same sign), one can still take the ensemble mean and compute the optimal gain for each channel. Taking the ensemble mean assumes that averaging across channels increases the ratio of the time-locked common component to the independent signal, which requires enough channels (at least more than 10).
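The heterogeneous-gain case can be sketched as follows. This is only one possible reading of "optimal gain": it assumes the gain is chosen by least squares, regressing each channel on the ensemble mean. The sizes T and M, the simulated gains alpha, and the variable names are all hypothetical choices made for illustration.

```matlab
T = 5000; M = 20;                  % time samples and channels (illustrative sizes)
alpha = 0.5 + rand(1, M);          % hypothetical positive per-channel gains
eta = randn(T, 1);                 % common fluctuation shared by all channels
Y = randn(T, M) + eta * alpha;     % independent signals plus gain-scaled common term
m = mean(Y, 2);                    % ensemble mean at each time point (T x 1)
g = (m' * Y) / (m' * m);           % least-squares gain of each channel on m (1 x M)
Z = Y - m * g;                     % subtract the fitted common component per channel
```

By construction the fitted gains average to one and each column of Z is orthogonal to the ensemble mean; fixing every gain at one recovers the plain subtraction described above.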

Note that if Y_i(t) = X_i(t) + \alpha_i \eta(t), where Y_i is the observation from channel i, X_i is a zero-mean random process that is independent across channels, and \eta(t) is a single realization of a zero-mean random process with variance \sigma^2 that is shared among the channels, then the instantaneous cross-correlation for i \neq j is E\left[Y_i Y_j\right] = \alpha_i \alpha_j \sigma^2. Hence if the \alpha_i are all positive (or all negative), such a common additive fluctuation creates positive correlation.
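The step behind this result can be written out explicitly. The expansion below is a sketch that assumes, in addition to the conditions above, that \eta is independent of every X_i (an assumption the text does not state explicitly), and that i \neq j:

```latex
\begin{aligned}
E[Y_i Y_j] &= E[(X_i + \alpha_i \eta)(X_j + \alpha_j \eta)] \\
&= E[X_i X_j] + \alpha_j E[X_i \eta] + \alpha_i E[\eta X_j] + \alpha_i \alpha_j E[\eta^2] \\
&= 0 + 0 + 0 + \alpha_i \alpha_j \sigma^2 .
\end{aligned}
```

The first three terms vanish because the processes are zero mean and mutually independent, leaving only the shared term.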

However, if one analyzes the cross-correlation between channels after ensemble mean removal, the cross-correlation at zero lag tends to be smaller than expected, and is often negative. In essence, this is a small-sample-size effect. The problem is that even when the \alpha_i \eta(t) term is perfectly removed by the ensemble average subtraction, the empirical mean of the X_i over the N channels, \frac{1}{N} \sum_i X_i, is not zero, and it is subtracted from each channel as well. This can be demonstrated in the simple case where the underlying processes are independent and have the same variance; then

E\left[ \left(X_i - \frac{1}{N} \sum_k X_k\right) \left(X_j - \frac{1}{N} \sum_k X_k\right) \right] = -\frac{E[X_i^2]}{N}, \quad i \neq j.
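The expectation above can be checked term by term. The following is a sketch of the derivation for i \neq j, writing \bar{X} for the ensemble mean of the X_k over the N channels and \sigma_X^2 = E[X_i^2] for the common variance (both symbols are shorthand introduced here, not notation from the post):

```latex
\begin{aligned}
E[X_i X_j] &= 0, \qquad E[X_i \bar{X}] = E[X_j \bar{X}] = \frac{\sigma_X^2}{N}, \qquad E[\bar{X}^2] = \frac{\sigma_X^2}{N}, \\
E[(X_i - \bar{X})(X_j - \bar{X})] &= 0 - \frac{\sigma_X^2}{N} - \frac{\sigma_X^2}{N} + \frac{\sigma_X^2}{N} = -\frac{\sigma_X^2}{N}, \\
E[(X_i - \bar{X})^2] &= \sigma_X^2 - \frac{2\sigma_X^2}{N} + \frac{\sigma_X^2}{N} = \sigma_X^2 \left(1 - \frac{1}{N}\right).
\end{aligned}
```

Dividing the covariance by the variance gives a zero-lag correlation coefficient of -1/(N-1) between any two channels, so with 20 channels the expected value is about -0.053.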

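Therefore, ensemble mean removal introduces negative correlation. The following MATLAB code demonstrates it. (In the code, N denotes the length of the time series and M the number of channels, whereas N in the equation above is the number of channels.)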

Demonstration of the effect of ensemble mean removal on cross-correlation

The blue line corresponds to the independent signals (X), the green to the signals with the common fluctuation added (Y), and the red to the signals after ensemble mean removal (Z).

N = 1000; % length of time series
M = 20; % number of channels
eta = 0.1 * randn(N,1);
X = randn(N,M);
Y = X + repmat(eta,1,M);
esbm = mean(Y,2); % compute the ensemble mean
Z = Y - repmat(esbm,1,M);
figure; hold all;
[mcc,lags] = meanXcorr(X,10); plot(lags, mcc);
[mcc,lags] = meanXcorr(Y,10); plot(lags, mcc);
[mcc,lags] = meanXcorr(Z,10); plot(lags, mcc);
legend('X', 'Y', 'Z');

where the function meanXcorr simply computes the mean pairwise cross-correlation:
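function [mcc, lags] = meanXcorr(X, maxLag)
% Mean normalized cross-correlation over all pairs of columns (channels) of X.
% xcorr is provided by the Signal Processing Toolbox.
k = 0;
for n1 = 1:size(X,2)
    for n2 = n1+1:size(X,2)
        k = k + 1;
        [cc(:,k), lags] = xcorr(X(:,n1), X(:,n2), maxLag, 'coeff');
    end
end
mcc = mean(cc,2); % average over all channel pairs
end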
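As a quick numerical cross-check of the zero-lag value, the sketch below uses only base MATLAB functions (corrcoef instead of xcorr). For independent, equal-variance channels, dividing the covariance -E[X_i^2]/N by the residual variance E[X_i^2](1 - 1/N) gives a predicted correlation coefficient of -1/(N-1), where N is the number of channels (M in the code). The sample size T and the variable names are illustrative assumptions.

```matlab
T = 20000; M = 20;                        % time samples and channels (illustrative)
X = randn(T, M);                          % independent unit-variance channels
Z = X - repmat(mean(X, 2), 1, M);         % ensemble mean removal
R = corrcoef(Z);                          % M x M zero-lag correlation matrix
offDiag = R(~eye(M));                     % keep only the off-diagonal entries
empirical = mean(offDiag);                % mean pairwise correlation at zero lag
predicted = -1 / (M - 1);                 % value expected from the derivation
fprintf('empirical %.4f, predicted %.4f\n', empirical, predicted);
```

The two printed numbers should agree closely, because the channels of Z sum to zero at every time point, which ties the average off-diagonal correlation to the number of channels.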

One Comment
  1. 2010/09/01 6:23 am

    click through on the figure, it is much easier to see the green line. Solid post Memming, thanks!
