# Removing ensemble mean introduces correlation

Ensemble mean removal is often performed in multivariate time series analysis when one suspects that a global, instantaneous fluctuation has been added to the signal and wants to remove it. For example, if the signal is a time series of images, an uncontrolled light source may be mixed in with the intended signal, and it is desirable to remove this effect. In the simplest case, ensemble mean removal is done by taking the mean across channels at each time point and subtracting it from each channel. When the gain of each channel is assumed to be heterogeneous (but with the same sign), one can still take the ensemble mean and compute the optimal gain for each channel. The assumption behind taking the ensemble mean is that averaging across channels increases the ratio of the time-locked common component to the channel-specific signal, so enough channels are needed (at least more than 10).
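The gain-weighted variant can be sketched as follows. This is only one possible reading of "optimal gain", namely the least-squares regression coefficient of each channel on the ensemble mean; the variable names and the simulated gains `a` are assumptions made for illustration.

```matlab
% Sketch: ensemble mean removal with a per-channel least-squares gain.
% All names (a, gain, Z) and the simulated data are illustrative assumptions.
rng(0);                               % fix the seed so the run is repeatable
N = 1000; M = 20;                     % number of time samples, number of channels
a = 0.5 + rand(1, M);                 % hypothetical positive gains, one per channel
eta = randn(N, 1);                    % common additive fluctuation
Y = randn(N, M) + eta * a;            % independent part plus scaled common part

esbm = mean(Y, 2);                    % ensemble mean at each time point
gain = (esbm' * Y) / (esbm' * esbm);  % least-squares gain of each channel on esbm
Z = Y - esbm * gain;                  % subtract the scaled ensemble mean per channel
```

With this choice every column of `Z` is orthogonal to `esbm`, and the gains average to exactly one, so plain mean subtraction corresponds to setting every gain to 1.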

Note that if $y_i(t) = x_i(t) + a_i \eta(t)$, where $y_i(t)$ is the observation from channel *i*, $x_i(t)$ is a zero-mean random process that is spatially independent, $a_i$ is the fixed gain of channel *i*, and $\eta(t)$ is a zero-mean random process with variance $\sigma_\eta^2$ that fluctuates in common among the channels, the instantaneous cross-correlation is $E[y_i(t) y_j(t)] = a_i a_j \sigma_\eta^2$ for $i \neq j$. Hence if the $a_i$ are all positive (or all negative), such a common additive fluctuation creates positive correlation.
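A quick numerical illustration of this positive correlation, assuming an additive model in which each channel is an independent unit-variance process plus a gain times a shared fluctuation. The gains `a`, the standard deviation `sigma`, and the sample sizes are hypothetical values chosen for the sketch.

```matlab
% Sketch: a shared additive fluctuation with same-sign gains yields
% positive covariance between channels. All values are illustrative assumptions.
rng(0);                          % fix the seed so the run is repeatable
N = 1e5; M = 10;                 % number of time samples, number of channels
sigma = 1;                       % assumed std of the common fluctuation
a = 0.5 + rand(1, M);            % assumed gains, all positive
eta = sigma * randn(N, 1);       % common fluctuation
X = randn(N, M);                 % spatially independent zero-mean processes
Y = X + eta * a;                 % observations

C = cov(Y);                      % empirical covariance between channels
predicted = sigma^2 * (a' * a);  % gain_i * gain_j * sigma^2 for every pair
mask = ~eye(M);                  % select the off-diagonal (i ~= j) entries
maxErr = max(abs(C(mask) - predicted(mask)));
fprintf('largest deviation from prediction: %.3f\n', maxErr);
```

The off-diagonal entries of `C` should all come out positive and close to the product of the two gains times the variance of the common term.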

However, if one analyzes the cross-correlation between channels after ensemble mean removal, one finds that the cross-correlation at zero lag tends to be smaller than expected (often negative). In essence, this is a kind of small sample size effect. The problem is that even when the common additive term is perfectly removed by the ensemble average subtraction, the empirical mean across channels of the remaining independent processes is not zero, and we are removing this from each channel as well. This can be demonstrated in the simple case where the underlying processes $x_i(t)$ are independent, zero mean, and have the same variance $\sigma^2$. With $M$ channels, $\bar{x}(t) = \frac{1}{M}\sum_{k=1}^{M} x_k(t)$, and $i \neq j$,

$$E\left[(x_i(t) - \bar{x}(t))(x_j(t) - \bar{x}(t))\right] = -\frac{\sigma^2}{M}.$$
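A term-by-term expansion makes the sign explicit. This is a sketch assuming $M$ independent, zero-mean channels $x_i(t)$ with common variance $\sigma^2$, the ensemble mean $\bar{x} = \frac{1}{M}\sum_k x_k$, and the shorthand $z_i = x_i - \bar{x}$ (notation introduced here for illustration). For $i \neq j$:

$$
\begin{aligned}
E[z_i z_j] &= E[x_i x_j] - E[x_i \bar{x}] - E[x_j \bar{x}] + E[\bar{x}^2] \\
&= 0 - \frac{\sigma^2}{M} - \frac{\sigma^2}{M} + \frac{\sigma^2}{M} = -\frac{\sigma^2}{M}, \\
E[z_i^2] &= \sigma^2 - \frac{2\sigma^2}{M} + \frac{\sigma^2}{M} = \sigma^2\left(1 - \frac{1}{M}\right).
\end{aligned}
$$

Dividing the first by the second gives a correlation coefficient of $-\frac{1}{M-1}$, which is about $-0.053$ for 20 channels and shrinks toward zero as the number of channels grows.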

Therefore ensemble mean removal introduces negative correlation. The following MATLAB code demonstrates it.

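```matlab
N = 1000; % length of time series
M = 20; % number of channels
eta = 0.1 * randn(N,1); % common additive fluctuation shared by all channels
X = randn(N,M); % independent unit-variance channels
Y = X + repmat(eta,1,M); % observations with the common fluctuation added
esbm = mean(Y,2); % compute the ensemble mean
Z = Y - repmat(esbm,1,M); % remove the ensemble mean from each channel

% plot the mean pairwise cross-correlation of each version
figure; hold all;
[mcc,lags] = meanXcorr(X,10); plot(lags, mcc);
[mcc,lags] = meanXcorr(Y,10); plot(lags, mcc);
[mcc,lags] = meanXcorr(Z,10); plot(lags, mcc);
legend('X', 'Y', 'Z');
```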

where the function `meanXcorr` simply computes the mean pairwise cross-correlation:

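```matlab
function [mcc, lags] = meanXcorr(X, maxLag)
% Mean pairwise cross-correlation over all channel pairs (columns of X).
% xcorr requires the Signal Processing Toolbox.
M = size(X,2);
cc = zeros(2*maxLag+1, M*(M-1)/2); % one column per channel pair
k = 0;
for n1 = 1:M
    for n2 = n1+1:M
        k = k + 1;
        [cc(:,k), lags] = xcorr(X(:,n1), X(:,n2), maxLag, 'coeff');
    end
end
mcc = mean(cc,2);
end
```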
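As a rough cross-check that does not need `xcorr`, the zero-lag value can be compared against $-1/(M-1)$, the correlation coefficient expected after mean removal when the channels are independent with equal variance. The sketch below uses the base MATLAB function `corrcoef`; the seed and the variable names are arbitrary choices for illustration.

```matlab
% Sketch: compare the mean pairwise zero-lag correlation after ensemble
% mean removal with -1/(M-1). Seed and names are illustrative assumptions.
rng(1);                                  % fix the seed so the run is repeatable
N = 1000; M = 20;                        % length of time series, number of channels
X = randn(N, M);                         % independent unit-variance channels
Z = X - repmat(mean(X, 2), 1, M);        % remove the ensemble mean

mask = ~eye(M);                          % select the off-diagonal (i ~= j) entries
Rx = corrcoef(X);                        % zero-lag correlation before removal
Rz = corrcoef(Z);                        % zero-lag correlation after removal
meanCorrX = mean(Rx(mask));              % should be close to 0
meanCorrZ = mean(Rz(mask));              % should be close to -1/(M-1)
predicted = -1 / (M - 1);
fprintf('before: %.4f, after: %.4f, predicted: %.4f\n', ...
    meanCorrX, meanCorrZ, predicted);
```

Note that `corrcoef` also subtracts each channel's temporal mean, which `xcorr` with the `'coeff'` option does not; for zero-mean data such as this the difference should be negligible.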
