Ensemble mean removal is often performed in multivariate time series analysis when one suspects that a global, instantaneous fluctuation has been additively introduced into the signal and wants to remove it. For example, if the signal is a time series of images, an uncontrolled light source may be mixed with the intended signal, and it is desirable to remove its effect. In the simplest case, ensemble mean removal is done by taking the mean across channels at each time instant and subtracting it from every channel. When the gain of each channel is assumed to be heterogeneous (but of the same sign), one can still take the ensemble mean and compute the optimal gain for each channel. The assumption behind taking the ensemble mean is that averaging across channels increases the ratio of the time-locked common component to the channel-specific signal, which requires enough channels (at least more than 10).
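
One possible reading of "optimal gain" is the least-squares sense; this is an assumption, since the criterion is not specified here. Under that assumption, writing $Y_i(t)$ for the observation from channel $i$ and $\bar{Y}(t) = \frac{1}{N}\sum_{k=1}^{N} Y_k(t)$ for the ensemble mean over $N$ channels (notation introduced for this sketch), the gain for channel $i$ and the cleaned channel $Z_i$ would be

$$\hat{g}_i = \arg\min_{g} \sum_t \left(Y_i(t) - g\,\bar{Y}(t)\right)^2 = \frac{\sum_t Y_i(t)\,\bar{Y}(t)}{\sum_t \bar{Y}(t)^2}, \qquad Z_i(t) = Y_i(t) - \hat{g}_i\,\bar{Y}(t).$$

When all channels share the same gain, $\hat{g}_i$ is close to 1 on average, so this roughly reduces to plain subtraction of the ensemble mean.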

Note that if $Y_i(t) = X_i(t) + \alpha_i \eta(t)$, where $Y_i$ is the observation from channel $i$, $X_i$ is a zero-mean random process that is independent across channels, and $\eta(t)$ is a zero-mean random process with variance $\sigma^2$ whose realization is shared by all channels, then the instantaneous cross-correlation between two distinct channels is $E\left[Y_i Y_j\right] = \alpha_i \alpha_j \sigma^2$ for $i \neq j$. Hence, if the $\alpha_i$ are all positive (or all negative), such a common additive fluctuation creates positive correlation.
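
This expression follows by expanding the product, under the additional assumption (not stated explicitly above) that each $X_i$ is independent of $\eta$. For two distinct channels $i \neq j$:

$$E\left[Y_i Y_j\right] = E\left[X_i X_j\right] + \alpha_j E\left[X_i \eta\right] + \alpha_i E\left[X_j \eta\right] + \alpha_i \alpha_j E\left[\eta^2\right] = 0 + 0 + 0 + \alpha_i \alpha_j \sigma^2.$$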

However, if one analyzes the cross-correlation between channels after ensemble mean removal, one finds that the cross-correlation at zero lag tends to be smaller than expected, and often negative. In essence, this is a kind of small sample size effect. The problem is that even when the $\alpha_i \eta(t)$ term is perfectly removed by the ensemble average subtraction, the empirical mean $\frac{1}{N} \sum_k X_k$ is not zero, and we are removing it from each channel as well. This can be demonstrated in the simple case where the underlying processes are independent and have the same variance: for $i \neq j$, $E\left[ \left(X_i - \frac{1}{N} \sum_k X_k\right) \left(X_j - \frac{1}{N} \sum_k X_k\right) \right] = -\frac{E[X_i^2]}{N}$, where $N$ is the number of channels.
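
As a sketch of where this value comes from, write $\bar{X} = \frac{1}{N}\sum_{k} X_k$ and $\sigma_X^2 = E[X_i^2]$ (shorthand introduced here), with $N$ the number of channels. Independence and zero mean give $E[X_i X_j] = 0$ for $i \neq j$, $E[X_i \bar{X}] = \sigma_X^2 / N$, and $E[\bar{X}^2] = \sigma_X^2 / N$, so

$$E\left[\left(X_i - \bar{X}\right)\left(X_j - \bar{X}\right)\right] = E[X_i X_j] - E[X_i \bar{X}] - E[X_j \bar{X}] + E[\bar{X}^2] = 0 - \frac{\sigma_X^2}{N} - \frac{\sigma_X^2}{N} + \frac{\sigma_X^2}{N} = -\frac{\sigma_X^2}{N}.$$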

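Therefore, ensemble mean removal introduces negative correlation. The following MATLAB code demonstrates this; note that in the code N denotes the length of the time series and M the number of channels. In the resulting plot, the blue line corresponds to the independent signal, the green line to the signal with the common fluctuation added, and the red line to the signal after ensemble mean removal.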

N = 1000; % length of time series
M = 20; % number of channels
eta = 0.1 * randn(N,1);
X = randn(N,M);
Y = X + repmat(eta,1,M);
esbm = mean(Y,2); % compute the ensemble mean
Z = Y - repmat(esbm,1,M);
figure; hold all;
[mcc,lags] = meanXcorr(X,10); plot(lags, mcc);
[mcc,lags] = meanXcorr(Y,10); plot(lags, mcc);
[mcc,lags] = meanXcorr(Z,10); plot(lags, mcc);
legend('X', 'Y', 'Z');
where the function meanXcorr simply computes the mean pairwise cross-correlation.
function [mcc, lags] = meanXcorr(X, maxLag)
k = 0;
for n1 = 1:size(X,2)
    for n2 = n1+1:size(X,2)
        k = k + 1;
        [cc(:,k), lags] = xcorr(X(:,n1), X(:,n2), maxLag, 'coeff');
    end
end
mcc = mean(cc,2);
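
For reference, the zero-lag value one might expect for Z can be sketched under idealized assumptions: independent, zero-mean channels with equal variance $\sigma_X^2$ (shorthand introduced here), and the common term perfectly removed. Using $M$ for the number of channels as in the code, the covariance between two distinct mean-removed channels is $-\sigma_X^2/M$ and the variance of each is $\sigma_X^2 (1 - 1/M)$, so the normalized coefficient reported with the 'coeff' option would be approximately

$$\rho = \frac{-\sigma_X^2 / M}{\sigma_X^2 \left(1 - 1/M\right)} = -\frac{1}{M - 1},$$

which is about $-0.053$ for $M = 20$. This is a population value; a finite-length simulation will scatter around it.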
