Estimating Significance Level for Coherences


I recently wrestled with how to properly calculate the significance level for the coherence (related to the cross-spectrum) between two time series.  This web page represents my understanding of how to do the deed, but may contain errors.  Corrections are welcome.

I found two ways of estimating the significance level, one intuitive and the other from a theoretical formula.  The intuitive way is essentially a permutation test (a resampling of one of your time series without replacement), often lumped in with the bootstrap.  After randomly reordering one of the time series, the resulting series should have no coherence with the other series (or with the original series, for that matter).  You then compute the coherence between the shuffled series and the other series.  This operation is performed a large number of times to build up a distribution of coherences at each frequency for uncorrelated series, and the significance level is taken from that distribution (e.g. the 95th percentile).

I created a matlab function for doing this.
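In outline, such a routine looks something like the following (a sketch rather than the exact function; it assumes mscohere, the modern replacement for cohere, and prctile from the Statistics Toolbox, and the function name, window, and defaults are only illustrative):

    function [clev, f] = coher_siglev_perm(x, y, nfft, nperm, p, fs)
    % Significance level for magnitude-squared coherence estimated by
    % randomly permuting one of the series.  A sketch: mscohere (Signal
    % Processing Toolbox) and prctile (Statistics Toolbox) are assumed.
    %
    %   x, y  : the two time series (same length)
    %   nfft  : segment length handed to mscohere (no overlap used here)
    %   nperm : number of random permutations, e.g. 500
    %   p     : significance level, e.g. 0.95
    %   fs    : sampling frequency
    %
    %   clev  : coherence exceeded by chance only (1-p) of the time, per frequency
    %   f     : frequencies returned by mscohere

    x = x(:);
    y = y(:);
    win = hanning(nfft);

    % Coherence between x and randomly shuffled copies of y.  Counting
    % down preallocates cshuf on the first pass.
    for k = nperm:-1:1
        yshuf = y(randperm(length(y)));       % shuffling destroys any coherence
        [cshuf(:, k), f] = mscohere(x, yshuf, win, 0, nfft, fs);
    end

    % p-th percentile across permutations at each frequency
    clev = prctile(cshuf, 100*p, 2);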

The theoretical formula comes from the book Time Series Analysis and Its Applications by Shumway and Stoffer (2000), p. 250, Eq. 3.82.  Basically, the distribution of coherences for a pair of uncorrelated series is given by an F-distribution.  After some magic, the following equation drops out:

    C(p) = F_{2, df-2}(p) / [ df/2 - 1 + F_{2, df-2}(p) ]

where F_{2, df-2} is the inverse of the cumulative distribution function of the F-distribution with 2 and df-2 degrees of freedom, p is the desired significance level, and df is the number of degrees of freedom, approximately 2*n*B, where n is the number of observations and B is the bandwidth.

Here is a matlab function that does this calculation.
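In outline, the calculation is just a couple of lines (a sketch; finv requires the Statistics Toolbox, and the function name here is illustrative):

    function clev = coher_siglev_theory(p, df)
    % Significance level for squared coherence between two uncorrelated
    % series, following Shumway and Stoffer (2000), Eq. 3.82.  A sketch;
    % finv from the Statistics Toolbox is assumed.
    %
    %   p  : significance level, e.g. 0.95
    %   df : degrees of freedom, approximately 2*n*B

    Fp   = finv(p, 2, df - 2);    % inverse F cdf with (2, df-2) degrees of freedom
    clev = Fp ./ (df/2 - 1 + Fp);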

The relationship between df and the parameters of the matlab cohere function (or its modern replacement, mscohere) is a bit obscure.  I believe that if your series are of length N, NFFT is the size of the chunks the series gets broken into, and there is no overlap, then df = 2 * N/NFFT approximately.  This gives roughly the same answer as the resampling approach and makes sense, since each non-overlapping chunk contributes about two degrees of freedom at each frequency.
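Putting the two together for a pair of series x and y of length N sampled at fs (again just a sketch; the segment length and window are arbitrary choices, and coher_siglev_theory is the illustrative function above):

    nfft = 256;                              % segment length, no overlap
    [cxy, f] = mscohere(x, y, hanning(nfft), 0, nfft, fs);

    df   = 2 * length(x) / nfft;             % rough degrees of freedom
    clev = coher_siglev_theory(0.95, df);    % threshold from the sketch above

    plot(f, cxy, f, clev*ones(size(f)), '--')
    xlabel('frequency'), ylabel('squared coherence')

Any peak in the coherence that rises above the dashed line is then significant at the chosen level.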

What to do if my series have autocorrelation?

I don't have a really good answer for this.  I think that the autocorrelation scale should be used to determine how many pieces to break your data set into (i.e. autocorrelation indicates redundancy in the data set, so the number of effectively independent chunks is roughly the series length divided by the autocorrelation scale), but I am not certain.  If anyone has a good answer to this, please feel free to send it along.
