Estimating Significance Level for Coherences
I recently wrestled with how to properly calculate the
significance level for the coherence (related to the cross-spectra)
between two time series. This web page represents my
understanding of how to do the deed, but may contain errors.
Corrections are welcome.
I found two ways of estimating the significance level, one intuitive
and the other from a theoretical formula. The intuitive way is
basically a bootstrap (more precisely a permutation test, since one of
the time series is resampled without replacement). After randomly
reordering one of the time series, the shuffled series should have no
coherence with the other series (or with the original series, for that
matter). You then compute the coherence between the shuffled series
and the other series. This operation is repeated a large number of
times to build up, at each frequency, a distribution of coherences for
uncorrelated series. The significance level is then read off as the
desired percentile (e.g. the 95th) of that distribution.
I created a matlab function
for doing this.
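A minimal sketch of the resampling idea, in Python rather than MATLAB,
might look like the following. It is not the function mentioned above.
The names half_dft, coherence and permutation_level are made up for
illustration, and the coherence estimate is a bare-bones assumption:
non-overlapping segments of length nfft, no window, no detrending, and
a slow hand-written DFT so that nothing beyond the standard library is
needed. Both series are assumed to be plain lists at least nfft long.

```python
import cmath
import random


def half_dft(segment):
    """Plain O(n^2) DFT, returning bins 0..n/2 (enough for real input)."""
    n = len(segment)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, v in enumerate(segment))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, nfft):
    """Magnitude-squared coherence from non-overlapping, unwindowed segments."""
    nfreq = nfft // 2 + 1
    sxx = [0.0] * nfreq
    syy = [0.0] * nfreq
    sxy = [0j] * nfreq
    for start in range(0, len(x) - nfft + 1, nfft):
        fx = half_dft(x[start:start + nfft])
        fy = half_dft(y[start:start + nfft])
        for k in range(nfreq):
            sxx[k] += abs(fx[k]) ** 2            # auto-spectrum of x
            syy[k] += abs(fy[k]) ** 2            # auto-spectrum of y
            sxy[k] += fx[k] * fy[k].conjugate()  # cross-spectrum
    return [abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) for k in range(nfreq)]


def permutation_level(x, y, nfft, p=0.95, n_perm=200, seed=0):
    """Per-frequency p-quantile of coherence between x and shuffled copies of y."""
    rng = random.Random(seed)
    shuffled = list(y)
    null = [[] for _ in range(nfft // 2 + 1)]
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reorder y without replacement
        for k, c in enumerate(coherence(x, shuffled, nfft)):
            null[k].append(c)
    rank = min(n_perm - 1, int(p * n_perm))
    return [sorted(values)[rank] for values in null]
```

Calling permutation_level(x, y, nfft=16) returns one threshold per
frequency bin. An observed coherence above the threshold at a given
frequency would then be judged significant at level p. More
permutations give a smoother estimate at the cost of run time.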
The theoretical formula comes from the book Time Series Analysis and
Its Applications by Shumway and Stoffer (2000), p. 250, eq. 3.82.
Basically, for a pair of uncorrelated series, the coherence (suitably
transformed) follows an F-distribution. After some magic, the
following equation drops out:
$$C(p) = \frac{F_{2,\,df-2}(p)}{df/2 - 1 + F_{2,\,df-2}(p)}$$
where F is the inverse of the cumulative distribution function of the
F-distribution with 2 and df-2 degrees of freedom, evaluated at
probability p, and df is the number of degrees of freedom,
approximately 2*n*B, where n is the number of observations and B is
the bandwidth.
Here is a matlab function that
does this calculation.
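For readers without MATLAB, the same calculation can be sketched in
plain Python. This is not the function mentioned above, and the names
f_quantile_2 and coherence_level are made up for illustration. The
sketch leans on one fact that is not in the text: when the numerator
has 2 degrees of freedom, the F-distribution's cumulative distribution
function is 1 - (1 + 2x/m)^(-m/2), which can be inverted by hand, so no
statistics library is needed. It also assumes C refers to the
magnitude-squared coherence.

```python
def f_quantile_2(m, p):
    """p-quantile of an F-distribution with (2, m) degrees of freedom.

    Obtained by inverting the CDF 1 - (1 + 2x/m)**(-m/2) by hand.
    """
    return 0.5 * m * ((1.0 - p) ** (-2.0 / m) - 1.0)


def coherence_level(df, p=0.95):
    """Significance level C(p) = F / (df/2 - 1 + F); requires df > 2."""
    if df <= 2:
        raise ValueError("df must be greater than 2")
    f = f_quantile_2(df - 2, p)
    return f / (df / 2.0 - 1.0 + f)
```

With this closed-form quantile the whole expression collapses
algebraically to C(p) = 1 - (1 - p)^(2/(df - 2)). For example, df = 32
and p = 0.95 give a level of about 0.18.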
The relationship between df and the parameters to the matlab cohere
function is a bit obscure. I believe that if your series are of
length N, NFFT is the length of the chunks your series gets broken
into, and there is no overlap, then df = 2 * N/NFFT approximately,
i.e. two degrees of freedom per chunk. This gives roughly the same
answer as the resampling approach and makes sense.
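To get a feel for how the choice of NFFT moves the threshold, here is
a small Python sketch under the same assumptions (no overlap, df taken
as twice the number of complete chunks). The function names are
hypothetical. The level is computed with the shortcut
1 - (1 - p)^(2/(df - 2)), which is what the formula above reduces to
when the F-distribution has 2 numerator degrees of freedom.

```python
def df_from_segments(n, nfft):
    """Approximate degrees of freedom: 2 per complete, non-overlapping chunk."""
    return 2 * (n // nfft)


def level_from_df(df, p=0.95):
    """Closed form of C(p) = F / (df/2 - 1 + F) for F with (2, df-2) dof."""
    return 1.0 - (1.0 - p) ** (2.0 / (df - 2))


def level_table(n, nfft_choices, p=0.95):
    """Map each candidate NFFT to its approximate significance level."""
    return {
        nfft: level_from_df(df_from_segments(n, nfft), p)
        for nfft in nfft_choices
    }
```

For N = 1024, level_table(1024, [64, 128, 256]) corresponds to df = 32,
16 and 8, with 95% levels of roughly 0.18, 0.35 and 0.63. Longer chunks
give finer frequency resolution but fewer degrees of freedom, so a
coherence has to be larger before it counts as significant.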
What to do if my series have autocorrelation?
I don't have a really good answer for this. I think that the
autocorrelation scale should be used to determine how many pieces to
break your data set into (i.e. autocorrelation indicates redundancy in
the data set and therefore you should break the series into that many
pieces to get rid of that redundancy), but I am not certain. If
anyone has a good answer to this, please feel free to send it along.