How do we measure the correlation of time series? Pearson correlation analysis
When you discover that your time series have a similar trend, you may want to measure how strongly they are correlated. In that case, the Pearson correlation coefficient, one of the most widely used metrics in the sciences, is a natural choice. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. Pearson’s correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. (Source: Wikipedia) If you are interested in learning more, this paper may be relevant to you. If you simply want to calculate the Pearson value, the SciPy library provides a function named “pearsonr”; alternatively, the NumPy library has a function named “corrcoef”. Here is my example:
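To make the definition concrete, here is a minimal from-scratch sketch that computes r exactly as described: covariance divided by the product of the standard deviations. The helper name `pearson_r` and the small sample lists are hypothetical, for illustration only. It uses population (1/n) moments, whose normalisation factors cancel in the ratio, and cross-checks the result against NumPy's `corrcoef`.

```python
import math

import numpy as np


def pearson_r(x, y):
    """Hypothetical helper: r = cov(x, y) / (std(x) * std(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance: average product of the deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Standard deviations with the same 1/n normalisation as the covariance
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
print(pearson_r(x, y))
print(np.corrcoef(x, y)[0, 1])  # should agree up to floating-point error
```

Using 1/(n-1) for both the covariance and the standard deviations would give the same r, since the factor cancels.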
(1) It is easy to see that the two plots have a similar trend, even though the scales of their y values differ.
(2) Using the pearsonr function from the SciPy library, we calculate the Pearson correlation coefficient, and we compute it again with NumPy’s corrcoef for comparison. Here is the Python code:
from scipy.stats import pearsonr
import numpy as np
import sys

if __name__ == "__main__":
    list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90, 81, 60, 72, 93, 80, 97, 65, 74, 71]
    list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108, 226406, 399833, 253989, 233108,
             301069, 257548, 206579, 255322, 268418, 279106, 304694, 216643, 236923, 254406]

    # Both series must contain the same number of elements
    if len(list1) != len(list2):
        print("error, two series should contain same size of elements")
        sys.exit(1)  # sys.exit must be called, otherwise the script keeps running

    # scipy library: prints the coefficient r and a two-sided p-value
    print("scipy result: ", pearsonr(list1, list2))

    # numpy library: prints the 2x2 correlation matrix; r is the off-diagonal entry
    print("numpy result: ", str(np.corrcoef(list1, list2)))
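Neither call above prints a bare number: pearsonr returns the coefficient together with a two-sided p-value, and corrcoef returns a 2x2 correlation matrix. A minimal sketch of pulling out the scalar values, assuming the same data; unpacking into two variables works whether SciPy returns a plain tuple (older releases) or a result object (newer ones). The variable names are just for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90, 81, 60, 72, 93, 80, 97, 65, 74, 71]
list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108, 226406, 399833, 253989, 233108,
         301069, 257548, 206579, 255322, 268418, 279106, 304694, 216643, 236923, 254406]

# pearsonr: (coefficient, two-sided p-value)
r_scipy, p_value = pearsonr(list1, list2)

# corrcoef: matrix [[1, r], [r, 1]]; take an off-diagonal entry
r_numpy = np.corrcoef(list1, list2)[0, 1]

print(f"scipy r = {r_scipy:.2f}, p-value = {p_value:.2g}")
print(f"numpy r = {r_numpy:.2f}")
```

Both values are the same coefficient; SciPy additionally reports a p-value for the null hypothesis of no linear correlation.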
(3) As a result, we see that the two series are highly correlated, with a Pearson coefficient of 0.94. (Pearson’s r takes a value between −1 and +1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.)
(4) Next, you may want to see how this value is affected when a single value in the series changes. To find out, change the last element of list1 from 71 to 710.
(5) You will observe that the Pearson coefficient drops sharply, from 0.94 to 0.21. In other words, Pearson’s r is sensitive to outliers.
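The experiment in steps (4) and (5) can also be scripted instead of editing the list by hand. A minimal sketch reusing the data from step (2); the name `list1_outlier` is just for illustration. Only one of the 21 points changes, but the extreme pair (710, 254406) inflates the spread of list1 far more than it contributes to the covariance, which is what pulls r down.

```python
from scipy.stats import pearsonr

list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90, 81, 60, 72, 93, 80, 97, 65, 74, 71]
list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108, 226406, 399833, 253989, 233108,
         301069, 257548, 206579, 255322, 268418, 279106, 304694, 216643, 236923, 254406]

# Same data, but with the last element of list1 changed from 71 to 710
list1_outlier = list1[:-1] + [710]

# Keep only the coefficient; discard the p-value
r_before, _ = pearsonr(list1, list2)
r_after, _ = pearsonr(list1_outlier, list2)

print(f"before: r = {r_before:.2f}")
print(f"after:  r = {r_after:.2f}")
```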