Statistics - Kolmogorov-Smirnov test
Introduction
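The Kolmogorov-Smirnov test (KS-test) is a nonparametric statistical test that determines whether a sample differs significantly from a reference distribution (one-sample test) or whether two samples differ significantly from each other (two-sample test).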
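The KS statistic measures the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution (or, in the two-sample case, between the two empirical distribution functions). It is the largest absolute difference between the two functions.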
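To make the definition concrete: for a sample of size n with empirical distribution function F_n and a reference CDF G, the one-sample statistic is D_n = sup_x |F_n(x) − G(x)|. The sketch below computes it by hand with NumPy and compares it with scipy.stats.kstest. The helper ks_statistic is a hypothetical name used only for illustration, and it assumes G is continuous.

```python
import numpy as np
from scipy import stats


def ks_statistic(sample, cdf):
    """One-sample KS statistic D_n = sup_x |F_n(x) - G(x)| (illustrative helper)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    g = cdf(x)  # reference CDF evaluated at each sorted observation
    # The ECDF jumps from (i-1)/n to i/n at the i-th sorted point, so the
    # supremum is attained just after (D+) or just before (D-) one of them.
    d_plus = np.max(np.arange(1, n + 1) / n - g)
    d_minus = np.max(g - np.arange(0, n) / n)
    return max(d_plus, d_minus)


x = np.linspace(-15, 15, 9)
print(ks_statistic(x, stats.norm.cdf))      # hand-computed D_n
print(stats.kstest(x, 'norm').statistic)    # SciPy's value for comparison
```

Both lines should print the same value, which matches the statistic in the kstest result shown in the first application below.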
Applications
Using KS-test, we can:
1. Test whether a sample follows a given distribution.
from scipy import stats
import numpy as np
x = np.linspace(-15, 15, 9)  # 9 evenly spaced points from -15 to 15
stats.kstest(x, 'norm')      # compare x against the standard normal N(0, 1)
Result:
KstestResult(
statistic=0.44435602715924361,
pvalue=0.038850142705171162
)
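The null hypothesis is that the sample is drawn from the reference distribution, i.e. F(x) = G(x) for all x, where F is the (unknown) CDF of the sample's distribution and G is the reference CDF. The alternative hypothesis can be ‘two-sided’ (the default: F(x) ≠ G(x) for at least one x), ‘less’ (F(x) < G(x) for at least one x) or ‘greater’ (F(x) > G(x) for at least one x). The KS test is only valid for continuous distributions. Here 'norm' is the standard normal distribution (loc=0, scale=1), and the p-value of about 0.039 is below the common 0.05 significance level, so we reject the null hypothesis that x follows the standard normal distribution.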
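The sketch below applies the same call to a sample that really is normal. It assumes a seeded NumPy Generator (seed 0 is arbitrary) and uses kstest's args parameter to pass loc and scale to the reference distribution, plus its alternative parameter. The sample size of 200 and the shift of 1 are illustrative choices; exact p-values depend on the random draw, so none are quoted here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# H0: sample comes from N(0, 1). This H0 is true, so a large p-value is typical.
same = stats.kstest(sample, 'norm')

# H0: sample comes from N(1, 1); args=(loc, scale) are passed to norm.cdf.
shifted = stats.kstest(sample, 'norm', args=(1.0, 1.0))

# One-sided: alternative='greater' means F(x) > G(x) for some x, where F is the
# sample's CDF and G the reference CDF. A sample centred at 0 has its CDF above
# that of N(1, 1), so this alternative is the true one here.
one_sided = stats.kstest(sample, 'norm', args=(1.0, 1.0), alternative='greater')

alpha = 0.05
for name, res in [("N(0,1)", same), ("N(1,1)", shifted), ("N(1,1), greater", one_sided)]:
    decision = "reject H0" if res.pvalue < alpha else "fail to reject H0"
    print(f"{name}: D={res.statistic:.3f}, p={res.pvalue:.3g} -> {decision}")
```

Note that 'norm' without args always means N(0, 1). If loc and scale are estimated from the same sample, the standard KS p-values no longer apply exactly.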
2. Test whether two samples come from the same distribution.
x = np.linspace(-15, 15, 9)
y = x                        # y is exactly the same sample as x
stats.ks_2samp(x, y)         # two-sample KS test
Result:
Ks_2sampResult(statistic=0.0, pvalue=1.0)
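ks_2samp tests the null hypothesis that the two samples are drawn from the same continuous distribution. Because y is identical to x, the two empirical distribution functions coincide, so the statistic is 0 and the p-value is 1: we cannot reject the null hypothesis. Failing to reject does not prove that the samples come from the same distribution; it only means the data give no evidence against it.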
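For two samples the statistic is the largest absolute difference between the two empirical distribution functions, D = sup_x |F_a(x) − F_b(x)|. The sketch below computes it by evaluating both ECDFs at every pooled observation (where the supremum is attained) and checks it against stats.ks_2samp. The helper name ks_2samp_statistic, the seed and the normal samples with a shift of 1 are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats


def ks_2samp_statistic(a, b):
    """Two-sample KS statistic sup_x |F_a(x) - F_b(x)| (illustrative helper)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    # Right-continuous ECDFs of each sample, evaluated at every pooled observation.
    cdf_a = np.searchsorted(a, pooled, side='right') / a.size
    cdf_b = np.searchsorted(b, pooled, side='right') / b.size
    return np.max(np.abs(cdf_a - cdf_b))


rng = np.random.default_rng(1)           # arbitrary seed
a = rng.normal(loc=0.0, size=300)        # sample from N(0, 1)
b = rng.normal(loc=1.0, size=300)        # sample from N(1, 1)

result = stats.ks_2samp(a, b)
print(ks_2samp_statistic(a, b), result.statistic)  # same value both ways
print(result.pvalue)                               # expected to be very small

x = np.linspace(-15, 15, 9)
print(ks_2samp_statistic(x, x))          # identical samples give 0.0
```

Unlike the y = x case above, these two samples come from different distributions, so the test is expected to reject the null hypothesis.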