Introduction

The Kolmogorov-Smirnov test (KS-test) is a nonparametric statistical method for determining whether a sample follows a reference distribution, or whether two samples differ significantly in distribution.

The KS statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution (or, in the two-sample case, between the empirical distribution functions of the two samples).
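To make the idea of a "distance" concrete, here is a minimal sketch that computes the one-sample KS statistic by hand as the largest vertical gap between the empirical distribution function and a reference CDF. The helper name `ks_statistic` is hypothetical and used only for illustration; the standard normal is assumed as the reference distribution.

```python
import numpy as np
from scipy import stats


def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    f = cdf(x)  # reference CDF evaluated at the sorted sample points
    # The empirical CDF is a step function, so the largest gap occurs at a
    # sample point: either just after the step (i/n) or just before it ((i-1)/n).
    d_plus = np.max(np.arange(1, n + 1) / n - f)
    d_minus = np.max(f - np.arange(0, n) / n)
    return max(d_plus, d_minus)


x = np.linspace(-15, 15, 9)
d_manual = ks_statistic(x, stats.norm.cdf)
d_scipy = stats.kstest(x, 'norm').statistic
print(d_manual, d_scipy)
```

The two printed values should agree, since this is the same quantity that `stats.kstest` reports as `statistic` for a two-sided test.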

Applications

Using the KS-test, we can:

1. Test whether the data follow a given distribution.

  from scipy import stats
  import numpy as np

  # Nine evenly spaced points from -15 to 15, tested against the standard normal
  x = np.linspace(-15, 15, 9)
  stats.kstest(x, 'norm')
  # Result:
  # KstestResult(
  #   statistic=0.44435602715924361,
  #   pvalue=0.038850142705171162
  # )

The null hypothesis is that the two distributions are identical, G(x)=F(x). The alternative hypothesis can be either ‘two-sided’ (default), ‘less’ or ‘greater’. The KS test is only valid for continuous distributions. Here the p-value of about 0.039 is below 0.05, so we reject the null hypothesis at the 5% significance level: the data are unlikely to have been drawn from a standard normal distribution.
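The effect of the alternative hypothesis can be illustrated with a small sketch. The data here are an assumption made for illustration: 200 draws from a normal distribution shifted to the right (mean 0.5), generated with a fixed seed. Because the shift pushes the sample's CDF below the standard normal CDF, the ‘less’ alternative is the one expected to detect it, while ‘greater’ is not.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed, not from the original example):
# a normal sample whose mean is shifted from 0 to 0.5.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.0, size=200)

# Same sample, same reference distribution, three alternatives.
two_sided = stats.kstest(sample, 'norm')
less = stats.kstest(sample, 'norm', alternative='less')
greater = stats.kstest(sample, 'norm', alternative='greater')

for name, res in [('two-sided', two_sided), ('less', less), ('greater', greater)]:
    print(f"{name:>9}: statistic={res.statistic:.4f}, pvalue={res.pvalue:.4g}")
```

A one-sided alternative is only appropriate when the direction of the departure is specified before looking at the data; otherwise the default two-sided test is the safer choice.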

2. Compare the distributions from two samples.

  x = np.linspace(-15, 15, 9)
  # y is the same data as x, so the two empirical distributions coincide
  y = x
  stats.ks_2samp(x, y)
  # Result:
  # Ks_2sampResult(statistic=0.0, pvalue=1.0)

This tests whether two samples are drawn from the same distribution. Here the p-value of 1 means we cannot reject the null hypothesis that the two samples come from the same distribution, which is expected because y is identical to x.
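For contrast with the identical-sample case, the following sketch applies the two-sample test to samples that really do come from different distributions. The samples are assumptions made for illustration (normal draws with means 0 and 1, fixed seed). It also recomputes the statistic by hand as the largest gap between the two empirical distribution functions.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed): a and c share a distribution, b is shifted by 1.
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=300)
b = rng.normal(loc=1.0, scale=1.0, size=300)
c = rng.normal(loc=0.0, scale=1.0, size=300)

diff = stats.ks_2samp(a, b)  # samples from different distributions
same = stats.ks_2samp(a, c)  # samples from the same distribution
print("a vs b:", diff.statistic, diff.pvalue)
print("a vs c:", same.statistic, same.pvalue)

# Manual check: evaluate both empirical CDFs on the pooled sample
# and take the largest absolute difference.
pooled = np.sort(np.concatenate([a, b]))
ecdf_a = np.searchsorted(np.sort(a), pooled, side='right') / a.size
ecdf_b = np.searchsorted(np.sort(b), pooled, side='right') / b.size
d_manual = np.max(np.abs(ecdf_a - ecdf_b))
print("manual statistic for a vs b:", d_manual)
```

With a shift this large, the p-value for `a` versus `b` should be far below 0.05, so the null hypothesis of a common distribution is rejected, and the statistic is much larger than for `a` versus `c`.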

Reference:

  1. https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test