Parametric vs. Non-parametric Procedures

# Parametric vs. Non-parametric Procedures

A **parametric test** assumes that the data come from a distribution described by a fixed set of parameters, most often the normal distribution. The usual assumptions are:

1. **Normality**: data in each group should be normally distributed
2. **Independence**: data in each group should be sampled randomly and independently
3. **No outliers**: there should be no extreme outliers in the data
4. **Equal variance**: data in each group should have approximately equal variance

A **non-parametric test** (sometimes referred to as a _distribution-free test_) does not assume that the data follow a particular distribution, such as the normal.

We can assess normality visually using a [[Q-Q (quantile-quantile) plot|Q-Q (quantile-quantile) plot]]. In these plots, the quantiles of the observed data are plotted against the expected quantiles of a normal distribution. If the data are normal, the points fall approximately along a straight line. The Python demo below draws a random sample from a normal distribution and plots it.

```python
import numpy as np
import statsmodels.api as statmod
import matplotlib.pyplot as plt

# create dataset with 100 values that follow a normal distribution
data = np.random.normal(0, 1, 100)

# create Q-Q plot with 45-degree line added to plot
fig = statmod.qqplot(data, line='45')
plt.show()
```

![[qq_plot_demo.png]]

- Tests to check for normality
    - Shapiro-Wilk
    - Kolmogorov-Smirnov

*The null hypothesis of both of these tests is that the sample was drawn from a normal (or Gaussian) distribution. Therefore, if the p-value is significant, the null hypothesis is rejected: the data provide evidence that the assumption of normality has been violated. A non-significant p-value does not prove that the data are normal; it only means the test found no evidence against normality.*

![[parametric_vs_nonparametric_tests.webp]]

The Kaplan-Meier (KM) estimator, used in [[Customer Lifetime Value — Survival Analysis|survival analysis]], is a non-parametric procedure, whereas Cox regression is a semi-parametric procedure.

# Advantages and Disadvantages

Non-parametric tests have several advantages, including:

- More statistical power than parametric tests when the assumptions of the parametric tests are violated
- Assumption of normality does not apply
- Small sample sizes are OK
- They can be used for all data types, including ordinal, nominal and interval (continuous)
- Can be used with data that has outliers

Disadvantages of non-parametric tests:

- Less powerful than parametric tests if assumptions haven’t been violated
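
Before choosing between a parametric and a non-parametric test, the Shapiro-Wilk and Kolmogorov-Smirnov normality checks named earlier can be run in code. The following is a minimal sketch using SciPy's `scipy.stats.shapiro` and `scipy.stats.kstest`. The helper name `check_normality`, the 0.05 significance threshold, and the simulated samples are assumptions made for illustration and are not part of the original note.

```python
import numpy as np
from scipy import stats

# Simulated data (assumption for this sketch): one normal and one skewed sample
rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)


def check_normality(sample, alpha=0.05):
    """Run Shapiro-Wilk and Kolmogorov-Smirnov tests on a 1-D sample.

    `check_normality` and `alpha` are hypothetical names chosen for this sketch.
    """
    sample = np.asarray(sample, dtype=float)

    # Shapiro-Wilk: null hypothesis is that the sample comes from a normal distribution
    _, shapiro_p = stats.shapiro(sample)

    # Kolmogorov-Smirnov against a standard normal. The sample is standardized
    # with its own mean and standard deviation first, so the resulting p-value
    # is only approximate (it tends to be conservative).
    standardized = (sample - sample.mean()) / sample.std(ddof=1)
    _, ks_p = stats.kstest(standardized, "norm")

    return {
        "shapiro": {
            "p_value": float(shapiro_p),
            "reject_normality": bool(shapiro_p < alpha),
        },
        "kolmogorov_smirnov": {
            "p_value": float(ks_p),
            "reject_normality": bool(ks_p < alpha),
        },
    }


for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    print(name, check_normality(sample))
```

If a test rejects normality for a group, that is one signal to consider a non-parametric alternative. With large samples these tests can flag very small deviations from normality, so it is usually worth reading the result alongside a Q-Q plot.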