# Statistics of AB Testing
---
## Review Questions
### 1. Hypothesis Testing Basics
- What are the differences between a z-test and t-test?
- When should you use a z-test versus a t-test?
- Given the data, how would you calculate the t-statistic or z-statistic?
### 2. Hypothesis Testing + A/B Testing
- Given a test result, calculate if the result is significant
- How to make launch decisions?
- How would you use hypothesis testing in practice?
### 3. Hypothesis Testing + SQL
- Query average "likes" in control and treatment groups
- Compute the test statistic and determine whether it's significant
---
## What are the available tests?
- t-test
- z-test
- Welch's t-test (when the sample variances are not similar)
## What are the differences between the different tests? (t-distribution vs z-distribution)
- t-distribution is more spread out
- The z-test assumes the population standard deviation is known; the t-test estimates it from the sample
- t-distribution produces wider confidence interval than z-distribution
![[t_vs_z_distr.png]]
### Why don't we use t-tests for proportions?
In the usual t-tests, the t-statistic has the form $d/s$, where $s$ is an estimated standard error of $d$. The t-distribution arises when the numerator is normally distributed and $s^2$ is an independent, scaled chi-squared variable; proportion data doesn't satisfy this, so the test statistic does not have a t-distribution. For one-sample or two-sample proportion tests:
- The test statistic still has the form $d/s$
- It is only asymptotically normal (the variance of a proportion is determined by the proportion itself)
- So there is no justification for using a t-distribution
Why do people use t-tests for proportions?
- They are (at least academically) wrong
- But the t-distribution approximation on Bernoulli data works well in practice
- *For large samples, the t-distribution and z-distribution give similar results* (see the sketch below)
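A quick way to see this numerically (a sketch, not from the source; the 1.1% / 2.3% rates mirror the example later in these notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two large Bernoulli samples (click / no-click) with slightly different rates
a = rng.binomial(1, 0.011, size=10_000)
b = rng.binomial(1, 0.023, size=10_000)

# The "academically wrong" route: a t-test directly on the 0/1 data
t_stat, t_pval = stats.ttest_ind(a, b)

# The two-sample proportion z-test with a pooled standard error
p_pool = (a.sum() + b.sum()) / (len(a) + len(b))
se = np.sqrt(p_pool * (1 - p_pool) * (1 / len(a) + 1 / len(b)))
z_stat = (b.mean() - a.mean()) / se
z_pval = 2 * stats.norm.sf(abs(z_stat))

print(f"t-test: stat={t_stat:.3f}, p={t_pval:.4f}")
print(f"z-test: stat={z_stat:.3f}, p={z_pval:.4f}")  # nearly identical for large n
```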
## How to know which test to use?
The following flowchart can be used to determine which test to use
![[hypothesis_test.png]]
- For small samples it is important to check if the data is normally distributed
- For larger samples, we can invoke [[Central Limit Theorem]] and assume the distribution of sample means is approximately normally distributed
- The z-test is less commonly used in practice since the population variance is rarely known (see the sketch below)
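A minimal sketch of this decision logic in code (the function name and the n < 30 rule of thumb are my own assumptions; the flowchart above is the authoritative version):

```python
def choose_test(n: int, is_proportion: bool, population_variance_known: bool) -> str:
    """Rough test-selection logic for a two-sided comparison (rule-of-thumb sketch)."""
    if is_proportion:
        # Proportion tests are asymptotically normal -> z-test
        return "z-test"
    if population_variance_known:
        # Rare in practice: the population variance is usually unknown
        return "z-test"
    if n < 30:
        # Small sample: check that the data is approximately normal, then t-test
        return "t-test (after checking normality)"
    # Large sample: CLT makes the sample mean approximately normal,
    # so t- and z-tests give nearly identical results
    return "t-test (z-test also acceptable)"
```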
### Note: [[Bernoulli Distribution]]
- Distribution of a binary (0/1) random variable
- $Pr(1) = p$
- $Pr(0) = 1-p$
- Example: Click-through probability (CTP) -> Pr(click), Pr(no click)
- This is also what you would use to understand changes in proportions, e.g., the percentage of users or pages affected
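For intuition, a tiny simulation of click-through as Bernoulli draws (the 3% rate is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

p_click = 0.03                                 # assumed click-through probability
clicks = rng.binomial(1, p_click, size=1_000)  # 1 = click, 0 = no click

print("observed CTP:", clicks.mean())          # sample estimate of p
print("sample variance:", clicks.var(), "vs p(1-p) =", p_click * (1 - p_click))
```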
## Two-Sample Test of Proportion
Experiment: test color of a button
- Click through probability: N(users who clicked) / N (total users)
- 1000 users in both control & treatment groups
Results:
- Control group: 1.1% CTP
- Treatment group: 2.3% CTP
Question:
- Can we conclude that the difference between the two groups is significant?
- Do you recommend launching this experiment?
Note:
- Practical significance boundary: 0.01
- Significance level $\alpha = 0.05$
Questions to answer:
1. Which hypothesis tests to use?
- Each user either clicks or doesn't click -> [[Bernoulli Distribution]]
- Control group: $n * p = 1000 * 1.1\% = 11$
- Treatment group: $n * p = 1000 * 2.3\% = 23$
- The test statistic for comparing two proportions follows a *z-distribution* (asymptotically normal)
- Measurements
- Users clicked $X_{ct}, X_{tr}$
- Total number of users $n_{ct}, n_{tr}$
$\hat{p}_{ct} = \frac{X_{ct}}{n_{ct}} = \frac{11}{1000}$
$\hat{p}_{tr} = \frac{X_{tr}}{n_{tr}} = \frac{23}{1000} $
2. What is the null hypothesis?
$d = \hat{p}_{tr} - \hat{p}_{ct}$
$H_0: p_{ct} = p_{tr}$, i.e., $d = 0$
- Test statistic
$TS = \frac{\hat{p}_{tr} - \hat{p}_{ct}}{SE}$
- Choose SE such that it can represent both groups -> Pooled standard error
1. First calculate the *pooled* proportion $\hat{p}$
$\hat{p} = \frac{X_{ct} + X_{tr}}{n_{ct} + n_{tr}} = \frac{11+23}{1000+1000} = 0.017$
2. Compute the *pooled* SE
$SE = \sqrt{\hat{p} (1-\hat{p}) (1/n_{ct} + 1/n_{tr})} = 0.00578$
$TS = \frac{0.023 - 0.011}{0.00578} \approx 2.076$
3. Is the result statistically significant?
- Critical z-score = 1.96
- If TS > 1.96 or TS < -1.96, reject the null hypothesis
4. Is the result practically significant?
- Check whether the confidence interval for $d$ stays above the practical significance boundary of 0.01 (see the sketch below)
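The full worked example as a Python sketch (numbers as in the notes; using the pooled SE for the confidence interval is a simplification, an unpooled SE is more standard there):

```python
import numpy as np
from scipy import stats

# Observed data
x_ct, n_ct = 11, 1000     # control: 1.1% CTP
x_tr, n_tr = 23, 1000     # treatment: 2.3% CTP
alpha, practical_boundary = 0.05, 0.01

p_ct, p_tr = x_ct / n_ct, x_tr / n_tr
d = p_tr - p_ct                                    # observed difference: 0.012

# Pooled proportion and pooled standard error
p_pool = (x_ct + x_tr) / (n_ct + n_tr)             # 0.017
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_ct + 1 / n_tr))  # ~0.00578

ts = d / se                                        # ~2.076
z_crit = stats.norm.ppf(1 - alpha / 2)             # 1.96
print("TS =", round(ts, 3), "| statistically significant:", abs(ts) > z_crit)

# Practical significance: does the CI for d stay above the 0.01 boundary?
ci_low, ci_high = d - z_crit * se, d + z_crit * se
print(f"95% CI for d: ({ci_low:.4f}, {ci_high:.4f})",
      "| clears boundary:", ci_low > practical_boundary)
```

With these numbers the test is statistically significant, but the lower end of the 95% CI falls below the 0.01 practical boundary, so the launch decision is not clear-cut; a common recommendation is to gather more data before deciding.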
## Two-Sample Test of Means
Experiment: if a new feature changes average number of posts
- 30 users in both control & treatment groups
- Control: [1, 0, 1, 3, 2, ...]
- Treatment: [0, 2, 3, 1, 0, ...]
- Mean of control = 1.4
- Mean of treatment = 2
- $\alpha = 0.05$
- Practical significance boundary = $0.05$
- Question: should you launch this feature?
- Now we are dealing with a two-sample test of means with unknown but similar variances -> pooled two-sample *t-test*
- Compute *pooled variance*
- Goal: measure the difference
$d = \mu_t - \mu_c$
- Null hypothesis
$H_0: \mu_c = \mu_t, d = 0$
- Test statistic with *pooled variance* (see the sketch after this list)
$TS = \frac{\bar{x}_t - \bar{x}_c}{S_{pool}\sqrt{\frac{1}{n_c} + \frac{1}{n_t}}}$
- *Pooled standard deviation*, where $SS$ is the sum of squared deviations within each group and $df = n_c + n_t - 2$
$S_{pool} = \sqrt{\frac{SS_c + SS_t}{df}}$
- Continue [here](https://www.youtube.com/watch?v=6uw0A3aKwMc)
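A sketch of the pooled two-sample t-test in Python (the function is my own wrapper around the formulas above; `scipy.stats.ttest_ind` with `equal_var=True` does the same pooling and can be used as a cross-check):

```python
import numpy as np
from scipy import stats

def pooled_two_sample_t(control, treatment, alpha=0.05):
    """Two-sample t-test with pooled variance, following the formulas above."""
    c, t = np.asarray(control, float), np.asarray(treatment, float)
    n_c, n_t = len(c), len(t)

    d = t.mean() - c.mean()                  # observed difference in means
    ss_c = ((c - c.mean()) ** 2).sum()       # sum of squared deviations, control
    ss_t = ((t - t.mean()) ** 2).sum()       # sum of squared deviations, treatment
    df = n_c + n_t - 2

    s_pool = np.sqrt((ss_c + ss_t) / df)     # pooled standard deviation
    ts = d / (s_pool * np.sqrt(1 / n_c + 1 / n_t))
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    return ts, t_crit, abs(ts) > t_crit

# Cross-check: stats.ttest_ind(treatment, control, equal_var=True) gives the same
# t-statistic when fed the full 30-user control and treatment arrays.
```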
## Sources
- [Crack Hypothesis Testing Problems in Data Science Interviews | Binomial test, z-test and t-test](https://www.youtube.com/watch?v=IY7y-t30UJc)