# Module 2 - Producing Data and Sampling
>[!Abstract] In this module, you will look at the main concepts for sampling and designing experiments. You will learn about curious pitfalls and how to evaluate the effectiveness of such experiments.
>
>
>- **Population:** the entire group of subjects about which we want information
>- **Parameter:** the quantity about the population we are interested in
>- **Sample:** the part of the population from which we collect information
>- **Statistic (estimate):** the quantity we are interested in as measured in the sample
```toc
```
## Sampling Correctly
Incorrectly sampling a population usually leads to **biases**:
- *Selection bias:* a sample of convenience makes it more likely to sample certain subjects than others
- *Non-response bias:* subjects who do not respond may differ from those who do (e.g. parents are less likely to answer a survey request at 6 pm because they are busy with children and dinner)
- *Voluntary response bias:* subjects who choose to respond tend to hold stronger opinions (e.g. websites that post reviews of businesses are more likely to get responses from customers who had very bad or very good experiences)
### Sampling Designs
The best methods for sampling use chance in a planned way:
- A simple random sample selects subjects at random without replacement
- A stratified random sample divides the population into groups of similar subjects called *strata* (e.g. urban, suburban and rural voters). One then draws a simple random sample within each stratum and combines these samples
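
The two designs above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the `voters` population, the function names and the per-stratum sample sizes are all hypothetical choices made for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct subjects at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_random_sample(population, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample inside each stratum and combine them.

    stratum_of:     function mapping a subject to its stratum label
    n_per_stratum:  dict mapping stratum label -> sample size for that stratum
    """
    rng = random.Random(seed)
    # Group the subjects by stratum.
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    # Sample without replacement within each stratum, then combine.
    sample = []
    for label, size in n_per_stratum.items():
        sample.extend(rng.sample(strata[label], size))
    return sample


# Hypothetical population: 1000 voters as (id, area) pairs.
areas = ["urban"] * 600 + ["suburban"] * 300 + ["rural"] * 100
voters = list(enumerate(areas))

srs = simple_random_sample(voters, 50, seed=0)
strat = stratified_random_sample(
    voters,
    stratum_of=lambda voter: voter[1],
    n_per_stratum={"urban": 30, "suburban": 15, "rural": 5},
    seed=0,
)
```

Here the stratum sizes are proportional to the strata in the population, which guarantees that each area is represented in the same share as in the population; a simple random sample only achieves that on average.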
### Bias and Chance Error
Since the sample is drawn at random, the estimate will differ from the parameter due to [[Chance Error|chance error]]. If the sampling method is flawed, bias adds to this difference:
$\text{estimate} = \text{parameter} + \text{bias} + \text{chance error}$
The chance error is *unavoidable*, but it shrinks as the sample size grows. Choosing a sample large enough to make the chance error sufficiently small is within our control.
The **bias** can also be seen as a *systematic error* present in the system. Note that increasing the sample size just repeats the error on a larger scale, and typically we don't know how large the bias is.
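
A small simulation can make the difference between the two error terms concrete. The sketch below is illustrative only: the population (normally distributed values), the `reachable` convenience frame that misses the lowest 10% of subjects, and the sample sizes are all assumptions made up for the example.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population and the parameter of interest (its mean).
population = [rng.gauss(50, 10) for _ in range(20_000)]
parameter = statistics.fmean(population)

# Hypothetical convenience frame: the lowest 10% of subjects are never reached.
reachable = sorted(population)[2_000:]


def summarize(frame, n, repeats=200):
    """Repeat the sampling many times and separate bias from chance error."""
    estimates = [statistics.fmean(rng.sample(frame, n)) for _ in range(repeats)]
    bias = statistics.fmean(estimates) - parameter   # systematic error
    chance_sd = statistics.pstdev(estimates)         # typical size of chance error
    return bias, chance_sd


results = {}
for name, frame in (("random", population), ("convenience", reachable)):
    for n in (25, 400):
        results[(name, n)] = summarize(frame, n)
        bias, chance_sd = results[(name, n)]
        print(f"{name:12s} n={n:4d}  bias={bias:+.2f}  chance SD={chance_sd:.2f}")
```

In both frames the spread of the estimates shrinks as `n` grows, but the estimates from the convenience frame stay centred away from the parameter no matter how large the sample is.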
## Observation vs. Experiment, Confounding, and the Placebo Effect
An **observational study** measures outcomes of interest and is able to *establish association.* Association does not imply causation, because there may be confounding factors that are associated with both of the variables being compared. Confounding factors are sometimes referred to as lurking variables.
In order to establish causation, an [[AB Testing|experiment]] is required. This is done by splitting the subjects into two groups:
- Treatment group
- Control group
1. The best way to make sure the two groups are similar is to assign subjects to the two groups at random, so that confounding factors are equally present in both groups.
    - Randomisation serves two purposes:
        1. It makes the treatment group similar to the control group. Influences other than the treatment therefore operate equally on both groups, apart from differences due to chance.
        2. It allows us to assess how relevant the treatment effect is, by calculating the size of chance effects when comparing the outcomes in the two groups.
2. When possible, subjects in the control group get a **placebo:** a neutral treatment that resembles the real treatment. Assigning a placebo makes sure that both groups are equally affected by the **[[Placebo Effect|placebo effect]]:** the idea of being treated may have an effect by itself.
3. When possible, the experiment is **double-blind:** neither the subjects nor the evaluators know who was assigned to treatment and who to control.
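
Random assignment, the first step above, can be sketched as follows. This is a minimal example, assuming a hypothetical list of `subjects` with an `age` field standing in for a possible confounding factor; the function name `randomize` is likewise made up for the illustration.

```python
import random
import statistics


def randomize(subjects, seed=None):
    """Assign subjects to treatment and control by shuffling and splitting in half."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subjects; age plays the role of a potential confounder.
rng = random.Random(1)
subjects = [{"id": i, "age": rng.randint(18, 80)} for i in range(1000)]

treatment, control = randomize(subjects, seed=2)

mean_age_treatment = statistics.fmean(s["age"] for s in treatment)
mean_age_control = statistics.fmean(s["age"] for s in control)
print(f"mean age, treatment: {mean_age_treatment:.1f}")
print(f"mean age, control:   {mean_age_control:.1f}")
```

Because the assignment ignores every characteristic of the subjects, the two groups end up with similar age distributions, and the same holds for confounders that were never measured. Any remaining difference between the groups is due to chance alone.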