# Linear Regression
## What is Linear Regression?
---
>[!Abstract] Linear regression is a form of [[Supervised Learning]], where a model is trained on labeled input data. Linear regression has a quick runtime and is generally interpretable. The goal of linear regression is to estimate a function $f(x)$ such that each feature has a linear relationship to the target variable $y$:
>$y = X\beta$
>Where $X$ is a matrix of predictor variables and $\beta$ is a vector of parameters that determines the weight of each variable in predicting the target variable.
The term *regression* originates from Galton's observation that extreme values tend to drift towards population averages:
>[!Quote] Galton called this phenomenon **regression.**
>A father's son's height tends to *regress* (or drift) towards the mean height.
```R
# sample.split() comes from the caTools package
library(caTools)
set.seed(101)  # make the split reproducible
# By convention, pass the target column (the feature we're trying
# to predict) into the splitter
sample <- sample.split(df$G3, SplitRatio=0.7)
# 70% of observations for training, 30% held out for testing
train <- subset(df, sample==TRUE)
test <- subset(df, sample==FALSE)
# Train and build a model; . uses all other columns as predictors
model <- lm(G3 ~ ., data=train)
# Inspect the fitted model
summary(model)
# R will flag the statistical significance of each feature using *'s
```
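With the model fit, we can score the held-out test set. A minimal sketch, assuming the `model` and `test` objects created above:
```R
# Predict the target for the unseen test observations
predictions <- predict(model, newdata=test)
# Pair predictions with actual values for a quick sanity check
head(data.frame(actual=test$G3, predicted=predictions))
```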
## Evaluating Linear Regression
Evaluation of regression models is built on the concept of a residual: the difference between the model's prediction and the actual value (see [[Residuals]]). Generally, linear regression estimates $\beta$ by minimizing the residual sum of squares (RSS):
$RSS(\beta) = (y - X\beta)^T(y - X\beta)$
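Setting the gradient of the RSS to zero yields the ordinary least squares estimate $\hat{\beta} = (X^TX)^{-1}X^Ty$. A minimal sketch of computing it by hand, assuming the `train` data frame from the split above:
```R
# Design matrix (model.matrix adds the intercept column) and target
X <- model.matrix(G3 ~ ., data=train)
y <- train$G3
# Normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
# Should match coef(model) up to numerical precision
```
Note that `lm()` solves the same problem with a QR decomposition rather than inverting $X^TX$ directly, which is more numerically stable when predictors are nearly collinear.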
The total goodness-of-fit is given by:
$R^2 = 1 - \frac{RSS}{TSS}$
Where $TSS = \sum_i (y_i - \bar{y})^2$ is the total sum of squares and $TSS = ESS + RSS$. Ranging from 0 to 1, $R^2$ gives the proportion of the variability in the data that is explained by the model.
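A minimal sketch of computing $R^2$ on the held-out test set, reusing the `predictions` vector from the sketch above:
```R
rss <- sum((test$G3 - predictions)^2)    # residual sum of squares
tss <- sum((test$G3 - mean(test$G3))^2)  # total sum of squares
r_squared <- 1 - rss / tss
```
Note that `summary(model)` reports the in-sample (training) $R^2$, which is typically optimistic relative to the test-set value.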
Other evaluation metrics that measure *goodness-of-fit* include the [[Mean Squared Error (MSE)]] and the [[Mean Absolute Error (MAE)]]: the former is the mean of the squared residuals, whereas the latter is the mean of the absolute residuals.
>[!Note] The MSE penalizes larger errors more than MAE, making it more sensitive to outliers.
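A minimal sketch of both metrics, again reusing the `predictions` vector from above:
```R
mse <- mean((test$G3 - predictions)^2)   # squaring makes large residuals dominate
mae <- mean(abs(test$G3 - predictions))  # weights all residuals equally
```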
## Pros and Cons of Linear Regression
**Pros:**
- Simple to explain
- Highly interpretable
- Model training and prediction are fast
- No tuning is required (excluding regularization)
- Features don't need scaling
- Can perform well with a small number of observations
- Well-understood
**Cons:**
- Assumes a linear relationship between the features and the response
- Performance is (generally) not competitive with the best supervised learning methods due to high bias
- Can't automatically learn feature interactions
In the plot below, a linear regression model would perform poorly: the number of rentals differs across seasons in a non-linear pattern that a straight line cannot capture.
![[R_boxplot.png]]
![[Regression to the Mean]]