## Residuals
The method of leverage is about detecting unusual *feature* values. How do we detect *unusual behavior*, that is, outcomes the model does not expect? For that we use regression residuals.
*Regression residuals* measure the difference between the observed outcome values and the values the model predicts for them, computed on the data the model was fit to. In essence, the residuals capture what the model is not able to explain. A large residual indicates a possible outlier.
### Computing Residuals
For a linear regression with a single feature and no intercept, the residuals are given by:
$$
\begin{aligned}
\hat{e}_i &= y_i - \hat{y}_i \\
\hat{e}_i &= y_i - \hat{\beta} x_i
\end{aligned}
$$
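As a minimal sketch, assuming a single feature and no intercept (so that $\hat{y}_i = \hat{\beta} x_i$), the slope has the closed form $\hat{\beta} = \sum_i x_i y_i / \sum_i x_i^2$ and the residuals follow directly. The function name `residuals_no_intercept` and the toy data are hypothetical, chosen only for illustration:

```python
import numpy as np

def residuals_no_intercept(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Residuals e_i = y_i - beta_hat * x_i for a single-feature OLS fit without intercept."""
    # Closed-form least-squares slope for y = beta * x
    beta_hat = np.dot(x, y) / np.dot(x, x)
    y_hat = beta_hat * x          # predicted outcomes
    return y - y_hat              # observed minus predicted

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 9.0])
e = residuals_no_intercept(x, y)
print(e)
```

A quick sanity check: least-squares residuals are orthogonal to the feature, so `np.dot(x, e)` should be zero up to floating-point error.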
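The fitted values come from the ordinary least squares solution $\hat{\beta} = (X^\top X)^{-1} X^\top Y$, so that $\hat{Y} = X\hat{\beta} = X (X^\top X)^{-1} X^\top Y$: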
```python
import numpy as np
Y_hat = X @ np.linalg.inv(X.T @ X) @ X.T @ Y  # X: (n, 1) feature column, Y: (n, 1) outcomes
```
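Explicitly inverting $X^\top X$ is fine for small, well-conditioned problems, but a common alternative is `np.linalg.lstsq`, which solves the least-squares problem without forming the inverse. A sketch comparing the two on synthetic data (the slope of 3, the feature range, and the noise level are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one feature as an (n, 1) column, no intercept
n = 50
X = rng.uniform(0, 10, size=(n, 1))
Y = 3.0 * X + rng.normal(0, 1.0, size=(n, 1))

# Hat-matrix form: Y_hat = X (X^T X)^{-1} X^T Y
Y_hat_inv = X @ np.linalg.inv(X.T @ X) @ X.T @ Y

# Same fit via a least-squares solver; lstsq returns (solution, residuals, rank, singular values)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat_lstsq = X @ beta_hat

print(np.allclose(Y_hat_inv, Y_hat_lstsq))
```

Both give the same fitted values here; the solver route mainly pays off when $X^\top X$ is close to singular.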
Plotting the data, the fitted line, and the residuals (drawn as red vertical segments):
```python
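import numpy as np
import matplotlib.pyplot as plt

plt.scatter(X, Y, s=50, label='data')
plt.plot(X, Y_hat, c='k', lw=2, label='prediction')
# Each residual is a vertical segment between the observed and predicted value
plt.vlines(X, np.minimum(Y, Y_hat), np.maximum(Y, Y_hat), color='r', lw=3, label="residuals")
plt.legend()
plt.title("Regression prediction and residuals");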
```
![[residual_scatter.png]]
We can then flag observations with unusually large residuals: a point is flagged when its absolute residual exceeds the mean absolute residual by more than two standard deviations:
```python
import numpy as np

df['residual'] = np.abs(Y - Y_hat).ravel()  # absolute residual |y_i - y_hat_i|, flattened to 1-D
df['high_residual'] = df['residual'] > (np.mean(df['residual']) + 2 * np.std(df['residual']))  # > mean + 2 std
```
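To see the rule end to end, here is a self-contained sketch on synthetic data with one deliberately planted outlier. The data-generating process (slope 2, Gaussian noise, a +15 shift on row 5) is an assumption made for illustration, not the data behind the figures. Note that `np.std` computes the population standard deviation (`ddof=0`), whereas pandas' `Series.std` defaults to `ddof=1`, so the two give slightly different thresholds.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic data (assumed for illustration): y ~ 2x plus small noise, no intercept
n = 30
X = rng.uniform(1, 10, size=(n, 1))
Y = 2.0 * X + rng.normal(0, 0.5, size=(n, 1))
Y[5, 0] += 15.0  # plant one observation far above the trend

df = pd.DataFrame({'x': X.ravel(), 'y': Y.ravel()})

# Fitted values via the hat matrix
Y_hat = X @ np.linalg.inv(X.T @ X) @ X.T @ Y

# Absolute residuals and the mean + 2 std threshold (np.std uses ddof=0)
df['residual'] = np.abs(Y - Y_hat).ravel()
threshold = np.mean(df['residual']) + 2 * np.std(df['residual'])
df['high_residual'] = df['residual'] > threshold

print(df[df['high_residual']])
```

Because outliers inflate both the mean and the standard deviation used in the threshold, this rule can miss some of them when several large residuals are present.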
![[residual_detection.png]]