Residuals - nati.sh

## Residuals

The method of leverage is about detecting observations with unusual *feature* values. How do we detect *unusual behavior*, that is, observations whose outcome does not follow the pattern the model has learned? For that we use regression residuals.

*Regression residuals* measure the difference between the observed outcome values and the outcome values predicted by the model. In essence, the residuals capture what the model is not able to explain. A large residual (in absolute value) indicates a potential outlier.

### Computing Residuals

For linear regression, the residuals are given by:

$$
\begin{aligned}
\hat{e}_i &= y_i - \hat{y}_i \\
&= y_i - \hat{\beta} x_i
\end{aligned}
$$

where $\hat{\beta}$ is the ordinary least squares estimate, $\hat{\beta} = (X^\top X)^{-1} X^\top Y$. The fitted values of the model are therefore:

```python
import numpy as np

# Fitted values: Y_hat = X (X^T X)^{-1} X^T Y
Y_hat = X @ np.linalg.inv(X.T @ X) @ X.T @ Y
```

Plotting it:

```python
import matplotlib.pyplot as plt

plt.scatter(X, Y, s=50, label='data')
plt.plot(X, Y_hat, c='k', lw=2, label='prediction')
# Red vertical segments from each observation to its prediction are the residuals
plt.vlines(X, np.minimum(Y, Y_hat), np.maximum(Y, Y_hat), color='r', lw=3, label='residuals')
plt.legend()
plt.title("Regression prediction and residuals");
```

![[residual_scatter.png]]

We can then flag unusual residuals: a point is flagged when its absolute residual exceeds the mean absolute residual by more than two standard deviations:

```python
# Absolute residuals |y_i - y_hat_i|, using the fitted values computed above
df['residual'] = np.abs(Y - Y_hat)
# Flag residuals larger than mean + 2 standard deviations
df['high_residual'] = df['residual'] > (np.mean(df['residual']) + 2 * np.std(df['residual']))
```

![[residual_detection.png]]
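The residual steps above can be combined into one self-contained sketch. It is illustrative only and rests on a few assumptions not in the original: the data are synthetic (a line with Gaussian noise plus one injected outlier at index 25), the design matrix includes an intercept column (unlike the no-intercept formula $\hat{\beta} x_i$), and `flag_high_residuals` is a hypothetical helper name. It uses `np.linalg.solve` on the normal equations instead of an explicit matrix inverse, which yields the same fitted values but is numerically more stable.

```python
import numpy as np
import pandas as pd


def flag_high_residuals(X, Y, k=2.0):
    """Hypothetical helper: fit OLS, then flag points whose absolute residual
    exceeds the mean absolute residual by more than k standard deviations."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # OLS coefficients from the normal equations (X^T X) beta = X^T Y;
    # same result as inv(X.T @ X) @ X.T @ Y without forming the inverse
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    Y_hat = X @ beta_hat
    residual = np.abs(Y - Y_hat)
    # Population standard deviation (ddof=0), as np.std uses by default
    threshold = residual.mean() + k * residual.std()
    return pd.DataFrame({
        "y": Y,
        "y_hat": Y_hat,
        "residual": residual,
        "high_residual": residual > threshold,
    })


# Synthetic example data (assumption): y = 1 + 2x plus noise, with one injected outlier
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
X = np.column_stack([np.ones_like(x), x])  # intercept column + feature
Y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
Y[25] += 8.0  # push one observation far above the line

df = flag_high_residuals(X, Y)
print(df[df["high_residual"]])
```

With this setup the injected point should be the only one flagged, since ordinary noise with standard deviation 0.5 stays well below the mean + 2 std threshold. Note that the outlier itself inflates both the mean and the standard deviation of the residuals, so a single extreme point can partially mask milder ones.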