# Dimensionality Reduction

There are two aspects to data:

- Quality
- Quantity

In an ideal scenario we would have high-quality data in sufficient quantity. However, sometimes we run into problems:

- Lack of computational resources to train on our dataset
- Noise in the data that does not contribute to the predictive power we are looking for
- Sparsity

One way of alleviating these problems is to modify the dataset so that it is more digestible by reducing its dimensions. There are two types of dimensionality reduction:

1. Feature Selection: Selecting a subset of the existing features (and discarding the rest)
2. Feature Extraction: Creating entirely new features from existing ones

- Linear Techniques
  a. Random Projection (a sketch follows after this list)
     1. **Gaussian Random Projection:** Multiplying the input with a dense random matrix whose elements are drawn from a Gaussian distribution.
     2. **Sparse Random Projection:** Multiplying the input with a sparse random matrix, which is faster to compute and more memory efficient.
  b. [[Principal Component Analysis]]: Finding orthogonal hyperplanes (lower-dimensional linear subspaces) that capture the maximum variance in the data (see the sketch below).
  c. Matrix Factorization Techniques:
     1. **Singular Value Decomposition (SVD):** Not explicitly a dimensionality reduction method but a matrix factorization technique; however, it can be used for dimensionality reduction via its low-rank approximation (see the sketch below).
     2. **CUR Decomposition**
- Non-Linear Techniques (see the sketches below)
  a. t-SNE
  b. Kernel PCA
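A minimal sketch of both random projection variants, assuming scikit-learn is available (the note does not name a library); the synthetic data and the component count of 100 are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection, SparseRandomProjection

# Synthetic high-dimensional data: 500 samples, 1000 features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))

# Gaussian: dense random matrix with entries drawn from a Gaussian distribution
gauss = GaussianRandomProjection(n_components=100, random_state=0)
X_gauss = gauss.fit_transform(X)

# Sparse: mostly-zero random matrix, faster to compute and more memory efficient
sparse = SparseRandomProjection(n_components=100, random_state=0)
X_sparse = sparse.fit_transform(X)

print(X_gauss.shape, X_sparse.shape)  # (500, 100) (500, 100)
```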
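A minimal PCA sketch, again assuming scikit-learn; the digits dataset and the choice of two components are illustrative, not prescribed by the note:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1797 handwritten-digit images, 64 pixel features each
X, _ = load_digits(return_X_y=True)

# Keep the 2 orthogonal directions that capture the most variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (1797, 2)
print(pca.explained_variance_ratio_)  # fraction of variance captured per component
```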
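A sketch of how SVD's low-rank approximation doubles as dimensionality reduction, in plain NumPy; the matrix shape and target rank `k` are arbitrary for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))

# Thin SVD factorization: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rank-k approximation: keep only the k largest singular values
k = 10
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Reduced k-dimensional representation of each sample
X_reduced = U[:, :k] * s[:k]

print(X_k.shape, X_reduced.shape)  # (100, 50) (100, 10)
```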
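Minimal sketches of the two non-linear techniques, assuming scikit-learn; the perplexity value and the RBF kernel are common default-style choices, not prescriptions from the note:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)

# t-SNE: preserves local neighborhood structure, mainly used for 2-D/3-D visualization
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Kernel PCA: PCA in the implicit feature space induced by the RBF kernel
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)

print(X_tsne.shape, X_kpca.shape)  # (1797, 2) (1797, 2)
```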