ML Fundamentals

Mathematical Foundations for Machine Learning

What You Need to Know
- Linear Algebra in Machine Learning
  - Vector spaces and linear transformations in ML contexts
  - Matrix operations for data representation and model parameters
  - Eigendecomposition and Singular Value Decomposition (SVD)
  - Resources:
    - Linear Algebra for ML - Fast.ai computational linear algebra course
    - Matrix Calculus for Deep Learning - Matrix derivatives and gradients
    - Linear Algebra Review - CS229 - Stanford ML linear algebra notes
- Calculus and Optimization Theory
  - Gradient computation and chain rule for backpropagation
  - Convex optimization and global vs local minima
  - Lagrange multipliers and constrained optimization
  - Resources:
    - Convex Optimization - Boyd - Free convex optimization textbook
    - Optimization for Machine Learning - Modern optimization techniques
    - Calculus on Computational Graphs - Backpropagation explained
- Probability Theory and Statistical Learning
  - Bayesian inference and maximum likelihood estimation
  - Probability distributions and their ML applications
  - Information theory and entropy in machine learning
  - Resources:
    - Probabilistic Machine Learning - Kevin Murphy's comprehensive ML book
    - Information Theory - MacKay - Information theory and inference
    - Bayesian Methods for Machine Learning - HSE University course
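
As a small illustration of the chain rule item above, the sketch below differentiates a one-neuron squared-error loss by hand and compares the result against a central finite difference. The model, the function names, and the sample values are assumptions chosen for illustration, not part of the course material.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of a one-neuron model: L = (sigmoid(w*x + b) - y)^2."""
    return (sigmoid(w * x + b) - y) ** 2


def analytic_grad(w, b, x, y):
    """Apply the chain rule: dL/dw = dL/da * da/dz * dz/dw (and likewise for b)."""
    z = w * x + b
    a = sigmoid(z)
    dL_da = 2.0 * (a - y)      # derivative of the squared error
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz    # dz/dw = x, dz/db = 1


def numeric_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
    db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db


if __name__ == "__main__":
    print(analytic_grad(0.5, -0.2, 1.5, 1.0))
    print(numeric_grad(0.5, -0.2, 1.5, 1.0))
```

The two printed gradients should agree to several decimal places; a gradient check of this kind is a common way to validate a hand-written backpropagation step.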

Supervised Learning Algorithms

What You Need to Know
- Linear Models and Regularization
  - Linear regression with mathematical derivation
  - Ridge, Lasso, and Elastic Net regularization
  - Logistic regression and maximum likelihood
  - Resources:
    - Linear Models - ESL - Elements of Statistical Learning chapters
    - Regularization Tutorial - Ridge and Lasso implementation
    - Logistic Regression Math - Mathematical foundations
- Tree-Based Methods and Ensemble Learning
  - Decision trees and information gain criteria
  - Random Forest and bagging techniques
  - Gradient boosting and XGBoost implementation
  - Resources:
    - Decision Trees - Scikit-learn - Decision tree algorithms and implementation
    - Random Forest Paper - Breiman's original random forest paper
    - XGBoost Documentation - Gradient boosting framework
- Support Vector Machines and Kernel Methods
  - SVM optimization problem and dual formulation
  - Kernel trick and non-linear transformations
  - Soft margin and regularization in SVMs
  - Resources:
    - SVM Tutorial - MIT SVM mathematical foundations
    - Kernel Methods - Learning with Kernels textbook
    - SVM Implementation - Scikit-learn SVM guide
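
The information gain criterion listed under decision trees can be sketched in a few lines. This is a minimal version assuming entropy (in bits) as the impurity measure and a binary split; the function names are illustrative and do not come from any particular library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


if __name__ == "__main__":
    parent = [0, 0, 1, 1]
    print(information_gain(parent, [0, 0], [1, 1]))  # split that separates the classes
    print(information_gain(parent, [0, 1], [0, 1]))  # split that leaves both children mixed
```

A split that separates the classes perfectly recovers the full parent entropy, while a split that leaves each child as mixed as the parent gains nothing.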

Unsupervised Learning and Dimensionality Reduction

What You Need to Know
- Clustering Algorithms
  - K-means clustering and expectation-maximization
  - Hierarchical clustering and linkage criteria
  - DBSCAN and density-based clustering
  - Resources:
    - Clustering Algorithms - Comprehensive clustering guide
    - K-means Mathematical Analysis - CMU clustering lecture
    - DBSCAN Paper - Original DBSCAN algorithm
- Principal Component Analysis and Matrix Factorization
  - PCA mathematical derivation and implementation
  - Singular Value Decomposition applications
  - Non-negative Matrix Factorization (NMF)
  - Resources:
    - PCA Tutorial - Principal Component Analysis explained
    - SVD and PCA - MIT SVD tutorial
    - Matrix Factorization - Netflix recommendation system
- Manifold Learning and Non-linear Dimensionality Reduction
  - t-SNE for visualizing cluster structure in high-dimensional data
  - UMAP for dimensionality reduction
  - Autoencoders for non-linear feature learning
  - Resources:
    - t-SNE Paper - Original t-SNE algorithm
    - UMAP Documentation - Uniform Manifold Approximation
    - Autoencoder Tutorial - Keras autoencoder implementation
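
The K-means item above alternates an assignment step and an update step (Lloyd's algorithm). The sketch below is one plain-Python version, assuming the caller supplies the initial centroids so the run is deterministic; the function names and the stopping rule are illustrative choices.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iters=100):
    """Lloyd's algorithm: returns (centroids, cluster index of each point)."""
    centroids = [tuple(float(v) for v in c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return centroids, assignments


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

Results depend on the initial centroids, which is why practical implementations usually add a seeding strategy and several restarts.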

Deep Learning Fundamentals

What You Need to Know
- Neural Network Architecture and Training
  - Multi-layer perceptrons and universal approximation theorem
  - Backpropagation algorithm mathematical derivation
  - Gradient descent variants and optimization techniques
  - Resources:
    - Deep Learning Book - Goodfellow, Bengio, and Courville comprehensive text
    - Neural Networks and Deep Learning - Nielsen's online book
    - Backpropagation Calculus - 3Blue1Brown backprop explanation
- Convolutional Neural Networks
  - Convolution operation and feature maps
  - CNN architectures (LeNet, AlexNet, VGG, ResNet)
  - Pooling layers and spatial hierarchies
  - Resources:
    - CNN for Visual Recognition - Stanford CS231n CNN guide
    - CNN Architectures - Survey of CNN architectures
    - ResNet Paper - Deep residual learning
- Recurrent Neural Networks and Sequence Modeling
  - RNN architecture and vanishing gradient problem
  - LSTM and GRU for long-term dependencies
  - Attention mechanisms and Transformer architecture
  - Resources:
    - Understanding LSTMs - LSTM architecture explained
    - Attention Is All You Need - Original Transformer paper
    - RNN Tutorial - Karpathy's RNN guide
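
The attention mechanism listed above can be illustrated with scaled dot-product attention, the building block used in the Transformer. The following is a minimal single-head sketch on nested Python lists, with no masking, batching, or learned projections; the function names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) weighted sum of values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Identical keys give uniform weights, so the output is the mean of the values.
    print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]]))
```

Each output row is a convex combination of the value vectors, with weights determined by how strongly the query matches each key.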

Model Selection and Validation

What You Need to Know
- Cross-Validation Techniques
  - K-fold cross-validation and stratified sampling
  - Leave-one-out and bootstrap validation
  - Time series cross-validation for temporal data
  - Resources:
    - Cross-Validation - Scikit-learn CV documentation
    - Model Selection - ESL model assessment chapter
    - Time Series CV - Time series validation techniques
- Bias-Variance Tradeoff and Regularization
  - Mathematical analysis of bias-variance decomposition
  - Regularization techniques and their effects
  - Early stopping and dropout as regularization
  - Resources:
    - Bias-Variance Tradeoff - Visual bias-variance explanation
    - Regularization in Deep Learning - Comprehensive regularization survey
    - Dropout Paper - Original dropout technique
- Hyperparameter Optimization
  - Grid search and random search strategies
  - Bayesian optimization and Gaussian processes
  - Automated hyperparameter tuning techniques
  - Resources:
    - Hyperparameter Optimization - Bergstra and Bengio paper
    - Bayesian Optimization - Gaussian process optimization
    - Optuna Tutorial - Automated hyperparameter optimization
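
K-fold cross-validation, listed above, amounts to partitioning the sample indices into k validation folds and training on the remainder each time. The sketch below assumes contiguous folds with no shuffling or stratification; `k_fold_indices` is a hypothetical helper name, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices) pairs, one per fold."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


if __name__ == "__main__":
    for train, validation in k_fold_indices(5, 2):
        print(train, validation)
```

Every sample appears in exactly one validation fold. For temporal data this layout would leak future information into training, which is why time series validation uses forward-chaining splits instead.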

Feature Engineering and Selection

What You Need to Know
- Feature Extraction and Transformation
  - Polynomial features and interaction terms
  - Feature scaling and normalization techniques
  - Handling categorical variables and encoding
  - Resources:
    - Feature Engineering - O'Reilly feature engineering book
    - Preprocessing Data - Scikit-learn preprocessing guide
    - Categorical Encoding - Advanced categorical encoding techniques
- Feature Selection Methods
  - Filter methods (correlation, mutual information)
  - Wrapper methods (recursive feature elimination)
  - Embedded methods (Lasso, tree-based importance)
  - Resources:
    - Feature Selection - Comprehensive feature selection guide
    - Information Theory Feature Selection - Mutual information methods
    - Recursive Feature Elimination - RFE implementation
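
Two of the transformations above, feature scaling and categorical encoding, are simple enough to sketch directly. The version below assumes z-score standardization with the population standard deviation and a one-hot encoding over the sorted set of observed categories; both function names are illustrative.

```python
import math


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # a constant feature carries no spread to rescale
    return [(v - mean) / std for v in values]


def one_hot(categories):
    """Return (vocabulary, rows) where each row has a single 1 in its category's column."""
    vocab = sorted(set(categories))
    index = {c: i for i, c in enumerate(vocab)}
    rows = [[1 if index[c] == j else 0 for j in range(len(vocab))] for c in categories]
    return vocab, rows


if __name__ == "__main__":
    print(standardize([1.0, 2.0, 3.0]))
    print(one_hot(["b", "a", "b"]))
```

In practice the mean, standard deviation, and vocabulary should be computed on the training split only and then reused on validation and test data.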

Evaluation Metrics and Performance Analysis

What You Need to Know
- Classification Metrics
  - Accuracy, precision, recall, and F1-score analysis
  - ROC curves and Area Under Curve (AUC)
  - Multi-class and multi-label evaluation metrics
  - Resources:
    - Classification Metrics - Comprehensive metrics guide
    - ROC and AUC Explained - Google's ROC tutorial
    - Multi-class Metrics - Multi-class evaluation survey
- Regression Metrics
  - Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
  - Mean Absolute Error (MAE) and robust metrics
  - R-squared and adjusted R-squared interpretation
  - Resources:
    - Regression Metrics - Regression evaluation guide
    - Understanding R-squared - R-squared interpretation
    - Robust Regression Metrics - Alternative regression metrics
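
The classification and regression metrics above follow directly from their definitions. The sketch below assumes binary labels for precision, recall, and F1, and a non-constant target for R-squared; the function names and the zero-denominator conventions are illustrative choices.

```python
import math


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1; each is reported as 0.0 when its denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; assumes y_true is not constant (SS_tot > 0)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
    print(rmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
    print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Predicting the mean of the targets for every sample gives an R-squared of zero, which is the baseline the metric is measured against.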

Algorithm Implementation from Scratch

What You Need to Know
- Linear Algebra Implementation
  - Matrix operations without NumPy dependencies
  - Gradient computation and optimization loops
  - Numerical stability and computational efficiency
  - Resources:
    - ML Algorithms from Scratch - Pure Python implementations
    - Numerical Linear Algebra - Computational linear algebra course
    - Matrix Cookbook - Matrix identities and derivatives
- Optimization Algorithm Implementation
  - Gradient descent variants (SGD, Adam, RMSprop)
  - Newton's method and quasi-Newton methods
  - Coordinate descent and proximal methods
  - Resources:
    - Optimization Algorithms - Overview of optimization for deep learning
    - SGD Variants - Gradient descent optimization overview
    - Proximal Algorithms - Proximal optimization methods
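
As a starting point for the optimization loops above, the sketch below implements gradient descent on a scalar parameter with an optional heavy-ball momentum term. Momentum stands in here as one simple variant; Adam and RMSprop add per-parameter step scaling that is not shown. The function name, default hyperparameters, and test objective are assumptions for illustration.

```python
def gradient_descent(grad, x0, lr=0.1, momentum=0.0, n_steps=300):
    """Minimize a scalar function given its gradient; momentum=0.0 gives plain gradient descent."""
    x = x0
    velocity = 0.0
    for _ in range(n_steps):
        # The velocity accumulates past gradients, decayed by the momentum factor.
        velocity = momentum * velocity - lr * grad(x)
        x = x + velocity
    return x


if __name__ == "__main__":
    # Objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimizer is x = 3.
    def grad(x):
        return 2.0 * (x - 3.0)

    print(gradient_descent(grad, x0=0.0))
    print(gradient_descent(grad, x0=0.0, momentum=0.9))
```

Both runs should end close to 3. On this well-conditioned quadratic, momentum offers no advantage; its benefit shows up on ill-conditioned problems where plain gradient descent zigzags.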

Ready to Engineer Data? Continue to Module 2: Data Engineering to master data pipelines, preprocessing, and feature engineering for machine learning systems.
