ML Fundamentals
Mathematical Foundations for Machine Learning
- What You Need to Know
Linear Algebra in Machine Learning
- Vector spaces and linear transformations in ML contexts
- Matrix operations for data representation and model parameters
- Eigendecomposition and Singular Value Decomposition (SVD)
- Resources:
- Linear Algebra for ML - Fast.ai computational linear algebra course
- Matrix Calculus for Deep Learning - Matrix derivatives and gradients
- Linear Algebra Review - CS229 - Stanford ML linear algebra notes
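As a small illustration of the eigendecomposition topic above, the following sketch estimates the dominant eigenvalue of a matrix with power iteration. It is a minimal pure-Python example under stated assumptions: the function name `power_iteration`, the example matrix, and the fixed step count are all illustrative choices, not taken from any of the listed resources.

```python
import math

def power_iteration(A, steps=100):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix A."""
    n = len(A)
    v = [1.0] * n  # arbitrary non-zero starting vector (an assumption)
    for _ in range(steps):
        # Multiply A by v, then rescale the result to unit length.
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # For a unit vector v, the Rayleigh quotient v^T A v estimates the eigenvalue.
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Av[i] for i in range(n))
    return eigenvalue, v

# Symmetric example matrix with eigenvalues (5 + sqrt(5)) / 2 and (5 - sqrt(5)) / 2.
A = [[2.0, 1.0],
     [1.0, 3.0]]
eigenvalue, eigenvector = power_iteration(A)
```

Power iteration only recovers the largest-magnitude eigenvalue; a full eigendecomposition or SVD would normally come from a numerical library.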
Calculus and Optimization Theory
- Gradient computation and chain rule for backpropagation
- Convex optimization and global vs local minima
- Lagrange multipliers and constrained optimization
- Resources:
- Convex Optimization - Boyd - Free convex optimization textbook
- Optimization for Machine Learning - Modern optimization techniques
- Calculus on Computational Graphs - Backpropagation explained
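To make the chain rule for backpropagation concrete, here is a hedged sketch on a one-weight computational graph: `z = w * x`, `a = sigmoid(z)`, `loss = (a - y)^2`. The graph, the variable names, and the input values are assumptions chosen for illustration. The analytic gradient is compared against a central finite difference.

```python
import math

def forward(w, x, y):
    """Forward pass through the graph: z -> a -> loss."""
    z = w * x
    a = 1.0 / (1.0 + math.exp(-z))
    loss = (a - y) ** 2
    return z, a, loss

def backward(w, x, y):
    """Chain rule: dloss/dw = dloss/da * da/dz * dz/dw."""
    _, a, _ = forward(w, x, y)
    dloss_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)  # derivative of the sigmoid
    dz_dw = x
    return dloss_da * da_dz * dz_dw

w, x, y = 0.5, 2.0, 1.0
analytic = backward(w, x, y)

# Numerical check with a central finite difference.
eps = 1e-6
numeric = (forward(w + eps, x, y)[2] - forward(w - eps, x, y)[2]) / (2 * eps)
```

The two values should agree closely; a gradient check of this kind is a common way to debug hand-written backpropagation.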
Probability Theory and Statistical Learning
- Bayesian inference and maximum likelihood estimation
- Probability distributions and their ML applications
- Information theory and entropy in machine learning
- Resources:
- Probabilistic Machine Learning - Kevin Murphy's comprehensive ML book
- Information Theory - MacKay - Information theory and inference
- Bayesian Methods for Machine Learning - HSE University course
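The topics above can be illustrated with two small calculations: Shannon entropy of a discrete distribution and the maximum likelihood estimate for a Bernoulli parameter. This is a sketch only; the function names and the coin-flip data are made up for the example.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bernoulli_mle(samples):
    """The MLE of the success probability is the sample mean."""
    return sum(samples) / len(samples)

def bernoulli_log_likelihood(p, samples):
    """Log-likelihood of 0/1 samples under success probability p."""
    return sum(math.log(p) if s == 1 else math.log(1 - p) for s in samples)

flips = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical data: 6 heads out of 8
p_hat = bernoulli_mle(flips)
fair_coin_entropy = entropy_bits([0.5, 0.5])
```

Evaluating `bernoulli_log_likelihood` at `p_hat` and at any other value of `p` shows that the sample mean gives the highest likelihood for this data.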
Supervised Learning Algorithms
- What You Need to Know
Linear Models and Regularization
- Linear regression with mathematical derivation
- Ridge, Lasso, and Elastic Net regularization
- Logistic regression and maximum likelihood
- Resources:
- Linear Models - ESL - Elements of Statistical Learning chapters
- Regularization Tutorial - Ridge and Lasso implementation
- Logistic Regression Math - Mathematical foundations
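For intuition about Ridge regularization, consider the simplest case: one feature and no intercept. Minimizing the sum of squared errors plus `lam * w^2` gives the closed form `w = sum(x*y) / (sum(x*x) + lam)`. The sketch below assumes that simplified setting; `ridge_slope` and the toy data are illustrative names and values.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge solution for y ~ w * x with penalty lam * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x

w_ols = ridge_slope(xs, ys, lam=0.0)     # no penalty: ordinary least squares
w_ridge = ridge_slope(xs, ys, lam=14.0)  # penalty shrinks the slope toward zero
```

With `lam=0` the slope is the least-squares solution; increasing `lam` shrinks it, which is the bias introduced in exchange for lower variance.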
Tree-Based Methods and Ensemble Learning
- Decision trees and information gain criteria
- Random Forest and bagging techniques
- Gradient boosting and XGBoost implementation
- Resources:
- Decision Trees - Scikit-learn - Decision tree algorithms and implementation
- Random Forest Paper - Breiman's original random forest paper
- XGBoost Documentation - Gradient boosting framework
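The information gain criterion used to choose decision tree splits can be sketched in a few lines. This is a minimal example with made-up labels; the helper names `entropy` and `information_gain` are assumptions for illustration and do not mirror any library's API.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
perfect = information_gain(parent, ["yes", "yes"], ["no", "no"])  # pure children
useless = information_gain(parent, ["yes", "no"], ["yes", "no"])  # no separation
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves each child as mixed as the parent gains nothing.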
Support Vector Machines and Kernel Methods
- SVM optimization problem and dual formulation
- Kernel trick and non-linear transformations
- Soft margin and regularization in SVMs
- Resources:
- SVM Tutorial - MIT SVM mathematical foundations
- Kernel Methods - Learning with Kernels textbook
- SVM Implementation - Scikit-learn SVM guide
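The kernel trick can be checked numerically on a small case. For 2-D inputs, the degree-2 polynomial kernel `(x . z)^2` equals an ordinary dot product after the explicit feature map `(x1^2, sqrt(2)*x1*x2, x2^2)`. The sketch below assumes that specific kernel and uses illustrative function names.

```python
import math

def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel on 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit feature map whose dot product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, z = (1.0, 2.0), (3.0, 4.0)
k_implicit = poly_kernel(x, z)
k_explicit = sum(a * b for a, b in zip(feature_map(x), feature_map(z)))
```

Both routes give the same value, but the kernel never builds the higher-dimensional vectors, which is what makes non-linear SVMs tractable.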
Unsupervised Learning and Dimensionality Reduction
- What You Need to Know
Clustering Algorithms
- K-means clustering and expectation-maximization
- Hierarchical clustering and linkage criteria
- DBSCAN and density-based clustering
- Resources:
- Clustering Algorithms - Comprehensive clustering guide
- K-means Mathematical Analysis - CMU clustering lecture
- DBSCAN Paper - Original DBSCAN algorithm
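A minimal sketch of K-means (Lloyd's algorithm) on 1-D data shows the two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function name, the fixed iteration count, and the toy data are assumptions for illustration.

```python
def kmeans_1d(points, centroids, steps=10):
    """Run a fixed number of Lloyd iterations on 1-D points."""
    for _ in range(steps):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

The result depends on the initial centroids; practical implementations usually add smarter initialization and a convergence check instead of a fixed step count.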
Principal Component Analysis and Matrix Factorization
- PCA mathematical derivation and implementation
- Singular Value Decomposition applications
- Non-negative Matrix Factorization (NMF)
- Resources:
- PCA Tutorial - Principal Component Analysis explained
- SVD and PCA - MIT SVD tutorial
- Matrix Factorization - Netflix recommendation system
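For 2-D data, PCA can be worked out by hand: center the data, form the 2x2 covariance matrix, and find its leading eigenvector. The sketch below uses the closed-form angle of the principal axis for a symmetric 2x2 matrix, which is a shortcut that only applies in two dimensions. Names and data are illustrative.

```python
import math

def first_principal_direction(points):
    """Return the unit direction of maximum variance and that variance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance entries (divide by n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    d = (math.cos(theta), math.sin(theta))
    # Variance of the data projected onto d (the largest eigenvalue).
    variance = sxx * d[0] ** 2 + 2 * sxy * d[0] * d[1] + syy * d[1] ** 2
    return d, variance

points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # lies on y = x
direction, variance = first_principal_direction(points)
```

Because the example points lie on a line, the first component points along that line and captures all of the variance. In higher dimensions the same idea is usually computed through an SVD of the centered data matrix.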
Manifold Learning and Non-linear Dimensionality Reduction
- t-SNE for visualizing high-dimensional data and cluster structure
- UMAP for dimensionality reduction
- Autoencoders for non-linear feature learning
- Resources:
- t-SNE Paper - Original t-SNE algorithm
- UMAP Documentation - Uniform Manifold Approximation
- Autoencoder Tutorial - Keras autoencoder implementation
Deep Learning Fundamentals
- What You Need to Know
Neural Network Architecture and Training
- Multi-layer perceptrons and universal approximation theorem
- Backpropagation algorithm mathematical derivation
- Gradient descent variants and optimization techniques
- Resources:
- Deep Learning Book - Goodfellow, Bengio, and Courville comprehensive text
- Neural Networks and Deep Learning - Nielsen's online book
- Backpropagation Calculus - 3Blue1Brown backprop explanation
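A classic illustration of why hidden layers matter is XOR, which no single linear unit can represent. The sketch below is a two-hidden-unit ReLU network with hand-picked weights (an assumption for illustration; nothing here is trained) that computes XOR exactly.

```python
def relu(z):
    return max(0.0, z)

def xor_mlp(x1, x2):
    """One hidden layer with two ReLU units, weights chosen by hand."""
    h1 = relu(x1 + x2)        # counts how many inputs are on
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are on
    return h1 - 2.0 * h2      # output layer cancels the (1, 1) case

outputs = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
```

In practice such weights would be found by backpropagation and gradient descent rather than set manually.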
Convolutional Neural Networks
- Convolution operation and feature maps
- CNN architectures (LeNet, AlexNet, VGG, ResNet)
- Pooling layers and spatial hierarchies
- Resources:
- CNN for Visual Recognition - Stanford CS231n CNN guide
- CNN Architectures - Survey of CNN architectures
- ResNet Paper - Deep residual learning
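The convolution and pooling operations listed above can be sketched without any framework. The example assumes stride 1, no padding, a single channel, and a hand-written edge kernel; deep learning libraries implement the same cross-correlation with many channels and learned kernels.

```python
def conv2d_valid(image, kernel):
    """Cross-correlation (the 'convolution' used in CNNs), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1]]  # responds where intensity increases left to right

feature_map = conv2d_valid(image, edge_kernel)
pooled = max_pool_2x2(feature_map)
```

The feature map is non-zero only at the vertical edge, and pooling keeps that response while halving the spatial resolution.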
Recurrent Neural Networks and Sequence Modeling
- RNN architecture and vanishing gradient problem
- LSTM and GRU for long-term dependencies
- Attention mechanisms and Transformer architecture
- Resources:
- Understanding LSTMs - LSTM architecture explained
- Attention Is All You Need - Original Transformer paper
- RNN Tutorial - Karpathy's RNN guide
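The attention mechanism at the core of the Transformer can be sketched for a single query. The code below computes scaled dot-product attention in pure Python; it omits batching, multiple heads, and learned projections, and the vectors are made-up values.

```python
import math

def softmax(scores):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
output, weights = attention([5.0, 0.0], keys, values)  # query aligned with key 0
```

The output is a weighted average of the values, with most of the weight on the value whose key best matches the query.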
Model Selection and Validation
- What You Need to Know
Cross-Validation Techniques
- K-fold cross-validation and stratified sampling
- Leave-one-out and bootstrap validation
- Time series cross-validation for temporal data
- Resources:
- Cross-Validation - Scikit-learn CV documentation
- Model Selection - ESL model assessment chapter
- Time Series CV - Time series validation techniques
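The difference between ordinary K-fold and time series cross-validation is easiest to see in the index splits. The following sketch generates both; the function names and the unshuffled contiguous folds are assumptions made for clarity, and library implementations offer shuffling and stratification on top of this.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield splits where every test block comes strictly after its training data."""
    for i in range(n_splits):
        end_train = n_samples - (n_splits - i) * test_size
        yield list(range(end_train)), list(range(end_train, end_train + test_size))

folds = list(k_fold_indices(7, 3))
ts_splits = list(expanding_window_splits(10, n_splits=3, test_size=2))
```

In K-fold, training data can come from after the test fold, which leaks future information for temporal data; the expanding window avoids that by construction.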
Bias-Variance Tradeoff and Regularization
- Mathematical analysis of bias-variance decomposition
- Regularization techniques and their effects
- Early stopping and dropout as regularization
- Resources:
- Bias-Variance Tradeoff - Visual bias-variance explanation
- Regularization in Deep Learning - Comprehensive regularization survey
- Dropout Paper - Original dropout technique
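Early stopping can be sketched as a patience rule over a sequence of validation losses. The loss values and the function name below are hypothetical, and the sketch assumes a non-empty list; real training loops would also restore the weights from the best epoch.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, last_epoch_run) under a simple patience rule."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, epoch

val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.50]  # hypothetical values
best_epoch, stopped_at = early_stopping_epoch(val_losses, patience=2)
```

Here training stops before the final, lower loss is ever seen, which shows the tradeoff in choosing the patience value.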
Hyperparameter Optimization
- Grid search and random search strategies
- Bayesian optimization and Gaussian processes
- Automated hyperparameter tuning techniques
- Resources:
- Hyperparameter Optimization - Bergstra and Bengio paper on random search for hyperparameter optimization
- Bayesian Optimization - Gaussian process optimization
- Optuna Tutorial - Automated hyperparameter optimization
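Grid search and random search differ only in how candidate settings are generated. The sketch below compares them on a stand-in objective; `objective` is a hypothetical validation loss with a known minimum, and the search ranges are assumptions, so this shows the mechanics rather than a realistic tuning run.

```python
import itertools
import random

def objective(lr, reg):
    # Hypothetical validation loss with its minimum at lr=0.1, reg=0.01.
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

def grid_search(lrs, regs):
    """Evaluate every combination on the grid and keep the best."""
    return min(itertools.product(lrs, regs), key=lambda p: objective(*p))

def random_search(n_trials, seed=0):
    """Sample settings on a log scale and keep the best."""
    rng = random.Random(seed)
    trials = [(10 ** rng.uniform(-3, 0), 10 ** rng.uniform(-4, -1))
              for _ in range(n_trials)]
    return min(trials, key=lambda p: objective(*p))

best_grid = grid_search([0.001, 0.01, 0.1, 1.0], [0.0001, 0.001, 0.01, 0.1])
best_random = random_search(n_trials=16)
```

Both searches use 16 evaluations. The grid only ever tries four distinct values per hyperparameter, while random search tries 16, which is the usual argument for random search when only a few hyperparameters matter.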
Feature Engineering and Selection
- What You Need to Know
Feature Extraction and Transformation
- Polynomial features and interaction terms
- Feature scaling and normalization techniques
- Handling categorical variables and encoding
- Resources:
- Feature Engineering - O'Reilly feature engineering book
- Preprocessing Data - Scikit-learn preprocessing guide
- Categorical Encoding - Advanced categorical encoding techniques
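Two of the transformations above, standardization and one-hot encoding, are short enough to write out directly. This sketch uses the population standard deviation and an alphabetically sorted vocabulary, both of which are assumptions made for the example.

```python
import math

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted vocabulary."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories], vocab

scaled = standardize([1.0, 2.0, 3.0])
encoded, vocab = one_hot(["red", "green", "red"])
```

In a real pipeline the mean, standard deviation, and vocabulary would be computed on the training split only and then reused on validation and test data.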
Feature Selection Methods
- Filter methods (correlation, mutual information)
- Wrapper methods (recursive feature elimination)
- Embedded methods (Lasso, tree-based importance)
- Resources:
- Feature Selection - Comprehensive feature selection guide
- Information Theory Feature Selection - Mutual information methods
- Recursive Feature Elimination - RFE implementation
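A filter method scores each feature independently of any model. The sketch below keeps features whose absolute Pearson correlation with the target passes a threshold; the feature names, data, and threshold are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def filter_features(features, target, threshold):
    """Keep features whose |correlation| with the target meets the threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= threshold]

target = [1.0, 2.0, 3.0, 4.0]
features = {
    "informative": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with target
    "noise": [1.0, -1.0, -1.0, 1.0],      # uncorrelated with target
}
selected = filter_features(features, target, threshold=0.5)
```

Correlation only detects linear relationships; mutual information is the usual filter when the dependence may be non-linear.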
Evaluation Metrics and Performance Analysis
- What You Need to Know
Classification Metrics
- Accuracy, precision, recall, and F1-score analysis
- ROC curves and Area Under Curve (AUC)
- Multi-class and multi-label evaluation metrics
- Resources:
- Classification Metrics - Comprehensive metrics guide
- ROC and AUC Explained - Google's ROC tutorial
- Multi-class Metrics - Multi-class evaluation survey
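Precision, recall, and F1 all follow from the counts of true positives, false positives, and false negatives. The sketch below computes them for binary labels; the label vectors are made up, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary metrics with class 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On this imbalanced example accuracy is higher than precision and recall, which is why accuracy alone can be misleading.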
Regression Metrics
- Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE) and robust metrics
- R-squared and adjusted R-squared interpretation
- Resources:
- Regression Metrics - Regression evaluation guide
- Understanding R-squared - R-squared interpretation
- Robust Regression Metrics - Alternative regression metrics
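The regression metrics above can be computed side by side to see how they react to a single large error. This is a sketch on made-up values; `regression_metrics` is an illustrative helper, not a library function.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

# Three perfect predictions and one large miss.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 8.0])
```

The single outlier dominates MSE and RMSE much more than MAE, and it pushes R-squared below zero, meaning the predictions are worse than always predicting the mean.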
Algorithm Implementation from Scratch
- What You Need to Know
Linear Algebra Implementation
- Matrix operations without NumPy dependencies
- Gradient computation and optimization loops
- Numerical stability and computational efficiency
- Resources:
- ML Algorithms from Scratch - Pure Python implementations
- Numerical Linear Algebra - Computational linear algebra course
- Matrix Cookbook - Matrix identities and derivatives
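As a starting point for the topics above, here is a sketch of matrix multiplication on nested lists and a numerically stable log-sum-exp. Both helpers are illustrative; the list-of-rows representation is an assumption, and no attention is paid to performance.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols_B = list(zip(*B))  # transpose B so columns can be iterated directly
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]

def log_sum_exp(xs):
    """Compute log(sum(exp(x))) without overflow by factoring out the max."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
stable = log_sum_exp([1000.0, 1000.0])  # math.exp(1000.0) alone would overflow
```

Subtracting the maximum before exponentiating is the same trick used in stable softmax and cross-entropy implementations.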
Optimization Algorithm Implementation
- Gradient descent variants (SGD, Adam, RMSprop)
- Newton's method and quasi-Newton methods
- Coordinate descent and proximal methods
- Resources:
- Optimization Algorithms - Overview of optimization for deep learning
- SGD Variants - Gradient descent optimization overview
- Proximal Algorithms - Proximal optimization methods
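The optimizers listed above can be compared on a one-dimensional quadratic. The sketch below implements plain gradient descent, Adam with its commonly used default decay rates, and the soft-thresholding operator used by proximal methods for Lasso. The objective `f(x) = (x - 3)^2`, the learning rates, and the step counts are assumptions for illustration.

```python
import math

def grad(x):
    return 2.0 * (x - 3.0)  # gradient of f(x) = (x - 3)^2

def gradient_descent(x, lr=0.1, steps=200):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0  # running averages of the gradient and squared gradient
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def soft_threshold(z, lam):
    """Proximal operator of lam * |w|: shrink toward zero, clip at zero."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

x_gd = gradient_descent(0.0)
x_adam = adam(0.0)
```

Both optimizers should end close to the minimizer at 3. Soft-thresholding is what lets proximal gradient and coordinate descent produce exact zeros in Lasso solutions.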
Ready to Engineer Data? Continue to Module 2: Data Engineering to master data pipelines, preprocessing, and feature engineering for machine learning systems.