Statistics and Mathematics
Descriptive Statistics and Data Summarization
- What You Need to Know
-
Measures of Central Tendency
- Mean, median, and mode calculations and interpretations
- Weighted averages and trimmed means
- When to use each measure based on the data distribution (for example, the median for skewed data or data with outliers)
- Resources:
- Descriptive Statistics - Khan Academy - Central tendency and variability
- Statistics with Python - SciPy statistical functions
- Descriptive Statistics Guide - Penn State descriptive statistics
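As a rough illustration of the measures listed above, the sketch below uses only Python's standard `statistics` module. The sample values and the helper names `trimmed_mean` and `weighted_mean` are made up for this example and do not come from the listed resources.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the points from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of points cut from each end
    return statistics.fmean(ordered[k:len(ordered) - k])

def weighted_mean(values, weights):
    """Average in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample with one large outlier (illustrative data only).
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 60]

mean = statistics.fmean(data)               # 10.0, pulled upward by the outlier
median = statistics.median(data)            # 5.0, unaffected by the outlier
mode = statistics.mode(data)                # 5, the most frequent value
trimmed = trimmed_mean(data, 0.1)           # 4.75, drops the values 2 and 60
weighted = weighted_mean([80, 90], [1, 3])  # 87.5

print(mean, median, mode, trimmed, weighted)
```

On skewed data like this sample, the median or a trimmed mean usually describes a typical value better than the mean does.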
-
Measures of Variability and Distribution
- Variance, standard deviation, and range
- Quartiles, percentiles, and interquartile range
- Skewness, kurtosis, and distribution shape
- Resources:
- Variability Measures - Khan Academy variability concepts
- Distribution Analysis - SciPy statistical distributions
- Statistical Summaries - Pandas descriptive statistics
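The spread and shape measures above can be sketched with the standard library alone. The data set and the helper `standardized_moment` are assumptions made for this example. Note that `statistics.quantiles` uses its default "exclusive" method here, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical sample (illustrative data only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)     # population variance (divides by n)
pop_std = statistics.pstdev(data)        # population standard deviation
sample_var = statistics.variance(data)   # sample variance (divides by n - 1)
data_range = max(data) - min(data)

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                 # interquartile range

def standardized_moment(values, order):
    """Average of the z-scores raised to the given power."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** order for v in values])

skewness = standardized_moment(data, 3)             # > 0 means a longer right tail
excess_kurtosis = standardized_moment(data, 4) - 3  # 0 for a normal distribution

print(pop_var, pop_std, data_range, iqr, skewness, excess_kurtosis)
```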
-
Data Visualization for Exploration
- Histograms, box plots, and distribution visualization
- Scatter plots and correlation visualization
- Summary statistics and data profiling
- Resources:
- Data Visualization Basics - Python visualization examples
- Matplotlib Tutorial - Python plotting library
- Seaborn Statistical Plots - Statistical data visualization
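Plotting itself is covered by the libraries listed above. As a dependency-free sketch of what those plots compute, the example below derives the bin counts behind a histogram and the Pearson correlation that a scatter plot visualizes. The data and the function names `histogram_counts` and `pearson_r` are hypothetical.

```python
import math

def histogram_counts(values, bin_width, start=0):
    """Count values per bin; each bin covers [left, left + bin_width)."""
    counts = {}
    for v in values:
        left = start + ((v - start) // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

def pearson_r(xs, ys):
    """Linear correlation between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

# Hypothetical observations (illustrative data only).
data = [1, 2, 2, 3, 7, 8, 8, 9, 12]
counts = histogram_counts(data, bin_width=5)   # {0: 4, 5: 4, 10: 1}

# Text rendering of the histogram, one '#' per observation.
for left, count in counts.items():
    print(f"{left:>2} to {left + 5:<2} {'#' * count}")

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # about 0.77
```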
-
Probability Theory and Distributions
- What You Need to Know
-
Probability Fundamentals
- Sample spaces, events, and probability rules
- Conditional probability and Bayes' theorem
- Independence and mutually exclusive events
- Resources:
- Probability - Khan Academy - Complete probability course
- Think Bayes - Free Bayesian statistics book
- Probability Course - MIT - MIT probability theory
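A small worked example may help tie these rules together. The sketch below enumerates the sample space of two dice to check independence and a conditional probability, then applies Bayes' theorem to a diagnostic-test scenario. The test's accuracy figures are invented for illustration, and the helper names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

p_a = prob(lambda o: o[0] == 6)                         # first die shows 6
p_b = prob(lambda o: o[1] % 2 == 0)                     # second die is even
p_ab = prob(lambda o: o[0] == 6 and o[1] % 2 == 0)
independent = p_ab == p_a * p_b                         # True: P(A and B) = P(A)P(B)

# Conditional probability: P(sum is 7 | first die shows 6).
p_sum7_given_a = prob(lambda o: sum(o) == 7 and o[0] == 6) / p_a  # 1/6

def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from P(H), P(E | H), and P(E | not H)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(independent, p_sum7_given_a, round(posterior, 3))
```

Even with a fairly accurate test, the posterior stays near 16% because the condition is rare, which is the usual base-rate lesson of Bayes' theorem.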
-
Common Probability Distributions
- Normal, binomial, and Poisson distributions
- Exponential, gamma, and beta distributions
- Distribution properties and parameter estimation
- Resources:
- Probability Distributions - Khan Academy distributions
- SciPy Statistical Distributions - Python statistical functions
- Distribution Gallery - Distribution relationships and properties
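To make a few of these distributions concrete, here is a minimal sketch built on `math` and `statistics.NormalDist` from the standard library. The function names and sample values are assumptions for this example. SciPy's distribution objects, listed above, cover the same ground more completely.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, rate):
    """P(exactly k events) when events occur at the given average rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def exponential_cdf(t, rate):
    """P(waiting time <= t) for events arriving at the given rate."""
    return 1 - math.exp(-rate * t)

p_three_heads = binomial_pmf(3, 10, 0.5)        # 120 / 1024
p_no_events = poisson_pmf(0, 2.0)               # e^-2
p_wait_over_1 = 1 - exponential_cdf(1.0, 2.0)   # also e^-2: no event in one unit

standard = NormalDist(mu=0, sigma=1)
within_one_sd = standard.cdf(1) - standard.cdf(-1)  # about 0.683

# Parameter estimation: fit a normal distribution to hypothetical samples.
fitted = NormalDist.from_samples([4.0, 5.0, 6.0])   # mean 5.0, stdev 1.0
print(p_three_heads, p_no_events, within_one_sd, fitted)
```

The matching values of `p_no_events` and `p_wait_over_1` show one of the relationships between distributions: Poisson counts and exponential waiting times describe the same process.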
-
Central Limit Theorem and Sampling
- Sampling distributions and standard error
- Central Limit Theorem applications
- Confidence intervals and margin of error
- Resources:
- Central Limit Theorem - Khan Academy CLT explanation
- Sampling Methods - Penn State sampling techniques
- Bootstrap Methods - The Elements of Statistical Learning (ESL), bootstrap chapter
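The Central Limit Theorem is easiest to believe after simulating it. The sketch below draws repeated samples from a skewed exponential population, compares the spread of the sample means with the theoretical standard error, and builds a 95% confidence interval from one sample. The seed, sample size, and number of replications are arbitrary choices, and the interval uses a normal approximation.

```python
import random
import statistics
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is repeatable

n = 50             # observations per sample
replications = 2000

def sample_mean(size):
    """Mean of one sample from an exponential population (mean 1, sd 1)."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(size)])

means = [sample_mean(n) for _ in range(replications)]

observed_se = statistics.stdev(means)   # spread of the sampling distribution
theoretical_se = 1.0 / n ** 0.5         # sigma / sqrt(n), about 0.141

# 95% confidence interval for the mean from a single sample.
sample = [random.expovariate(1.0) for _ in range(n)]
z = NormalDist().inv_cdf(0.975)                      # about 1.96
margin = z * statistics.stdev(sample) / n ** 0.5     # margin of error
center = statistics.fmean(sample)
ci = (center - margin, center + margin)

print(round(observed_se, 3), round(theoretical_se, 3), ci)
```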
-
Statistical Inference and Hypothesis Testing
- What You Need to Know
-
Hypothesis Testing Framework
- Formulating null and alternative hypotheses
- Type I and Type II errors
- P-values, significance levels, and statistical power
- Resources:
- Hypothesis Testing - Khan Academy hypothesis testing
- Statistical Inference - Johns Hopkins statistical inference
- P-value Interpretation - Penn State hypothesis testing
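The pieces of the framework fit together in a one-sample z-test, sketched below under the simplifying assumption that the population standard deviation is known. The numbers are invented, and the function names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: mean == mu0 against H1: mean != mu0."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

def z_test_power(effect, sigma, n, alpha=0.05):
    """P(reject H0) when the true mean differs from mu0 by `effect`."""
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / n ** 0.5)
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

alpha = 0.05  # significance level = accepted Type I error rate

# Hypothetical study: observed mean 103, H0 mean 100, sigma 15, n = 100.
z, p_value = z_test(sample_mean=103, mu0=100, sigma=15, n=100)
reject_h0 = p_value < alpha

power = z_test_power(effect=3, sigma=15, n=100, alpha=alpha)
type_ii_error_rate = 1 - power  # chance of missing a true effect of this size

print(round(z, 2), round(p_value, 4), reject_h0, round(power, 3))
```

Here the result is significant at the 5% level, yet the power is only about 52%, so a study of this size would miss a true 3-point effect almost half the time.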
-
Common Statistical Tests
- t-tests for mean comparisons
- Chi-square tests for categorical data
- ANOVA for multiple group comparisons
- Resources:
- Statistical Tests Guide - Python statistical testing
- SciPy Statistical Tests - Statistical test implementations
- Choosing Statistical Tests - Test selection guide
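To show what each test measures, the sketch below computes the three test statistics by hand with the standard library. It stops at the statistic: turning it into a p-value requires the t, chi-square, or F distribution, which is where a library such as SciPy comes in (for example `scipy.stats.ttest_ind`, `scipy.stats.chi2_contingency`, and `scipy.stats.f_oneway`). All data and function names here are hypothetical.

```python
from statistics import fmean, variance

def welch_t(a, b):
    """t statistic for comparing two means without assuming equal variances."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (fmean(a) - fmean(b)) / se

def chi_square_statistic(table):
    """Chi-square statistic for independence in a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def anova_f(*groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    grand = fmean([x for g in groups for x in g])
    k, n = len(groups), sum(len(g) for g in groups)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

t_stat = welch_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])          # 4.0
chi2_stat = chi_square_statistic([[10, 20], [20, 10]])      # about 6.67
f_stat = anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])           # 13.0
print(t_stat, chi2_stat, f_stat)
```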
-
Effect Size and Practical Significance
- Cohen's d and effect size interpretation
- Confidence intervals for effect estimation
- Statistical vs practical significance
- Resources:
- Effect Size - Effect size concepts and calculation
- Confidence Intervals - Khan Academy confidence intervals
- Statistical Power - Power analysis and sample size
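Cohen's d expresses a mean difference in units of the pooled standard deviation. The sketch below computes it for two made-up groups and adds a rough 95% interval for the raw difference. The interval relies on a normal approximation, which is optimistic for samples this small, so treat it as an illustration rather than a recommended procedure.

```python
from statistics import NormalDist, fmean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / pooled_var ** 0.5

# Hypothetical measurements (illustrative data only).
treatment = [5.0, 6.0, 7.0, 8.0, 9.0]
control = [3.0, 4.0, 5.0, 6.0, 7.0]

d = cohens_d(treatment, control)   # about 1.26

# Approximate 95% confidence interval for the raw mean difference.
diff = fmean(treatment) - fmean(control)
se = (variance(treatment) / len(treatment) + variance(control) / len(control)) ** 0.5
z = NormalDist().inv_cdf(0.975)
ci = (diff - z * se, diff + z * se)

print(round(d, 2), ci)
```

By Cohen's commonly quoted rules of thumb, values near 0.2, 0.5, and 0.8 count as small, medium, and large effects. Whether an effect matters in practice still depends on the context.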
-
Linear Algebra for Data Science
- What You Need to Know
-
Vectors and Vector Operations
- Vector addition, scalar multiplication, and dot products
- Vector norms and distance metrics
- Vector projections and orthogonality
- Resources:
- Linear Algebra for Data Science - Fast.ai computational linear algebra
- NumPy Linear Algebra - Linear algebra with NumPy
- 3Blue1Brown Vectors - Visual vector explanation
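The vector operations above are short enough to write out by hand, which can clarify what NumPy does internally. The sketch uses plain Python lists, and the helper names `dot`, `norm`, and `project` are chosen for this example.

```python
import math

def dot(u, v):
    """Dot product: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(v, v))

def project(u, v):
    """Orthogonal projection of u onto the direction of v."""
    scale = dot(u, v) / dot(v, v)
    return [scale * x for x in v]

u = [3.0, 4.0]
v = [2.0, 0.0]

total = [a + b for a, b in zip(u, v)]               # addition: [5.0, 4.0]
doubled = [2 * a for a in u]                        # scalar multiple: [6.0, 8.0]
length = norm(u)                                    # 5.0
manhattan = sum(abs(a - b) for a, b in zip(u, v))   # L1 distance: 5.0
euclidean = math.dist(u, v)                         # L2 distance: sqrt(17)
cosine = dot(u, v) / (norm(u) * norm(v))            # cosine similarity: 0.6

shadow = project(u, v)                              # [3.0, 0.0]
residual = [a - b for a, b in zip(u, shadow)]       # [0.0, 4.0]
is_orthogonal = dot(residual, v) == 0               # residual is perpendicular to v
print(length, cosine, shadow, residual, is_orthogonal)
```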
-
Matrix Operations and Properties
- Matrix multiplication and transpose operations
- Matrix inverse and determinant calculations
- Eigenvalues, eigenvectors, and matrix decomposition
- Resources:
- Matrix Operations - Khan Academy matrix operations
- Eigenvalues and Eigenvectors - Visual eigenvalue explanation
- Matrix Decomposition - Matrix decomposition for ML
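For 2x2 matrices every operation above has a closed form, so the sketch below spells them out with nested lists. The helper names are assumptions, and `eigenvalues_sym2` only handles symmetric matrices, where the eigenvalues are guaranteed to be real. For anything larger, a library such as NumPy is the practical choice.

```python
import math

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse2(m):
    """Inverse of a 2x2 matrix; fails when the determinant is zero."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def eigenvalues_sym2(m):
    """Eigenvalues of a symmetric 2x2 matrix via its characteristic polynomial."""
    trace = m[0][0] + m[1][1]
    root = math.sqrt(trace ** 2 - 4 * det2(m))
    return ((trace + root) / 2, (trace - root) / 2)

a = [[2.0, 1.0], [1.0, 2.0]]

determinant = det2(a)                 # 3.0
identity = matmul(a, inverse2(a))     # approximately [[1, 0], [0, 1]]
symmetric = transpose(a) == a         # True
eigenvalues = eigenvalues_sym2(a)     # (3.0, 1.0); their product is the determinant
print(determinant, identity, eigenvalues)
```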
-
Dimensionality Reduction Concepts
- Principal Component Analysis (PCA) mathematical foundation
- Singular Value Decomposition (SVD) applications
- Linear transformations and feature space mapping
- Resources:
- PCA Tutorial - Principal Component Analysis explained
- SVD Applications - MIT SVD tutorial
- Dimensionality Reduction - Scikit-learn decomposition methods
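The mathematical core of PCA is an eigen-decomposition of the covariance matrix. The sketch below walks through it for a tiny 2-D data set, using power iteration to find the leading eigenvector. Power iteration is just one simple way to get that vector and is swapped in here to avoid dependencies; library implementations typically rely on SVD instead. The data and function names are hypothetical.

```python
import math

# Hypothetical 2-D observations (illustrative data only).
points = [(2.0, 1.0), (1.0, 2.0), (-2.0, -1.0), (-1.0, -2.0)]

n = len(points)
mean_x = sum(p[0] for p in points) / n
mean_y = sum(p[1] for p in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

def cov(i, j):
    """Sample covariance between coordinates i and j of the centered data."""
    return sum(p[i] * p[j] for p in centered) / (n - 1)

cov_matrix = [[cov(0, 0), cov(0, 1)], [cov(1, 0), cov(1, 1)]]

def power_iteration(m, steps=50):
    """Leading eigenvalue and unit eigenvector of a 2x2 matrix."""
    v = [1.0, 0.0]
    for _ in range(steps):
        w = [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]
        length = math.hypot(w[0], w[1])
        v = [w[0] / length, w[1] / length]
    mv = [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]
    return v[0] * mv[0] + v[1] * mv[1], v

top_variance, direction = power_iteration(cov_matrix)

# Share of total variance captured by the first principal component.
explained = top_variance / (cov_matrix[0][0] + cov_matrix[1][1])

# Linear map from 2-D points to 1-D scores along the principal direction.
scores = [x * direction[0] + y * direction[1] for x, y in centered]
print(round(top_variance, 3), direction, round(explained, 3))
```

For this data the first component points along the diagonal and keeps about 90% of the variance, so the 1-D scores lose little information.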
-
Experimental Design and A/B Testing
- What You Need to Know
-
Experimental Design Principles
- Randomized controlled trials and experimental controls
- Sample size calculation and power analysis
- Blocking, stratification, and confounding variables
- Resources:
- Experimental Design - Arizona State University experimental design
- A/B Testing Course - Udacity A/B testing fundamentals
- Design of Experiments - Penn State experimental design
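Sample size planning can be sketched with the textbook normal-approximation formula for comparing two means, n per group = 2 * ((z_alpha + z_power) / d)^2, where d is the standardized effect size. The function name and default values below are assumptions for this example, and t-based calculators usually report a slightly larger n.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n_medium = sample_size_per_group(0.5)   # 63 per group
n_small = sample_size_per_group(0.2)    # 393 per group
print(n_medium, n_small)
```

Halving the detectable effect roughly quadruples the required sample, which is why the minimum effect of interest should be fixed before data collection.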
-
A/B Testing Implementation
- Test design and hypothesis formulation
- Randomization and treatment assignment
- Statistical analysis and result interpretation
- Resources:
- A/B Testing Guide - Optimizely A/B testing methodology
- Experimental Analysis - Microsoft experimentation platform
- Statistical Power Calculator - Sample size and power calculation
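The steps above can be sketched end to end. The example assigns users to variants by hashing their ID, which is one common way to keep assignment stable across visits, and then analyzes conversion counts with a pooled two-proportion z-test. The experiment name, user IDs, and conversion numbers are all hypothetical, and real platforms add safeguards that this sketch omits.

```python
import hashlib
from statistics import NormalDist

def assign_variant(user_id, experiment="exp-1"):
    """Deterministic 50/50 assignment based on a hash of experiment and user."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test of H0: both variants share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

variant = assign_variant("user-42")  # same user always gets the same variant

# Hypothetical results: control 200/2000 conversions, treatment 250/2000.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
lift = 250 / 2000 - 200 / 2000       # absolute lift of 2.5 percentage points
print(variant, round(z, 2), round(p_value, 4), lift)
```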
-
Advanced Experimental Techniques
- Multi-armed bandit testing
- Factorial designs and interaction effects
- Quasi-experimental methods and observational studies
- Resources:
- Multi-Armed Bandits - Bandit algorithm implementations
- Causal Inference - Causal inference methods
- Quasi-Experiments - University of Pennsylvania causality course
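As an illustration of bandit testing, here is a minimal epsilon-greedy simulation. It is one of the simplest bandit strategies and is not necessarily what the listed resources implement. The conversion rates, step count, and seed are invented for the example.

```python
import random

def epsilon_greedy(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over arms with Bernoulli rewards."""
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(steps):
        if 0 in pulls or rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best observed success rate
            arm = max(range(len(true_rates)), key=lambda i: wins[i] / pulls[i])
        pulls[arm] += 1
        wins[arm] += 1 if rng.random() < true_rates[arm] else 0
    return pulls, wins

# Hypothetical conversion rates for three variants (unknown to the algorithm).
pulls, wins = epsilon_greedy([0.05, 0.10, 0.30])
best_arm = pulls.index(max(pulls))
print(pulls, wins, best_arm)
```

Unlike a fixed 50/50 split, the bandit shifts traffic toward the better-performing arm while the test is still running, at the cost of less clean statistical inference afterwards.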
-
Bayesian Statistics and Advanced Methods
- What You Need to Know
-
Bayesian Inference Fundamentals
- Prior and posterior distributions
- Bayes' theorem applications in data analysis
- Credible intervals vs confidence intervals
- Resources:
- Bayesian Statistics - Duke University Bayesian methods
- Think Bayes - Bayesian statistics with Python
- PyMC Documentation - Probabilistic programming in Python
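A Beta prior with binomial data is the classic case where the posterior has a closed form, so it makes a compact example of prior-to-posterior updating. The counts below are invented, and the credible interval is estimated by Monte Carlo sampling with `random.betavariate`, so its endpoints are approximate.

```python
import random

def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures

# Uniform prior Beta(1, 1), then 7 successes and 3 failures (hypothetical data).
prior = (1, 1)
posterior = update_beta(*prior, successes=7, failures=3)      # Beta(8, 4)
posterior_mean = posterior[0] / (posterior[0] + posterior[1])  # 8 / 12

# Approximate 95% credible interval from sorted posterior draws.
rng = random.Random(0)
draws = sorted(rng.betavariate(*posterior) for _ in range(20000))
credible = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])
print(posterior, round(posterior_mean, 3), credible)
```

The credible interval is a direct probability statement about the parameter given the data and prior. A confidence interval instead describes how often the interval-building procedure would cover the true value over repeated samples.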
-
Markov Chain Monte Carlo (MCMC)
- MCMC sampling methods and convergence
- Gibbs sampling and Metropolis-Hastings algorithms
- Bayesian model fitting and diagnostics
- Resources:
- MCMC Tutorial - MCMC sampling explanation
- PyMC Tutorial - Bayesian modeling with PyMC
- Bayesian Analysis with Python - Bayesian analysis examples
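A random-walk Metropolis-Hastings sampler fits in a few lines, which makes it a useful mental model before moving to PyMC. The sketch below targets a normal density with mean 3 and standard deviation 1, chosen only because the correct answer is known. The step size, chain length, burn-in, and seed are arbitrary assumptions, and the half-versus-half comparison is a crude stand-in for proper convergence diagnostics.

```python
import math
import random
import statistics

def metropolis_hastings(log_target, start, steps, step_size, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = start
    samples, accepted = [], 0
    for _ in range(steps):
        proposal = x + rng.gauss(0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / steps

def log_target(x):
    """Unnormalized log density of Normal(mean=3, sd=1)."""
    return -0.5 * (x - 3.0) ** 2

samples, acceptance_rate = metropolis_hastings(log_target, start=0.0,
                                               steps=20000, step_size=1.0)
kept = samples[2000:]  # discard burn-in while the chain moves away from start

estimate_mean = statistics.fmean(kept)
estimate_sd = statistics.stdev(kept)

# Crude convergence check: both halves of the chain should agree.
half = len(kept) // 2
half_gap = abs(statistics.fmean(kept[:half]) - statistics.fmean(kept[half:]))
print(round(estimate_mean, 2), round(estimate_sd, 2), round(acceptance_rate, 2))
```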
-
Time Series Analysis
- What You Need to Know
-
Time Series Components and Decomposition
- Trend, seasonality, and cyclical patterns
- Time series decomposition methods
- Stationarity testing and transformation
- Resources:
- Time Series Analysis - Forecasting: Principles and Practice (free online)
- Time Series with Python - Statsmodels time series analysis
- Pandas Time Series - Time series functionality
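A toy series makes decomposition and differencing concrete. The sketch below builds a series from a known linear trend plus a period-3 seasonal pattern, recovers the trend with a centered moving average, and removes the seasonality with a lag-3 difference. It assumes an additive model, and the series and function names are made up for the example.

```python
def centered_moving_average(series, window):
    """Average over a centered window (window must be odd)."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]

def difference(series, lag=1):
    """Change between each value and the value `lag` steps earlier."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Hypothetical series: trend 10, 11, 12, ... plus seasonal offsets +2, 0, -2.
series = [12, 11, 10, 15, 14, 13, 18, 17, 16]

# A window equal to the season length averages the seasonal offsets away.
trend = centered_moving_average(series, window=3)   # 11.0, 12.0, ..., 17.0

# Additive decomposition: what remains after removing the trend.
seasonal = [y - t for y, t in zip(series[1:-1], trend)]

# Seasonal differencing leaves a constant, a sign that the result is stationary.
seasonal_diff = difference(series, lag=3)           # [3, 3, 3, 3, 3, 3]
print(trend, seasonal, seasonal_diff)
```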
-
Forecasting Methods
- ARIMA models and Box-Jenkins methodology
- Exponential smoothing techniques
- Seasonal forecasting and trend analysis
- Resources:
- ARIMA Modeling - ARIMA implementation in Python
- Forecasting Tutorial - Time series forecasting methods
- Prophet Forecasting - Facebook's forecasting tool
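Simple exponential smoothing is the easiest of these methods to write by hand: each new level is a weighted average of the latest observation and the previous level. The sketch below is a minimal version with a fixed smoothing weight and no trend or seasonal terms. The series, the weight of 0.5, and the function name are assumptions for this example.

```python
def simple_exponential_smoothing(series, alpha):
    """Smoothed level after each observation; alpha weights the newest value."""
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

# Hypothetical upward-trending series (illustrative data only).
series = [10, 12, 14, 16]

levels = simple_exponential_smoothing(series, alpha=0.5)  # [10, 11.0, 12.5, 14.25]
forecast = levels[-1]  # flat forecast for every future step
print(levels, forecast)
```

The forecast of 14.25 trails the latest value of 16, which shows why trended or seasonal data calls for extensions such as Holt-Winters smoothing or for ARIMA models.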
-
Ready to Analyze Data? Continue to Module 2: Data Analysis to master data manipulation, cleaning, and exploratory analysis techniques.