Data Structures and Algorithms
You learned to write programs in Programming Fundamentals. Now you learn to write efficient programs. This domain answers two questions: how do you organize data so that operations on it are fast, and how do you design algorithms whose resource consumption you can predict and prove? Every system you will build in later domains — database indexes in Data Management, routing tables in Networking, container scheduling in Infrastructure at Scale — depends on the data structures and algorithms taught here. Theory is not optional. The difference between an O(n) and an O(n^2) algorithm is the difference between a system that handles a million users and one that collapses at ten thousand.
A. Algorithm Analysis
Theory
Algorithm analysis is the science of predicting how an algorithm's resource consumption grows as the input grows. The resource is usually time (number of operations) or space (amount of memory), but the method applies to any measurable quantity — disk reads, network round trips, cache misses.
Asymptotic Notation
Three notations describe the growth rate of a function:
| Notation | Meaning | Intuition |
|---|---|---|
| O(f(n)) | Upper bound | The algorithm takes at most this long |
| Omega(f(n)) | Lower bound | The algorithm takes at least this long |
| Theta(f(n)) | Tight bound | The algorithm takes exactly this long (up to constants) |
Formally, T(n) is in O(f(n)) if there exist constants c > 0 and n_0 such that for all n >= n_0, T(n) <= c * f(n). The Omega definition mirrors this with >=, and Theta requires both.
Math: Asymptotic Notation
O(f(n)) = { T(n) : there exist c > 0, n_0 > 0 such that T(n) <= c * f(n) for all n >= n_0 }
Omega(f(n)) = { T(n) : there exist c > 0, n_0 > 0 such that T(n) >= c * f(n) for all n >= n_0 }
Theta(f(n)) = O(f(n)) intersect Omega(f(n))
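To make the definition concrete: take T(n) = 3n + 5. With the arbitrarily chosen witnesses c = 4 and n_0 = 5, T(n) <= c * n for every n >= n_0, so T(n) is in O(n). A quick numeric check:
# Check the Big-O definition numerically for T(n) = 3n + 5 with c = 4, n_0 = 5
T = lambda n: 3 * n + 5
c, n0 = 4, 5
print(all(T(n) <= c * n for n in range(n0, 10_000)))  # True: the bound holds past n_0
print(T(4) <= c * 4)                                   # False: it may fail below n_0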
Common growth rates, from fastest to slowest:
| Complexity | Name | Example |
|---|---|---|
| O(1) | Constant | Hash table lookup |
| O(log n) | Logarithmic | Binary search |
| O(n) | Linear | Linear search |
| O(n log n) | Linearithmic | Merge sort |
| O(n^2) | Quadratic | Nested loops (bubble sort) |
| O(2^n) | Exponential | Brute-force subset enumeration |
| O(n!) | Factorial | Brute-force permutations |
Proof by Induction
Many correctness and complexity proofs in algorithms use mathematical induction. The method has two steps:
- Base case: Prove the statement holds for the smallest input (usually n = 0 or n = 1).
- Inductive step: Assume the statement holds for n = k (the inductive hypothesis). Prove it holds for n = k + 1.
If both steps hold, the statement is true for all n >= the base case.
Math: Induction Example — Sum Formula
Claim: 1 + 2 + ... + n = n(n+1)/2
Base case: n = 1. Left side = 1. Right side = 1(2)/2 = 1. Holds.
Inductive step: Assume 1 + 2 + ... + k = k(k+1)/2. Then: 1 + 2 + ... + k + (k+1) = k(k+1)/2 + (k+1) = (k+1)(k+2)/2
This is exactly the formula with n = k+1. The claim holds for all n >= 1.
Recurrence Relations and the Master Theorem
Recursive algorithms have running times described by recurrence relations. The most common form for divide-and-conquer algorithms is:
T(n) = a * T(n/b) + O(n^d)
where a is the number of subproblems, n/b is the size of each subproblem, and O(n^d) is the cost of dividing and combining.
Math: The Master Theorem
Given T(n) = a * T(n/b) + O(n^d), with a >= 1, b > 1, d >= 0:
- If d > log_b(a): T(n) = O(n^d)
- If d = log_b(a): T(n) = O(n^d * log n)
- If d < log_b(a): T(n) = O(n^(log_b(a)))
Merge sort: a = 2, b = 2, d = 1. Since d = log_2(2) = 1, T(n) = O(n log n). Binary search: a = 1, b = 2, d = 0. Since d = log_2(1) = 0, T(n) = O(log n).
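The three cases reduce to comparing d with log_b(a), which is easy to automate. A small helper sketch (master_theorem is a function written for this example, not a library routine):
import math
def master_theorem(a, b, d):
    """Classify T(n) = a*T(n/b) + O(n^d) by comparing d with log_b(a)."""
    critical = math.log(a, b)            # log_b(a)
    if abs(d - critical) < 1e-9:
        return f"O(n^{d} * log n)"
    if d > critical:
        return f"O(n^{d})"
    return f"O(n^{critical:g})"
print(master_theorem(2, 2, 1))  # merge sort:    O(n^1 * log n), i.e., O(n log n)
print(master_theorem(1, 2, 0))  # binary search: O(n^0 * log n), i.e., O(log n)
print(master_theorem(4, 2, 1))  # d < log_2(4) = 2, so O(n^2)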
Amortized Analysis
Some operations are expensive occasionally but cheap on average. Amortized analysis gives the average cost per operation over a worst-case sequence. The classic example is a dynamic array that doubles its capacity when full. A single append that triggers a resize costs O(n), but over n appends, the total cost is O(n), so the amortized cost per append is O(1).
The accounting method assigns a fixed "charge" to each operation. Cheap operations overpay, and the surplus covers the occasional expensive operation. For a dynamic array, charge 3 units per append: 1 for the insert itself, 1 saved for copying this element during the next resize, and 1 saved for copying an element that was already present.
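A short simulation makes the argument concrete by counting element copies directly (the doubling policy below is the textbook model, not Python's exact growth schedule):
# Count element copies made by a doubling dynamic array over n appends
def total_copy_cost(n_appends):
    capacity, size, copies = 1, 0, 0
    for _ in range(n_appends):
        if size == capacity:      # full: allocate double the space and copy everything
            copies += size
            capacity *= 2
        size += 1
    return copies
for n in [10, 100, 1_000, 10_000]:
    print(f"{n:>6} appends -> {total_copy_cost(n):>6} copies ({total_copy_cost(n) / n:.2f} per append)")
# The per-append ratio stays below 2, so appends are O(1) amortized.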
Practice
Profiling replaces guessing with measurement. Python's time and timeit modules let you measure actual execution time.
import time
def linear_search(arr, target):
for item in arr:
if item == target:
return True
return False
def binary_search(arr, target):
lo, hi = 0, len(arr) - 1
while lo <= hi:
mid = (lo + hi) // 2
if arr[mid] == target:
return True
elif arr[mid] < target:
lo = mid + 1
else:
hi = mid - 1
return False
# Profile both searches on a sorted array
sizes = [1_000, 10_000, 100_000, 1_000_000]
for n in sizes:
arr = list(range(n))
target = n - 1 # worst case for linear search
start = time.perf_counter()
linear_search(arr, target)
linear_time = time.perf_counter() - start
start = time.perf_counter()
binary_search(arr, target)
binary_time = time.perf_counter() - start
print(f"n={n:>10,}: linear={linear_time:.6f}s binary={binary_time:.6f}s")
Run this and observe: linear search time grows proportionally with n, binary search time barely changes. That is the difference between O(n) and O(log n) made visible.
# Demonstrate amortized analysis: dynamic array resizing
import sys
data = []
prev_size = sys.getsizeof(data)
print(f"{'Length':>8} {'Size (bytes)':>14} {'Resized':>8}")
for i in range(64):
data.append(i)
curr_size = sys.getsizeof(data)
resized = " <--" if curr_size != prev_size else ""
print(f"{len(data):>8} {curr_size:>14}{resized}")
prev_size = curr_size
Connection
Analysis gives you a language for discussing efficiency before you write a single line of code. When you see a nested loop over n elements, you know it is O(n^2) and can ask whether a hash table would reduce it to O(n). When you see a recursive function, you can write its recurrence relation and apply the Master Theorem. When someone claims an operation is O(1), you know to ask whether that is worst-case or amortized. Profiling confirms the theory — or reveals that constant factors and cache behavior matter in practice. Both tools are essential.
Try It: Write a function that computes the n-th Fibonacci number recursively, then iteratively. Profile both. The recursive version is O(2^n); the iterative version is O(n). At n = 40, the difference is dramatic.
B. Fundamental Data Structures
Theory
A data structure is a way of organizing data that determines which operations are fast and which are slow. Choosing the right data structure is choosing the right tradeoffs.
Arrays
An array stores elements in contiguous memory locations. Each element is accessed by its index in O(1) time because the memory address is computed by: base_address + index * element_size.
| Operation | Time Complexity |
|---|---|
| Access by index | O(1) |
| Search (unsorted) | O(n) |
| Insert at end | O(1) amortized |
| Insert at position i | O(n) |
| Delete at position i | O(n) |
Python's list is a dynamic array. It stores an array of pointers to objects, with an underlying C array that is resized when capacity is exceeded.
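A quick timing comparison shows the table's costs in practice: appending to the end is amortized O(1), while inserting at index 0 shifts every existing element. A minimal sketch using timeit:
# Compare amortized O(1) appends with O(n) inserts at the front
import timeit
n = 50_000
append_time = timeit.timeit("lst.append(0)", setup="lst = []", number=n)
insert_time = timeit.timeit("lst.insert(0, 0)", setup="lst = []", number=n)
print(f"{n:,} appends:       {append_time:.3f}s")
print(f"{n:,} front inserts: {insert_time:.3f}s  (each insert shifts the whole array)")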
Linked Lists
A linked list stores elements in nodes, where each node contains a value and a pointer to the next node. There is no contiguous memory requirement, so insertion and deletion at a known position are O(1) — but finding that position requires O(n) traversal.
| Operation | Singly Linked | Doubly Linked |
|---|---|---|
| Access by index | O(n) | O(n) |
| Insert at head | O(1) | O(1) |
| Insert at tail | O(n) or O(1) with tail ptr | O(1) |
| Delete at head | O(1) | O(1) |
| Search | O(n) | O(n) |
class Node:
def __init__(self, val, next_node=None):
self.val = val
self.next = next_node
class LinkedList:
def __init__(self):
self.head = None
def prepend(self, val):
self.head = Node(val, self.head)
def append(self, val):
if not self.head:
self.head = Node(val)
return
curr = self.head
while curr.next:
curr = curr.next
curr.next = Node(val)
def delete(self, val):
if not self.head:
return
if self.head.val == val:
self.head = self.head.next
return
curr = self.head
while curr.next:
if curr.next.val == val:
curr.next = curr.next.next
return
curr = curr.next
def __iter__(self):
curr = self.head
while curr:
yield curr.val
curr = curr.next
def __repr__(self):
return " -> ".join(str(v) for v in self) + " -> None"
Stacks and Queues
A stack is last-in, first-out (LIFO). Push and pop operate on the same end. Function call stacks, undo history, and depth-first search all use stacks.
A queue is first-in, first-out (FIFO). Enqueue adds to the back; dequeue removes from the front. Task scheduling, breadth-first search, and message buffers all use queues.
# Stack using a Python list
class Stack:
def __init__(self):
self._data = []
def push(self, val):
self._data.append(val)
def pop(self):
if not self._data:
raise IndexError("pop from empty stack")
return self._data.pop()
def peek(self):
if not self._data:
raise IndexError("peek at empty stack")
return self._data[-1]
def is_empty(self):
return len(self._data) == 0
def __len__(self):
return len(self._data)
# Queue using collections.deque for O(1) operations at both ends
from collections import deque
class Queue:
def __init__(self):
self._data = deque()
def enqueue(self, val):
self._data.append(val)
def dequeue(self):
if not self._data:
raise IndexError("dequeue from empty queue")
return self._data.popleft()
def peek(self):
if not self._data:
raise IndexError("peek at empty queue")
return self._data[0]
def is_empty(self):
return len(self._data) == 0
def __len__(self):
return len(self._data)
Hash Tables
A hash table maps keys to values using a hash function that converts keys into array indices. The average-case complexity for insert, lookup, and delete is O(1). The worst case is O(n) when all keys hash to the same index (a collision chain).
How hashing works: A hash function takes a key and produces an integer. That integer is reduced to an array index using modulo: index = hash(key) % table_size. A good hash function distributes keys uniformly across the table.
Collision resolution uses one of two strategies:
- Chaining: Each table slot holds a linked list of entries that hashed to that index.
- Open addressing: On collision, probe other slots (linear probing, quadratic probing, or double hashing) until an empty one is found.
Math: The Birthday Problem and Hash Collisions
The birthday problem asks: in a group of n people, what is the probability that at least two share a birthday? With 365 possible birthdays, the probability exceeds 50% at just n = 23. This is directly relevant to hashing.
P(no collision among n items in m slots) = product from i=0 to n-1 of (m - i) / m
For n = 23 and m = 365: P(no collision) = (365/365) * (364/365) * ... * (343/365) ≈ 0.493
So P(at least one collision) ≈ 0.507.
This means that with a hash table of m slots, collisions start appearing much sooner than you might expect. The load factor alpha = n/m controls how dense the table is. Most implementations resize when alpha exceeds 0.7 to keep collision chains short.
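A short calculation reproduces the 23-person figure and applies the same product to a hash table (the 1,024-slot table size is just an illustrative choice):
# Exact probability of at least one collision when n keys hash uniformly into m slots
def collision_probability(n, m):
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= (m - i) / m
    return 1 - p_no_collision
print(f"23 people, 365 days: {collision_probability(23, 365):.3f}")   # ~0.507
print(f"40 keys, 1024 slots: {collision_probability(40, 1024):.3f}")  # collisions likely long before the table fills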
class HashTable:
def __init__(self, capacity=8):
self._capacity = capacity
self._size = 0
self._buckets = [[] for _ in range(capacity)]
self._load_factor_threshold = 0.7
def _hash(self, key):
return hash(key) % self._capacity
def _resize(self):
old_buckets = self._buckets
self._capacity *= 2
self._buckets = [[] for _ in range(self._capacity)]
self._size = 0
for bucket in old_buckets:
for key, val in bucket:
self.put(key, val)
def put(self, key, val):
if self._size / self._capacity >= self._load_factor_threshold:
self._resize()
idx = self._hash(key)
for i, (k, v) in enumerate(self._buckets[idx]):
if k == key:
self._buckets[idx][i] = (key, val)
return
self._buckets[idx].append((key, val))
self._size += 1
def get(self, key):
idx = self._hash(key)
for k, v in self._buckets[idx]:
if k == key:
return v
raise KeyError(key)
def delete(self, key):
idx = self._hash(key)
for i, (k, v) in enumerate(self._buckets[idx]):
if k == key:
self._buckets[idx].pop(i)
self._size -= 1
return
raise KeyError(key)
def __len__(self):
return self._size
Practice
Build each data structure from scratch and verify its behavior.
# Verify linked list operations
ll = LinkedList()
for v in [10, 20, 30, 40]:
ll.append(v)
print(ll) # 10 -> 20 -> 30 -> 40 -> None
ll.delete(20)
print(ll) # 10 -> 30 -> 40 -> None
ll.prepend(5)
print(ll) # 5 -> 10 -> 30 -> 40 -> None
# Verify stack behavior
s = Stack()
for v in [1, 2, 3]:
s.push(v)
print(s.pop()) # 3 (LIFO)
print(s.pop()) # 2
print(s.peek()) # 1
# Verify queue behavior
q = Queue()
for v in [1, 2, 3]:
q.enqueue(v)
print(q.dequeue()) # 1 (FIFO)
print(q.dequeue()) # 2
print(q.peek()) # 3
# Verify hash table
ht = HashTable()
ht.put("name", "Alice")
ht.put("age", 30)
print(ht.get("name")) # Alice
ht.put("name", "Bob") # update
print(ht.get("name")) # Bob
ht.delete("age")
print(len(ht)) # 1
Connection
Every higher-level system you will encounter is built on these structures. Python's dict is a hash table. The Linux kernel's task scheduler uses linked lists and trees. Network packet buffers are queues. The function call stack in every programming language is a stack. When you understand these primitives, you can reason about the performance of any system built on top of them. The hash table in particular is the most important data structure in practical computing — it turns O(n) searches into O(1) lookups and appears everywhere from DNS caches to database indexes to Python's own object model.
Try It: Implement a hash table with open addressing (linear probing) instead of chaining. Compare the performance of both implementations by inserting 10,000 random key-value pairs and measuring lookup time. Observe how performance degrades as the load factor approaches 1.0.
C. Trees
Theory
A tree is a hierarchical data structure consisting of nodes connected by edges. Each node has zero or more children, and exactly one node (the root) has no parent. Trees are everywhere: file systems, HTML documents, database indexes, compilers (abstract syntax trees), and decision processes.
Binary Trees
A binary tree restricts each node to at most two children: left and right. Traversal orders define how you visit every node:
- In-order (left, root, right): For BSTs, this produces sorted output.
- Pre-order (root, left, right): Useful for serialization and copying trees.
- Post-order (left, right, root): Useful for deletion and expression evaluation.
- Level-order (breadth-first): Visits nodes level by level using a queue.
class TreeNode:
def __init__(self, val, left=None, right=None):
self.val = val
self.left = left
self.right = right
def inorder(node):
if node:
yield from inorder(node.left)
yield node.val
yield from inorder(node.right)
def preorder(node):
if node:
yield node.val
yield from preorder(node.left)
yield from preorder(node.right)
def postorder(node):
if node:
yield from postorder(node.left)
yield from postorder(node.right)
yield node.val
def levelorder(root):
if not root:
return
from collections import deque
q = deque([root])
while q:
node = q.popleft()
yield node.val
if node.left:
q.append(node.left)
if node.right:
q.append(node.right)
Binary Search Trees (BSTs)
A BST enforces a property: for every node, all values in its left subtree are less, and all values in its right subtree are greater. This property enables O(log n) search, insert, and delete — if the tree is balanced.
The problem: if you insert sorted data into a BST, it degenerates into a linked list, and all operations become O(n).
class BST:
def __init__(self):
self.root = None
def insert(self, val):
self.root = self._insert(self.root, val)
def _insert(self, node, val):
if not node:
return TreeNode(val)
if val < node.val:
node.left = self._insert(node.left, val)
elif val > node.val:
node.right = self._insert(node.right, val)
return node
def search(self, val):
return self._search(self.root, val)
def _search(self, node, val):
if not node:
return False
if val == node.val:
return True
elif val < node.val:
return self._search(node.left, val)
else:
return self._search(node.right, val)
def inorder(self):
return list(inorder(self.root))
Self-Balancing Trees
Self-balancing trees automatically restructure themselves after insertions and deletions to maintain O(log n) height.
AVL Trees maintain a balance factor (height of left subtree minus height of right subtree) between -1 and +1 for every node. When an operation violates this, the tree performs rotations to restore balance.
There are four rotation cases:
| Case | Imbalance | Fix |
|---|---|---|
| Left-Left (LL) | Left child is left-heavy | Single right rotation |
| Right-Right (RR) | Right child is right-heavy | Single left rotation |
| Left-Right (LR) | Left child is right-heavy | Left rotate child, then right rotate node |
| Right-Left (RL) | Right child is left-heavy | Right rotate child, then left rotate node |
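The rotations themselves are small pointer manipulations. A minimal sketch of the two single rotations, reusing the TreeNode class from the traversal examples (height bookkeeping and the decision of when to rotate are omitted):
def rotate_right(y):
    """Fix the LL case: lift y's left child x into y's position."""
    x = y.left
    y.left = x.right      # x's right subtree becomes y's left subtree
    x.right = y
    return x              # the caller re-attaches x where y used to be
def rotate_left(x):
    """Fix the RR case: the mirror image of rotate_right."""
    y = x.right
    x.right = y.left
    y.left = x
    return y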
Red-Black Trees use a coloring invariant instead of height tracking. Every node is either red or black, with five rules that guarantee the tree's height never exceeds 2 * log_2(n+1). Red-black trees are used in Linux's Completely Fair Scheduler, Java's TreeMap, and C++'s std::map.
Red-black rules:
- Every node is red or black.
- The root is black.
- Every null leaf is black.
- A red node cannot have a red child.
- Every path from a node to its descendant null leaves contains the same number of black nodes.
Comparison:
| Property | AVL Tree | Red-Black Tree |
|---|---|---|
| Balancing | Stricter (height diff <= 1) | Looser (no double-red, black-height) |
| Search | Slightly faster (shorter height) | Slightly slower |
| Insert/Delete | More rotations | Fewer rotations |
| Use case | Read-heavy workloads | Write-heavy workloads |
B-Trees and B+ Trees
B-trees are designed for systems that read and write large blocks of data — particularly disk-based storage. A B-tree of order m allows each node to have up to m children and m-1 keys. All leaves are at the same depth, and nodes are kept at least half full.
B+ trees are a variant where all values are stored in leaf nodes, and leaf nodes are linked together for efficient range queries. This is the index structure used by virtually every relational database (PostgreSQL, MySQL, SQLite) and file system (NTFS, ext4, HFS+).
| Property | B-Tree | B+ Tree |
|---|---|---|
| Values stored in | All nodes | Leaf nodes only |
| Leaf linking | No | Yes (linked list) |
| Range queries | Slower | Fast (follow leaf links) |
| Use case | General | Database indexes, file systems |
Why B-trees matter for databases: A B+ tree with branching factor 100 and height 4 can index 100 million rows. Finding any row requires at most 4 disk reads. This is why database indexes make queries fast — they replace a linear scan with a tree traversal that touches a tiny fraction of the data.
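A back-of-the-envelope check of that claim, using the branching factor of 100 from the paragraph above:
# Rows reachable by a B+ tree with a given fan-out and height
fanout = 100
for height in range(1, 5):
    print(f"height {height}: up to {fanout ** height:>11,} rows, at most {height} node reads per lookup")
# height 4 reaches 100,000,000 rows in at most 4 reads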
Heaps
A heap is a complete binary tree that satisfies the heap property. In a min-heap, every parent is less than or equal to its children. In a max-heap, every parent is greater than or equal to its children. Heaps are used for priority queues and heap sort.
The key insight: a heap can be stored in a flat array. For a node at index i, its left child is at 2i + 1, its right child is at 2i + 2, and its parent is at (i - 1) // 2. No pointers needed.
class MinHeap:
def __init__(self):
self._data = []
def _sift_up(self, i):
while i > 0:
parent = (i - 1) // 2
if self._data[i] < self._data[parent]:
self._data[i], self._data[parent] = self._data[parent], self._data[i]
i = parent
else:
break
def _sift_down(self, i):
n = len(self._data)
while True:
smallest = i
left = 2 * i + 1
right = 2 * i + 2
if left < n and self._data[left] < self._data[smallest]:
smallest = left
if right < n and self._data[right] < self._data[smallest]:
smallest = right
if smallest != i:
self._data[i], self._data[smallest] = self._data[smallest], self._data[i]
i = smallest
else:
break
def push(self, val):
self._data.append(val)
self._sift_up(len(self._data) - 1)
def pop(self):
if not self._data:
raise IndexError("pop from empty heap")
val = self._data[0]
last = self._data.pop()
if self._data:
self._data[0] = last
self._sift_down(0)
return val
def peek(self):
if not self._data:
raise IndexError("peek at empty heap")
return self._data[0]
def __len__(self):
return len(self._data)
| Heap Operation | Time Complexity |
|---|---|
| Insert (push) | O(log n) |
| Extract min/max (pop) | O(log n) |
| Peek | O(1) |
| Build heap from array | O(n) |
The O(n) build-heap time is not obvious. Each sift-down costs O(h) where h is the height of the subtree. Since most nodes are near the bottom (where h is small), the total work sums to O(n), not O(n log n).
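A quick empirical check, counting the swaps performed by a bottom-up min-heap build on reverse-sorted input (a worst case for sifting):
# Count sift-down swaps in a bottom-up heap build; the swaps-per-element ratio stays bounded
def build_heap_swaps(n):
    data = list(range(n, 0, -1))          # reverse-sorted input
    swaps = 0
    for i in range(n // 2 - 1, -1, -1):   # sift down every internal node, bottom up
        j = i
        while True:
            child = 2 * j + 1
            if child >= n:
                break
            if child + 1 < n and data[child + 1] < data[child]:
                child += 1
            if data[child] < data[j]:
                data[j], data[child] = data[child], data[j]
                j = child
                swaps += 1
            else:
                break
    return swaps
for n in [1_000, 10_000, 100_000]:
    swaps = build_heap_swaps(n)
    print(f"n={n:>7,}: {swaps:>7,} swaps ({swaps / n:.2f} per element)")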
Practice
# Build a BST and verify the sorted output
bst = BST()
import random
values = random.sample(range(100), 15)
for v in values:
bst.insert(v)
print("Inserted:", values)
print("In-order:", bst.inorder()) # sorted
# Use Python's heapq module (min-heap) for a priority queue
import heapq
tasks = [(3, "low priority"), (1, "urgent"), (2, "medium")]
heapq.heapify(tasks)
while tasks:
priority, name = heapq.heappop(tasks)
print(f"Processing: {name} (priority {priority})")
# Build a min-heap from scratch and extract elements in sorted order
heap = MinHeap()
for v in [5, 3, 8, 1, 9, 2, 7]:
heap.push(v)
sorted_output = []
while len(heap) > 0:
sorted_output.append(heap.pop())
print("Heap sort:", sorted_output) # [1, 2, 3, 5, 7, 8, 9]
Connection
Trees are the bridge between linear data structures and the complex hierarchical systems you will encounter in every later domain. File systems are trees. DNS is a tree. HTML is a tree. When you study database indexing in Data Management, you will understand why B+ trees make range queries fast — because you built a heap and a BST here. When you study the Linux kernel's process scheduler, you will know why it uses a red-black tree — because you learned the tradeoff between AVL and red-black balancing here. The heap is the foundation of every priority queue, which appears in network packet scheduling, job scheduling, and the shortest-path algorithms you will learn in the next section.
Try It: Insert the values 1 through 10 in order into a plain BST and print its height. Then insert the same values in random order and print the height again. This demonstrates why self-balancing trees exist: sequential inserts create a degenerate tree with height n, while random inserts produce a tree with height approximately log n.
D. Graphs
Theory
A graph is a set of vertices (nodes) connected by edges. Graphs model networks, dependencies, relationships, and state transitions. Unlike trees, graphs can have cycles and multiple paths between vertices.
Graph Fundamentals
| Term | Definition |
|---|---|
| Directed graph (digraph) | Edges have direction: (u, v) is not the same as (v, u) |
| Undirected graph | Edges have no direction: {u, v} = {v, u} |
| Weighted graph | Each edge has a numeric weight (cost, distance, capacity) |
| Adjacent | Two vertices connected by an edge |
| Degree | Number of edges incident to a vertex |
| Path | A sequence of vertices where each consecutive pair is connected by an edge |
| Cycle | A path that starts and ends at the same vertex |
| Connected | Every vertex is reachable from every other vertex (undirected) |
| DAG | Directed Acyclic Graph — a directed graph with no cycles |
Representation: Two standard ways to store a graph:
| Representation | Space | Edge lookup | Iterate neighbors | Best for |
|---|---|---|---|---|
| Adjacency matrix | O(V^2) | O(1) | O(V) | Dense graphs |
| Adjacency list | O(V + E) | O(degree) | O(degree) | Sparse graphs |
Most real-world graphs are sparse (E is much less than V^2), so adjacency lists are the default choice.
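For contrast with the adjacency-list class below, here is a minimal adjacency-matrix sketch (vertices assumed to be labeled 0 through n-1):
# Adjacency matrix: O(1) edge lookup, O(V) neighbor scan, O(V^2) space
n = 4
matrix = [[0] * n for _ in range(n)]
def add_matrix_edge(u, v, weight=1, directed=False):
    matrix[u][v] = weight
    if not directed:
        matrix[v][u] = weight
add_matrix_edge(0, 1)
add_matrix_edge(1, 2, weight=5)
print(matrix[1][2])                                 # 5  -- O(1) edge lookup
print([v for v in range(n) if matrix[0][v] != 0])   # [1] -- O(V) to list neighbors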
from collections import defaultdict, deque
class Graph:
def __init__(self, directed=False):
self.adj = defaultdict(list)
self.directed = directed
def add_edge(self, u, v, weight=1):
self.adj[u].append((v, weight))
if not self.directed:
self.adj[v].append((u, weight))
def vertices(self):
return set(self.adj.keys())
Breadth-First Search (BFS)
BFS explores all vertices at distance d before any vertex at distance d+1. It uses a queue and produces shortest paths in unweighted graphs.
def bfs(graph, start):
visited = {start}
queue = deque([start])
order = []
while queue:
vertex = queue.popleft()
order.append(vertex)
for neighbor, _ in graph.adj[vertex]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
return order
def bfs_shortest_path(graph, start, end):
visited = {start}
queue = deque([(start, [start])])
while queue:
vertex, path = queue.popleft()
if vertex == end:
return path
for neighbor, _ in graph.adj[vertex]:
if neighbor not in visited:
visited.add(neighbor)
queue.append((neighbor, path + [neighbor]))
return None # no path exists
Time complexity: O(V + E). Every vertex and edge is visited exactly once.
Depth-First Search (DFS)
DFS explores as far as possible along a branch before backtracking. It uses a stack (or recursion, which uses the call stack) and is the basis for topological sorting, cycle detection, and finding connected components.
def dfs(graph, start):
visited = set()
order = []
def visit(v):
visited.add(v)
order.append(v)
for neighbor, _ in graph.adj[v]:
if neighbor not in visited:
visit(neighbor)
visit(start)
return order
def dfs_iterative(graph, start):
visited = set()
stack = [start]
order = []
while stack:
vertex = stack.pop()
if vertex in visited:
continue
visited.add(vertex)
order.append(vertex)
for neighbor, _ in graph.adj[vertex]:
if neighbor not in visited:
stack.append(neighbor)
return order
Time complexity: O(V + E), same as BFS.
Topological Sort
A topological ordering of a DAG is a linear ordering of vertices such that for every directed edge (u, v), u comes before v. This is essential for dependency resolution — build systems, package managers, course prerequisites, and task scheduling all use topological sort.
def topological_sort(graph):
visited = set()
stack = []
def visit(v):
visited.add(v)
for neighbor, _ in graph.adj[v]:
if neighbor not in visited:
visit(neighbor)
stack.append(v) # add after all descendants
    for v in list(graph.adj):  # iterate over a snapshot: visiting a sink vertex adds new keys to the defaultdict
        if v not in visited:
            visit(v)
return stack[::-1] # reverse post-order
Kahn's algorithm (BFS-based) provides an alternative that also detects cycles:
def topological_sort_kahn(graph):
    # Collect every vertex, including sinks that never appear as adjacency-list keys
    vertices = set(graph.adj)
    in_degree = defaultdict(int)
    for u in list(graph.adj):
        for v, _ in graph.adj[u]:
            vertices.add(v)
            in_degree[v] += 1
    queue = deque([v for v in vertices if in_degree[v] == 0])
    order = []
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbor, _ in graph.adj[vertex]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    if len(order) != len(vertices):
        raise ValueError("Graph has a cycle — no topological ordering exists")
    return order
Shortest Path Algorithms
Dijkstra's Algorithm finds the shortest path from a source to all other vertices in a graph with non-negative weights. It uses a priority queue (min-heap) to always process the closest unvisited vertex.
import heapq
def dijkstra(graph, start):
dist = {start: 0}
prev = {}
pq = [(0, start)]
while pq:
d, u = heapq.heappop(pq)
if d > dist.get(u, float('inf')):
continue
for v, weight in graph.adj[u]:
new_dist = d + weight
if new_dist < dist.get(v, float('inf')):
dist[v] = new_dist
prev[v] = u
heapq.heappush(pq, (new_dist, v))
return dist, prev
def reconstruct_path(prev, start, end):
path = []
current = end
while current != start:
if current not in prev:
return None # no path
path.append(current)
current = prev[current]
path.append(start)
return path[::-1]
Time complexity: O((V + E) log V) with a binary heap.
Bellman-Ford Algorithm handles negative edge weights (which Dijkstra's cannot). It relaxes all edges V-1 times and can detect negative cycles.
def bellman_ford(graph, start):
    # Collect every vertex, including sinks that only appear as edge targets
    edges = [(u, v, w) for u in list(graph.adj) for v, w in graph.adj[u]]
    vertices = set(graph.adj) | {v for _, v, _ in edges}
    dist = {v: float('inf') for v in vertices}
    dist[start] = 0
    prev = {}
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                prev[v] = u
    # Check for negative cycles
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("Graph contains a negative cycle")
    return dist, prev
Time complexity: O(V * E).
Floyd-Warshall Algorithm finds shortest paths between all pairs of vertices. It uses dynamic programming with a 2D distance matrix.
def floyd_warshall(vertices, edges):
dist = {v: {u: float('inf') for u in vertices} for v in vertices}
next_hop = {v: {u: None for u in vertices} for v in vertices}
for v in vertices:
dist[v][v] = 0
for u, v, w in edges:
dist[u][v] = w
next_hop[u][v] = v
for k in vertices:
for i in vertices:
for j in vertices:
if dist[i][k] + dist[k][j] < dist[i][j]:
dist[i][j] = dist[i][k] + dist[k][j]
next_hop[i][j] = next_hop[i][k]
return dist, next_hop
Time complexity: O(V^3). Use it for dense graphs or when you need all-pairs shortest paths.
| Algorithm | Negative Weights | All Pairs | Time Complexity |
|---|---|---|---|
| Dijkstra's | No | No | O((V+E) log V) |
| Bellman-Ford | Yes | No | O(V * E) |
| Floyd-Warshall | Yes | Yes | O(V^3) |
Minimum Spanning Trees
A minimum spanning tree (MST) of a connected, weighted, undirected graph is a tree that includes all vertices with the minimum total edge weight. MSTs are used in network design, clustering, and approximation algorithms.
Prim's Algorithm grows the MST from a starting vertex, always adding the cheapest edge that connects a new vertex:
def prim(graph, start):
mst_edges = []
visited = {start}
edges = [(w, start, v) for v, w in graph.adj[start]]
heapq.heapify(edges)
total_weight = 0
while edges and len(visited) < len(graph.adj):
w, u, v = heapq.heappop(edges)
if v in visited:
continue
visited.add(v)
mst_edges.append((u, v, w))
total_weight += w
for neighbor, weight in graph.adj[v]:
if neighbor not in visited:
heapq.heappush(edges, (weight, v, neighbor))
return mst_edges, total_weight
Kruskal's Algorithm sorts all edges by weight and adds them one by one, skipping edges that would create a cycle. Cycle detection uses a Union-Find (disjoint set) data structure:
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
rx, ry = self.find(x), self.find(y)
if rx == ry:
return False
if self.rank[rx] < self.rank[ry]:
rx, ry = ry, rx
self.parent[ry] = rx
if self.rank[rx] == self.rank[ry]:
self.rank[rx] += 1
return True
def kruskal(vertices, edges):
"""vertices: list of ints 0..n-1, edges: list of (u, v, weight)"""
uf = UnionFind(len(vertices))
edges_sorted = sorted(edges, key=lambda e: e[2])
mst = []
total = 0
for u, v, w in edges_sorted:
if uf.union(u, v):
mst.append((u, v, w))
total += w
return mst, total
Graph Coloring
Graph coloring assigns colors to vertices such that no two adjacent vertices share the same color. The minimum number of colors needed is the chromatic number. Graph coloring is used in register allocation (compilers), scheduling (exams, frequencies), and map coloring.
Finding the chromatic number is NP-hard in general, but a greedy algorithm provides a reasonable heuristic:
def greedy_coloring(graph):
color = {}
for vertex in graph.adj:
neighbor_colors = {color[n] for n, _ in graph.adj[vertex] if n in color}
c = 0
while c in neighbor_colors:
c += 1
color[vertex] = c
return color
Practice
# Build a weighted graph and run shortest path algorithms
g = Graph(directed=False)
edges = [
("A", "B", 4), ("A", "C", 2), ("B", "C", 1),
("B", "D", 5), ("C", "D", 8), ("C", "E", 10),
("D", "E", 2), ("D", "F", 6), ("E", "F", 3)
]
for u, v, w in edges:
g.add_edge(u, v, w)
# BFS (unweighted traversal)
print("BFS from A:", bfs(g, "A"))
# Dijkstra's shortest path
dist, prev = dijkstra(g, "A")
print("Shortest distances from A:", dict(dist))
print("Shortest path A->F:", reconstruct_path(prev, "A", "F"))
# Topological sort on a DAG
dag = Graph(directed=True)
dag.add_edge("foundations", "linux")
dag.add_edge("linux", "networking")
dag.add_edge("linux", "scripting")
dag.add_edge("scripting", "python")
dag.add_edge("python", "algorithms")
dag.add_edge("networking", "security")
print("Topological order:", topological_sort(dag))
Connection
Graphs are the most general data structure. Every network is a graph — the internet, social networks, road maps, circuit boards, dependency trees, and state machines. When you study networking in Networking, routing protocols use Dijkstra's algorithm (OSPF) and Bellman-Ford (RIP). When you study container orchestration in Infrastructure at Scale, Kubernetes scheduling is a graph problem. When you study CI/CD in Software Engineering and Collaboration, build pipelines are DAGs that require topological sorting. Understanding graph algorithms gives you the vocabulary and tools to reason about any system of interconnected components.
Try It: Model a small network of 6 cities with distances between them. Use Dijkstra's algorithm to find the shortest route between the two farthest cities. Then use Prim's or Kruskal's algorithm to find the minimum spanning tree — the cheapest way to connect all cities. Compare the total MST weight to the shortest path. They solve different problems: MST minimizes total infrastructure cost, shortest path minimizes travel between two points.
E. Sorting and Searching
Theory
Sorting is the most studied problem in computer science. It appears everywhere — displaying results, preparing data for binary search, detecting duplicates, and enabling efficient merging. Understanding sorting algorithms teaches you about algorithmic design, tradeoffs, and lower bounds.
Comparison-Based Sorting
Comparison sorts determine the order of elements by comparing pairs. Every comparison sort must make Omega(n log n) comparisons in the worst case.
Bubble Sort repeatedly swaps adjacent elements that are out of order. Simple but slow.
def bubble_sort(arr):
n = len(arr)
for i in range(n):
swapped = False
for j in range(n - 1 - i):
if arr[j] > arr[j + 1]:
arr[j], arr[j + 1] = arr[j + 1], arr[j]
swapped = True
if not swapped:
break # already sorted
return arr
Insertion Sort builds a sorted portion one element at a time. Efficient for small or nearly sorted arrays. Used as the base case in hybrid sorts like Timsort.
def insertion_sort(arr):
for i in range(1, len(arr)):
key = arr[i]
j = i - 1
while j >= 0 and arr[j] > key:
arr[j + 1] = arr[j]
j -= 1
arr[j + 1] = key
return arr
Merge Sort divides the array in half, recursively sorts each half, and merges the sorted halves. Guaranteed O(n log n) time but requires O(n) extra space.
def merge_sort(arr):
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid])
right = merge_sort(arr[mid:])
return merge(left, right)
def merge(left, right):
result = []
i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result.extend(left[i:])
result.extend(right[j:])
return result
Quick Sort picks a pivot, partitions elements into those less than and greater than the pivot, and recursively sorts each partition. Average O(n log n), worst case O(n^2) with poor pivot selection. In practice, it is often the fastest comparison sort due to excellent cache behavior.
def quicksort(arr):
if len(arr) <= 1:
return arr
pivot = arr[len(arr) // 2]
left = [x for x in arr if x < pivot]
middle = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
return quicksort(left) + middle + quicksort(right)
# In-place partition (Lomuto scheme) for production use
def quicksort_inplace(arr, lo=0, hi=None):
if hi is None:
hi = len(arr) - 1
if lo < hi:
pivot_idx = partition(arr, lo, hi)
quicksort_inplace(arr, lo, pivot_idx - 1)
quicksort_inplace(arr, pivot_idx + 1, hi)
return arr
def partition(arr, lo, hi):
pivot = arr[hi]
i = lo
for j in range(lo, hi):
if arr[j] < pivot:
arr[i], arr[j] = arr[j], arr[i]
i += 1
arr[i], arr[hi] = arr[hi], arr[i]
return i
Heap Sort builds a max-heap and repeatedly extracts the maximum. Guaranteed O(n log n) time and O(1) extra space, but poor cache performance compared to quicksort.
def heapsort(arr):
n = len(arr)
def sift_down(start, end):
root = start
while True:
child = 2 * root + 1
if child > end:
break
if child + 1 <= end and arr[child] < arr[child + 1]:
child += 1
if arr[root] < arr[child]:
arr[root], arr[child] = arr[child], arr[root]
root = child
else:
break
# Build max-heap
for i in range(n // 2 - 1, -1, -1):
sift_down(i, n - 1)
# Extract elements
for end in range(n - 1, 0, -1):
arr[0], arr[end] = arr[end], arr[0]
sift_down(0, end - 1)
return arr
Comparison of sorting algorithms:
| Algorithm | Best | Average | Worst | Space | Stable |
|---|---|---|---|---|---|
| Bubble Sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
| Insertion Sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quick Sort | O(n log n) | O(n log n) | O(n^2) | O(log n) | No |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
A stable sort preserves the relative order of elements with equal keys. This matters when sorting by multiple criteria — sort by last name, then stable-sort by first name, and records with the same first name remain sorted by last name.
Math: The Omega(n log n) Lower Bound for Comparison Sorting
Any comparison-based sorting algorithm must make Omega(n log n) comparisons in the worst case. The proof uses a decision tree argument.
A comparison sort can be modeled as a binary decision tree where each internal node is a comparison (is a_i < a_j?) and each leaf represents a possible output permutation. There are n! possible permutations of n elements, so the tree must have at least n! leaves. A binary tree with L leaves has height at least log_2(L).
Height >= log_2(n!) = log_2(1) + log_2(2) + ... + log_2(n)
By Stirling's approximation, log_2(n!) is in Theta(n log n). Therefore, every comparison sort requires Omega(n log n) comparisons in the worst case.
Non-Comparison Sorting
Non-comparison sorts bypass the O(n log n) lower bound by exploiting the structure of the keys.
Counting Sort works for integers in a known range [0, k). It counts occurrences of each value and uses the counts to place elements directly. Time: O(n + k). Space: O(k).
def counting_sort(arr, max_val=None):
if not arr:
return arr
if max_val is None:
max_val = max(arr)
count = [0] * (max_val + 1)
for x in arr:
count[x] += 1
result = []
for val, cnt in enumerate(count):
result.extend([val] * cnt)
return result
Radix Sort sorts integers digit by digit, from least significant to most significant, using a stable sort (typically counting sort) at each digit position. Time: O(d * (n + k)) where d is the number of digits and k is the base.
def radix_sort(arr):
if not arr:
return arr
max_val = max(arr)
exp = 1
while max_val // exp > 0:
arr = counting_sort_by_digit(arr, exp)
exp *= 10
return arr
def counting_sort_by_digit(arr, exp):
n = len(arr)
output = [0] * n
count = [0] * 10
for x in arr:
digit = (x // exp) % 10
count[digit] += 1
for i in range(1, 10):
count[i] += count[i - 1]
for x in reversed(arr):
digit = (x // exp) % 10
count[digit] -= 1
output[count[digit]] = x
return output
Binary Search
Binary search finds a target in a sorted array in O(log n) time by repeatedly halving the search space. It is the most important search algorithm and appears in countless variations.
def binary_search(arr, target):
lo, hi = 0, len(arr) - 1
while lo <= hi:
mid = (lo + hi) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
lo = mid + 1
else:
hi = mid - 1
return -1 # not found
def bisect_left(arr, target):
"""Find the leftmost position where target can be inserted to keep arr sorted."""
lo, hi = 0, len(arr)
while lo < hi:
mid = (lo + hi) // 2
if arr[mid] < target:
lo = mid + 1
else:
hi = mid
return lo
def bisect_right(arr, target):
"""Find the rightmost position where target can be inserted to keep arr sorted."""
lo, hi = 0, len(arr)
while lo < hi:
mid = (lo + hi) // 2
if arr[mid] <= target:
lo = mid + 1
else:
hi = mid
return lo
Binary search applies beyond arrays. Any monotonic function can be binary-searched. Need to find the smallest x where f(x) >= threshold? Binary search on x.
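A sketch of that idea over the integers (search_first_true is a name made up for this example, not a library function):
def search_first_true(predicate, lo, hi):
    """Smallest x in [lo, hi] with predicate(x) True, assuming predicate is
    monotone over the range: False, False, ..., True, True."""
    while lo < hi:
        mid = (lo + hi) // 2
        if predicate(mid):
            hi = mid           # mid might be the answer, so keep it in range
        else:
            lo = mid + 1       # everything up to mid is False
    return lo
# Smallest integer whose square reaches 1000
print(search_first_true(lambda x: x * x >= 1000, 0, 1000))  # 32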
Quickselect
Quickselect finds the k-th smallest element without fully sorting the array. It uses the same partition function as quicksort but only recurses into the partition containing the target index. Average time: O(n). Worst case: O(n^2).
def quickselect(arr, k):
"""Find the k-th smallest element (0-indexed)."""
if len(arr) == 1:
return arr[0]
pivot = arr[len(arr) // 2]
left = [x for x in arr if x < pivot]
middle = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
if k < len(left):
return quickselect(left, k)
elif k < len(left) + len(middle):
return pivot
else:
return quickselect(right, k - len(left) - len(middle))
Python's heapq.nsmallest() and heapq.nlargest() perform heap-based selection without sorting the entire input; statistics.median() simply sorts the data and picks the middle element.
Practice
import random
import time
# Compare sorting algorithms on the same data
def benchmark_sort(sort_fn, data, name):
arr = data.copy()
start = time.perf_counter()
sort_fn(arr)
elapsed = time.perf_counter() - start
return elapsed
sizes = [1_000, 5_000, 10_000]
for n in sizes:
data = [random.randint(0, n) for _ in range(n)]
print(f"\nn = {n}:")
for name, fn in [
("Insertion Sort", insertion_sort),
("Merge Sort", merge_sort),
("Quick Sort", quicksort),
("Heap Sort", heapsort),
("Python sorted()", sorted),
]:
t = benchmark_sort(fn, data, name)
print(f" {name:20s}: {t:.4f}s")
# Demonstrate stability
records = [("Alice", 85), ("Bob", 92), ("Charlie", 85), ("Dave", 92)]
# Stable sort by score preserves alphabetical order within same score
stable_sorted = sorted(records, key=lambda r: r[1])
print("\nStable sort by score:", stable_sorted)
# Binary search: find insertion point
import bisect
arr = [10, 20, 30, 40, 50]
print(f"\nInsert 35 at index: {bisect.bisect_left(arr, 35)}") # 3
print(f"Insert 30 at index (left): {bisect.bisect_left(arr, 30)}") # 2
print(f"Insert 30 at index (right): {bisect.bisect_right(arr, 30)}") # 3
Connection
Sorting and searching are not just algorithms — they are building blocks. Database query engines sort for ORDER BY, GROUP BY, and merge joins. Binary search powers database index lookups, debugging (git bisect), and any system that narrows down possibilities by halving. When you see Python's sorted() or .sort(), you are using Timsort — a hybrid of merge sort and insertion sort designed for real-world data with existing runs of sorted elements. Understanding these algorithms lets you predict when a system will be fast and when it will not. If you know a dataset is nearly sorted, you choose insertion sort. If you need guaranteed O(n log n) with stability, you choose merge sort. If memory is tight, you choose heapsort. Every choice is a tradeoff, and this section gives you the knowledge to make it.
Try It: Implement a function that uses binary search to find the square root of a number to 6 decimal places. The idea: binary search for x in [0, n] where x * x is closest to n. This demonstrates that binary search works on any monotonic function, not just arrays.
F. Algorithm Design Paradigms
Theory
Algorithm design paradigms are general strategies for solving problems. Recognizing which paradigm fits a problem is often the hardest part of algorithm design. Once you identify the paradigm, the solution structure follows.
Greedy Algorithms
A greedy algorithm makes the locally optimal choice at each step, hoping to reach a globally optimal solution. Greedy algorithms are simple and fast, but they do not always produce optimal results. The key question is: does the greedy choice property hold? That is, can a globally optimal solution be reached by making locally optimal choices?
Classic greedy problems:
- Activity selection: Given activities with start and end times, select the maximum number of non-overlapping activities. Greedy strategy: always pick the activity that finishes earliest.
- Huffman coding: Build an optimal prefix-free encoding for a set of character frequencies. Greedy strategy: repeatedly merge the two lowest-frequency nodes.
- Dijkstra's algorithm: Already covered in section D — it is a greedy algorithm that always processes the nearest unvisited vertex.
def activity_selection(activities):
"""Select maximum non-overlapping activities.
activities: list of (start, end) tuples."""
sorted_acts = sorted(activities, key=lambda a: a[1]) # sort by end time
selected = [sorted_acts[0]]
for start, end in sorted_acts[1:]:
if start >= selected[-1][1]:
selected.append((start, end))
return selected
# Example
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
print("Selected activities:", activity_selection(activities))
Huffman Coding builds a variable-length encoding where frequent characters get shorter codes. It is used in gzip, JPEG, and MP3 compression.
import heapq
from collections import Counter
def huffman_encoding(text):
freq = Counter(text)
heap = [(count, i, char, None, None) for i, (char, count) in enumerate(freq.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
lo = heapq.heappop(heap)
hi = heapq.heappop(heap)
merged = (lo[0] + hi[0], counter, None, lo, hi)
counter += 1
heapq.heappush(heap, merged)
codes = {}
def build_codes(node, prefix=""):
if node is None:
return
_, _, char, left, right = node
if char is not None:
codes[char] = prefix or "0"
build_codes(left, prefix + "0")
build_codes(right, prefix + "1")
build_codes(heap[0])
return codes
text = "abracadabra"
codes = huffman_encoding(text)
for char, code in sorted(codes.items()):
print(f" '{char}': {code}")
encoded = "".join(codes[c] for c in text)
print(f"Original: {len(text) * 8} bits, Encoded: {len(encoded)} bits")
Divide and Conquer
Divide and conquer breaks a problem into smaller subproblems of the same type, solves them recursively, and combines the results. Merge sort and quicksort are the canonical examples. The Master Theorem (section A) analyzes the complexity of divide-and-conquer algorithms.
Key characteristics:
- Divide: Split the problem into subproblems.
- Conquer: Solve subproblems recursively. Base cases are solved directly.
- Combine: Merge subproblem solutions into the overall solution.
Counting inversions is a classic divide-and-conquer problem. An inversion is a pair (i, j) where i < j but arr[i] > arr[j]. The count measures how far an array is from sorted. Modified merge sort counts inversions during the merge step.
def count_inversions(arr):
if len(arr) <= 1:
return arr, 0
mid = len(arr) // 2
left, left_inv = count_inversions(arr[:mid])
right, right_inv = count_inversions(arr[mid:])
merged, split_inv = merge_and_count(left, right)
return merged, left_inv + right_inv + split_inv
def merge_and_count(left, right):
result = []
inversions = 0
i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
inversions += len(left) - i # all remaining left elements are inversions
j += 1
result.extend(left[i:])
result.extend(right[j:])
return result, inversions
arr = [5, 3, 8, 1, 2]
_, inv = count_inversions(arr)
print(f"Inversions in {arr}: {inv}") # 7
Dynamic Programming
Dynamic programming (DP) solves problems by breaking them into overlapping subproblems and storing the results to avoid redundant computation. It applies when a problem has optimal substructure (optimal solution contains optimal solutions to subproblems) and overlapping subproblems (the same subproblems are solved repeatedly).
Two approaches:
- Top-down (memoization): Recursive with a cache.
- Bottom-up (tabulation): Iterative, filling a table from base cases up.
Fibonacci is the simplest DP example:
# Naive recursion: O(2^n) — exponential, recomputes subproblems
def fib_naive(n):
if n <= 1:
return n
return fib_naive(n - 1) + fib_naive(n - 2)
# Top-down with memoization: O(n)
from functools import lru_cache
@lru_cache(maxsize=None)
def fib_memo(n):
if n <= 1:
return n
return fib_memo(n - 1) + fib_memo(n - 2)
# Bottom-up tabulation: O(n) time, O(1) space
def fib_tab(n):
if n <= 1:
return n
a, b = 0, 1
for _ in range(2, n + 1):
a, b = b, a + b
return b
The 0/1 Knapsack Problem: Given n items with weights and values, and a knapsack with capacity W, find the maximum value you can carry.
def knapsack(weights, values, capacity):
n = len(weights)
# dp[i][w] = max value using first i items with capacity w
dp = [[0] * (capacity + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
for w in range(capacity + 1):
dp[i][w] = dp[i - 1][w] # skip item i
if weights[i - 1] <= w:
dp[i][w] = max(dp[i][w],
dp[i - 1][w - weights[i - 1]] + values[i - 1])
# Backtrack to find which items were selected
w = capacity
selected = []
for i in range(n, 0, -1):
if dp[i][w] != dp[i - 1][w]:
selected.append(i - 1)
w -= weights[i - 1]
return dp[n][capacity], selected
weights = [2, 3, 4, 5]
values = [3, 4, 5, 6]
max_val, items = knapsack(weights, values, 8)
print(f"Max value: {max_val}, Items: {items}")
Longest Common Subsequence (LCS): Given two strings, find the length of the longest subsequence common to both. This is the algorithm behind diff and version control.
def lcs(s1, s2):
m, n = len(s1), len(s2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1[i - 1] == s2[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + 1
else:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
# Backtrack to reconstruct the LCS
result = []
i, j = m, n
while i > 0 and j > 0:
if s1[i - 1] == s2[j - 1]:
result.append(s1[i - 1])
i -= 1
j -= 1
elif dp[i - 1][j] > dp[i][j - 1]:
i -= 1
else:
j -= 1
return dp[m][n], "".join(reversed(result))
length, subseq = lcs("ABCBDAB", "BDCAB")
print(f"LCS length: {length}, LCS: {subseq}")
Backtracking
Backtracking explores all potential solutions by building candidates incrementally and abandoning a candidate as soon as it is determined to be invalid. It is a systematic way to search through a solution space.
The pattern: Make a choice, recurse, undo the choice (backtrack), try the next option.
def permutations(arr):
result = []
def backtrack(start):
if start == len(arr):
result.append(arr[:])
return
for i in range(start, len(arr)):
arr[start], arr[i] = arr[i], arr[start]
backtrack(start + 1)
arr[start], arr[i] = arr[i], arr[start] # undo
backtrack(0)
return result
print("Permutations of [1,2,3]:", permutations([1, 2, 3]))
N-Queens Problem: Place n queens on an n x n chessboard so that no two queens attack each other.
def solve_n_queens(n):
solutions = []
board = [-1] * n # board[row] = column of queen in that row
def is_safe(row, col):
for prev_row in range(row):
prev_col = board[prev_row]
if prev_col == col:
return False
if abs(prev_col - col) == abs(prev_row - row):
return False
return True
def backtrack(row):
if row == n:
solutions.append(board[:])
return
for col in range(n):
if is_safe(row, col):
board[row] = col
backtrack(row + 1)
board[row] = -1 # undo
backtrack(0)
return solutions
solutions = solve_n_queens(8)
print(f"8-Queens has {len(solutions)} solutions")
# Print one solution
def print_board(solution):
n = len(solution)
for row in range(n):
line = ["Q" if col == solution[row] else "." for col in range(n)]
print(" ".join(line))
print("\nOne solution:")
print_board(solutions[0])
Math: Combinatorics in Backtracking
Backtracking searches through combinatorial spaces. Understanding the size of these spaces tells you whether a backtracking solution is feasible.
- Permutations: n! arrangements of n items. 10! = 3,628,800. 20! ≈ 2.4 * 10^18.
- Combinations: C(n, k) = n! / (k! * (n-k)!). C(20, 10) = 184,756.
- Subsets: 2^n subsets of n items. 2^20 ≈ 1,000,000. 2^30 ≈ 1,000,000,000.
A brute-force search over all permutations of 20 items is infeasible. Backtracking with pruning eliminates large branches of the search tree, often making the problem tractable.
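These counts are easy to confirm with the standard library (math.comb is available in Python 3.8 and later):
import math
print(f"{math.factorial(10):,}")     # 3,628,800 permutations of 10 items
print(f"{math.factorial(20):.2e}")   # ~2.43e+18 -- brute force over 20 items is hopeless
print(f"{math.comb(20, 10):,}")      # 184,756 ways to choose 10 of 20
print(f"{2 ** 20:,}  {2 ** 30:,}")   # subset counts for 20 and 30 items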
Choosing the right paradigm:
| Paradigm | Subproblem Overlap | Optimal Substructure | Explores All | Time |
|---|---|---|---|---|
| Greedy | N/A | Yes (greedy choice) | No | Usually O(n log n) |
| Divide & Conquer | No | Yes | No | O(n log n) typical |
| Dynamic Programming | Yes | Yes | Yes (but caches) | Polynomial |
| Backtracking | N/A | Sometimes | Yes (with pruning) | Exponential worst |
Practice
# Dynamic programming: coin change problem
def coin_change(coins, amount):
"""Find minimum number of coins to make the given amount."""
dp = [float('inf')] * (amount + 1)
dp[0] = 0
for coin in coins:
for a in range(coin, amount + 1):
dp[a] = min(dp[a], dp[a - coin] + 1)
return dp[amount] if dp[amount] != float('inf') else -1
print("Min coins for 11 cents:", coin_change([1, 5, 10, 25], 11)) # 2 (1+10)
print("Min coins for 30 cents:", coin_change([1, 5, 10, 25], 30)) # 2 (5+25)
# Greedy vs DP: coin change where greedy fails
# With coins [1, 3, 4], greedy for amount 6 gives 4+1+1 (3 coins)
# DP gives 3+3 (2 coins)
print("Greedy-unfriendly coins for 6:", coin_change([1, 3, 4], 6)) # 2
# Backtracking: generate all subsets
def subsets(arr):
result = []
def backtrack(start, current):
result.append(current[:])
for i in range(start, len(arr)):
current.append(arr[i])
backtrack(i + 1, current)
current.pop()
backtrack(0, [])
return result
print("Subsets of [1,2,3]:", subsets([1, 2, 3]))
# Divide and conquer: maximum subarray sum (Kadane's is O(n), but D&C illustrates the paradigm)
def max_subarray_dc(arr, lo=0, hi=None):
if hi is None:
hi = len(arr) - 1
if lo == hi:
return arr[lo]
mid = (lo + hi) // 2
left_max = max_subarray_dc(arr, lo, mid)
right_max = max_subarray_dc(arr, mid + 1, hi)
# Max crossing subarray
left_sum = float('-inf')
total = 0
for i in range(mid, lo - 1, -1):
total += arr[i]
left_sum = max(left_sum, total)
right_sum = float('-inf')
total = 0
for i in range(mid + 1, hi + 1):
total += arr[i]
right_sum = max(right_sum, total)
return max(left_max, right_max, left_sum + right_sum)
arr = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(f"Max subarray sum: {max_subarray_dc(arr)}") # 6 ([4, -1, 2, 1])
Connection
Design paradigms are not academic curiosities — they are the mental models that experienced engineers use to decompose real problems. When you encounter a scheduling problem in Infrastructure at Scale, you recognize it as a greedy or DP problem. When you need to compute a diff between two configuration files, you know it is LCS — dynamic programming. When a CI/CD pipeline needs to find the optimal deployment strategy, the dependency resolution is topological sorting (greedy on the DAG). The paradigm tells you the structure of the solution before you write any code. Mastering these four paradigms — greedy, divide and conquer, dynamic programming, and backtracking — gives you a toolkit that covers the vast majority of algorithmic problems you will encounter in practice.
Try It: Implement the coin change problem using both a greedy approach and dynamic programming. Test with coins [1, 5, 10, 25] (where greedy works) and coins [1, 3, 4] with amount 6 (where greedy fails but DP succeeds). This demonstrates that greedy algorithms are faster but only correct when the greedy choice property holds.
Exercises
- Complexity analysis: Determine the Big-O complexity of the following code snippets without running them — a nested loop (for i in range(n): for j in range(i, n)), a loop that doubles (i = 1; while i < n: i *= 2), and a recursive function with T(n) = 2T(n/2) + n.
- Hash table from scratch: Implement a hash table with open addressing (linear probing). Support insert, lookup, delete, and automatic resizing. Test it with 10,000 random key-value pairs and measure the average lookup time at different load factors (0.3, 0.5, 0.7, 0.9).
- BST operations: Implement a complete BST with insert, search, delete (with all three cases: leaf, one child, two children), and an in_order traversal. Verify that the in-order traversal always produces sorted output.
- Graph traversal: Build a graph representing a small social network (at least 10 people with friendship edges). Implement BFS to find the shortest path between any two people and DFS to find all connected components.
- Dijkstra's in practice: Model a network of 8 routers with weighted links representing latency. Find the lowest-latency path from any source to any destination. Compare your result with what you would expect from running traceroute on a real network (which you will do in Networking).
- Sorting benchmarks: Implement merge sort, quicksort, and heapsort. Benchmark all three on arrays of size 1,000, 10,000, and 100,000 with random data, sorted data, and reverse-sorted data. Explain the results using Big-O analysis and cache behavior.
- Dynamic programming: Solve the longest increasing subsequence problem. Given an array, find the length of the longest subsequence where each element is strictly greater than the previous. Implement both the O(n^2) DP solution and the O(n log n) solution using binary search.
- N-Queens visualization: Solve the N-Queens problem for n = 4 through n = 12. Print the number of solutions for each n and display one solution for n = 8 as a formatted board.
- Huffman coding: Implement complete Huffman encoding and decoding. Encode a paragraph of text, measure the compression ratio, and verify that decoding produces the original text.
- Integrated challenge: Build a simple spell checker. Use a hash table to store a dictionary of valid words. For a misspelled word, use edit distance (dynamic programming) to find the closest valid words. This combines hash tables, dynamic programming, and string algorithms.
Assessment Dimensions
Explain
You can describe the difference between O(n log n) and O(n^2) sorting and explain why the difference matters at scale. You can state the Master Theorem and apply it to analyze a recursive algorithm. You can explain why hash tables have O(1) average-case lookup and what causes worst-case O(n) behavior. You can compare Dijkstra's and Bellman-Ford and explain when each is appropriate. You can articulate the difference between greedy, divide and conquer, dynamic programming, and backtracking, and identify which paradigm fits a given problem.
Build
You can implement a hash table, binary search tree, min-heap, and graph from scratch. You can implement merge sort, quicksort, BFS, DFS, and Dijkstra's algorithm. You can solve the knapsack problem with dynamic programming and the N-Queens problem with backtracking. You can profile two algorithms and demonstrate empirically that one is faster.
Debug
You can identify why a supposedly O(1) hash table operation is taking O(n) time (hash collisions, poor hash function, load factor too high). You can explain why a BST-based index is slow on sorted inserts (degeneration to a linked list). You can detect an infinite loop in a graph algorithm caused by missing cycle detection. You can find a bug in a DP solution where the base case is wrong or the recurrence does not capture all cases.
Key Takeaways
- Algorithm analysis (Big-O, Omega, Theta) predicts performance before you write code — the Master Theorem handles recursive algorithms, amortized analysis handles occasional expensive operations
- Arrays give O(1) access, linked lists give O(1) insertion at known positions, hash tables give O(1) average-case everything — every choice is a tradeoff
- The birthday problem explains why hash collisions appear sooner than expected and why load factor management matters
- Self-balancing trees (AVL, red-black) guarantee O(log n) operations that plain BSTs cannot; B+ trees extend this guarantee to disk-based storage
- Graphs model any system of connected components; BFS finds shortest paths in unweighted graphs, Dijkstra's handles weighted graphs, Bellman-Ford handles negative weights
- No comparison sort can beat O(n log n) in the worst case — this is a proven mathematical lower bound, not a limitation of current algorithms
- Greedy algorithms are fast but only correct when the greedy choice property holds; dynamic programming is always correct but requires identifying subproblem structure
- Backtracking systematically searches exponential spaces; pruning makes it practical for constraint satisfaction problems
- Every data structure and algorithm in this domain appears in systems you will build in later domains — database indexes, network routing, container scheduling, and CI/CD pipelines
Resources & Further Reading
- Introduction to Algorithms (CLRS) — The definitive algorithms textbook
- Algorithm Design Manual (Skiena) — Practical algorithm design with war stories
- Python collections documentation — deque, defaultdict, Counter, OrderedDict
- Python heapq documentation — Heap operations and priority queue patterns
- Python bisect documentation — Binary search utilities
- VisuAlgo — Interactive visualization of data structures and algorithms
- Big-O Cheat Sheet — Quick reference for algorithm complexities
- MIT OpenCourseWare 6.006 — Introduction to Algorithms lecture videos
- Sorting Algorithm Animations — Visual comparison of sorting algorithms
- Graph Algorithm Visualizer — Step-through visualizations for graph algorithms