Understanding Micrograd
Build a neural network from scratch — Karpathy's autograd engine, neurons, layers, and training — in pure Python with zero dependencies.
Download courseware
Prefer to inspect a complete implementation? Download pre-completed courseware for this course.
- Pre-completed project: A complete reference implementation of micrograd in pure Python — the autograd engine, neural network library, and moon dataset trainer in a single file.
The Big Picture
What micrograd is, what it teaches, and why building from scalars makes everything transparent.
- [ ] What Micrograd Is
- Identify the two components of micrograd: the autograd engine and the neural network library
- Explain how micrograd relates to production frameworks like PyTorch
- Describe why scalar operations make automatic differentiation transparent
- [ ] Derivatives
- Compute the derivative of a simple function numerically using finite differences
- Explain what a derivative measures in the context of optimization
- Apply the chain rule to a two-step function composition
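The finite-difference idea above can be sketched in a few lines. The function and evaluation point below are illustrative choices, not from the course itself:

```python
# Numerical derivative via finite differences: nudge the input by a tiny h
# and measure how much the output moves. Example function chosen for
# illustration: f(x) = 3x^2 - 4x + 5, whose derivative is 6x - 4.

def f(x):
    return 3 * x**2 - 4 * x + 5

h = 1e-6
x = 3.0
numeric = (f(x + h) - f(x)) / h   # slope of f near x
analytic = 6 * x - 4              # exact derivative at x = 3 is 14

print(numeric, analytic)
```

The numerical estimate lands within a small error of the exact value 14, which is also the trick used later to sanity-check backpropagation.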
The Value Class
Wrapping scalars in objects that track their history, building a computation graph through operator overloading.
- [ ] Wrapping Scalars
- Explain what the Value class stores and why each field exists
- Describe how operator overloading builds a computation graph
- Trace forward computation through a simple Value expression
- [ ] The Backward Pass
- Implement the backward() method using topological sort
- Explain why gradient accumulation uses +=
- Trace gradient flow through a multi-step computation graph
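The ideas in this lesson can be sketched as a minimal Value class in the spirit of micrograd's, with just `+` and `*`. The variable names and the test expression are illustrative:

```python
class Value:
    """A scalar that records the ops that produced it, so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad    # += so a node used twice accumulates both paths
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort: a node's grad is final before its inputs are visited.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a = Value(2.0)
b = a * a + a          # a is reused, so its gradient must accumulate
b.backward()
print(b.data, a.grad)  # d/da (a^2 + a) = 2a + 1 = 5 at a = 2
```

Because `a` feeds into the graph twice, `a.grad` ends up as 5.0 only because the backward closures use `+=`; with plain assignment one path would overwrite the other.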
- [ ] More Operations
- Implement tanh as a Value operation with the correct backward closure
- Explain why activation functions like tanh and relu need their own backward rules
- Verify gradient correctness using numerical differentiation
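The backward rule for tanh, and the numerical check, can be shown standalone. The closure-returning helper below is a simplified stand-in for a full Value operation:

```python
import math

# d/dx tanh(x) = 1 - tanh(x)^2. In a Value-style op, the backward closure
# multiplies this local derivative by the upstream gradient (chain rule).

def tanh_with_backward(x, upstream=1.0):
    t = math.tanh(x)
    def _backward():
        return (1 - t**2) * upstream   # local derivative times upstream grad
    return t, _backward

x = 0.5
t, back = tanh_with_backward(x)
analytic = back()

# Cross-check against a central finite difference.
h = 1e-6
numeric = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)
print(abs(analytic - numeric) < 1e-6)
```

Activation functions need their own rules precisely because they are not built from `+` and `*`: the engine cannot decompose `tanh` further, so the closure must supply its derivative directly.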
Building a Neural Network
From a single neuron to a multi-layer network — the building blocks that learn.
- [ ] The Neuron
- Describe the mathematical model of a single neuron: weighted sum + bias + activation
- Implement a Neuron class using Value objects
- Explain what the parameters of a neuron represent
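The neuron's forward math can be sketched as follows. Plain floats are used here for brevity; the course version wraps everything in Value objects so gradients can flow. The input vector and seed are illustrative:

```python
import math
import random

class Neuron:
    """Weighted sum + bias + activation, shown with plain floats."""
    def __init__(self, nin):
        self.w = [random.uniform(-1, 1) for _ in range(nin)]  # one weight per input
        self.b = 0.0                                          # bias

    def __call__(self, x):
        act = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return math.tanh(act)                                 # squash into (-1, 1)

    def parameters(self):
        return self.w + [self.b]

random.seed(0)
n = Neuron(3)
out = n([1.0, -2.0, 3.0])
print(out, len(n.parameters()))  # output in (-1, 1); 4 params: 3 weights + 1 bias
```

The parameters are the knobs training will turn: each weight scales one input's influence, and the bias shifts the activation threshold.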
- [ ] Layers and MLPs
- Build a Layer as a collection of Neurons
- Compose Layers into an MLP (Multi-Layer Perceptron)
- Trace a forward pass through a 2-layer network
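The Layer/MLP composition can be sketched like this, again on plain floats for brevity (the course builds it on Value objects). The layer sizes and input are illustrative:

```python
import math
import random

class Neuron:
    def __init__(self, nin):
        self.w = [random.uniform(-1, 1) for _ in range(nin)]
        self.b = 0.0
    def __call__(self, x):
        return math.tanh(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

class Layer:
    """A list of neurons that all see the same input."""
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]
    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs) == 1 else outs

class MLP:
    """Layers chained so each layer's output feeds the next."""
    def __init__(self, nin, nouts):
        sizes = [nin] + nouts
        self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(nouts))]
    def __call__(self, x):
        for layer in self.layers:   # the forward pass, layer by layer
            x = layer(x)
        return x

random.seed(0)
net = MLP(2, [4, 1])                # 2 inputs -> 4 hidden neurons -> 1 output
y = net([0.5, -1.0])
print(y)
```

Tracing a forward pass is just following `x` through the loop in `MLP.__call__`: the input list becomes a 4-element hidden activation, which collapses to a single output score.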
- [ ] The Module Pattern
- Explain the Module base class and the parameters() pattern
- Describe how zero_grad() resets parameter gradients to zero before each training step
- Connect the Module pattern to PyTorch's nn.Module
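The Module pattern can be sketched with a tiny stand-in parameter type that has only a `.grad` field; the class names below mirror micrograd's, but the `Linear` example and its values are illustrative:

```python
class Param:
    """Stand-in for a Value: just enough to show the gradient bookkeeping."""
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

class Module:
    def parameters(self):
        return []                    # subclasses override to expose their params

    def zero_grad(self):
        for p in self.parameters():  # clear accumulated grads before the next step
            p.grad = 0.0

class Linear(Module):
    def __init__(self):
        self.w = [Param(0.5), Param(-0.3)]
        self.b = Param(0.0)
    def parameters(self):
        return self.w + [self.b]

m = Linear()
for p in m.parameters():
    p.grad = 1.23                    # pretend a backward pass accumulated gradients
m.zero_grad()
print(all(p.grad == 0.0 for p in m.parameters()))
```

This is the same contract PyTorch's `nn.Module` provides: one method that enumerates every trainable parameter, so optimizers and `zero_grad()` never need to know the model's internals.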
Training
The loss function, gradient descent, and the training loop that teaches the network to classify.
- [ ] The Loss Function
- Explain what a loss function measures and why training needs one
- Implement an SVM max-margin loss (hinge loss) for binary classification
- Describe the role of L2 regularization in preventing overfitting
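The hinge loss plus L2 regularization can be sketched on plain floats. The scores, labels, and regularization strength `alpha` below are illustrative values:

```python
# SVM max-margin (hinge) loss for binary classification with +1/-1 labels:
# a prediction is penalty-free only once label * score >= 1 (a margin of 1).

def hinge_loss(scores, labels, params, alpha=1e-4):
    data_loss = sum(max(0.0, 1 - y * s) for s, y in zip(scores, labels)) / len(scores)
    reg_loss = alpha * sum(p * p for p in params)  # L2: discourage large weights
    return data_loss + reg_loss

scores = [0.9, -1.2, 0.3]   # raw network outputs
labels = [1, -1, -1]        # ground truth
params = [0.5, -0.3]        # model weights, for the regularization term
loss = hinge_loss(scores, labels, params)
print(loss)
```

Only the first and third examples contribute to the data loss (the second is already beyond the margin), and the L2 term adds a small penalty that grows with the weights, nudging the network toward simpler decision boundaries.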
- [ ] Gradient Descent
- Implement a gradient descent update step
- Explain how learning rate controls the step size
- Describe the zero_grad → forward → loss → backward → update cycle
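The update step itself is one line. The toy loss below, f(w) = (w - 3)^2, is an illustrative choice with a known gradient:

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2(w - 3).
# The learning rate scales how far each step moves against the gradient.

w = 0.0
lr = 0.1                    # learning rate: the step-size multiplier
for step in range(50):
    grad = 2 * (w - 3)      # gradient of the loss at the current w
    w -= lr * grad          # update: move opposite the gradient
print(w)                    # approaches the minimum at w = 3
```

A larger `lr` reaches the minimum in fewer steps but can overshoot and oscillate; a smaller one is stable but slow. That trade-off is the whole art of choosing a learning rate.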
- [ ] The Training Loop
- Assemble the complete training loop from components built in previous lessons
- Trace how loss decreases over training steps
- Explain what each line in the training loop does and why it is in that order
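The zero_grad → forward → loss → backward → update cycle can be shown on a toy problem. Here the gradient is written by hand so the sketch stays self-contained; in the course, `loss.backward()` computes it through the Value graph. The dataset (fit y = 2x with one weight) is illustrative:

```python
# The training loop skeleton: each step clears gradients, runs the model
# forward, measures the loss, accumulates gradients, then updates weights.

data = [(x, 2 * x) for x in [-2.0, -1.0, 1.0, 2.0]]
w = 0.0
lr = 0.02
for step in range(100):
    grad = 0.0                        # zero_grad: start each step from a clean slate
    loss = 0.0
    for x, y in data:
        pred = w * x                  # forward pass
        loss += (pred - y) ** 2       # loss: squared error, summed over the batch
        grad += 2 * (pred - y) * x    # backward: d(loss)/dw, accumulated per example
    w -= lr * grad                    # update: gradient descent step
print(w)                              # converges toward the true weight, 2.0
```

The order matters: updating before backward would use stale gradients, and skipping the zeroing step would mix this step's gradients with the last step's.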
The Full Picture
Train on a real dataset, see what the network learned, and connect micrograd to production frameworks.
- [ ] The Moon Dataset
- Generate and understand the moon dataset for binary classification
- Train an MLP to classify moon data with high accuracy
- Interpret what the trained network has learned
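Since the course uses zero dependencies, the moon dataset can be generated in pure Python as two interleaving half-circles. The point count, noise level, and geometry below are illustrative choices, not the course's exact generator:

```python
import math
import random

def make_moons(n=100, noise=0.1, seed=0):
    """Two interleaving noisy half-circles, labeled +1 and -1."""
    rng = random.Random(seed)
    pts, labels = [], []
    for i in range(n):
        t = math.pi * i / (n - 1)                    # sweep a half circle
        # upper moon: label +1
        pts.append((math.cos(t) + rng.gauss(0, noise),
                    math.sin(t) + rng.gauss(0, noise)))
        labels.append(1)
        # lower moon: shifted and flipped, label -1
        pts.append((1 - math.cos(t) + rng.gauss(0, noise),
                    0.5 - math.sin(t) + rng.gauss(0, noise)))
        labels.append(-1)
    return pts, labels

X, y = make_moons()
print(len(X), set(y))
```

The crescents interlock, so no straight line separates them; that is exactly why a hidden layer with a nonlinear activation is needed, and why the trained MLP's decision boundary ends up curved.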
- [ ] Micrograd and PyTorch
- Map micrograd concepts to their PyTorch equivalents
- Explain what micrograd leaves out and why production frameworks add those features
- Describe the path from micrograd to microGPT to full-scale models