Understanding Micrograd
Build a neural network from scratch — Karpathy's autograd engine, neurons, layers, and training — in pure Python with zero dependencies.
Download courseware
Prefer to inspect a complete implementation? Download pre-completed courseware for this course.
- Pre-completed project: A complete reference implementation of micrograd in pure Python — the autograd engine, neural network library, and moon dataset trainer in a single file.
The Big Picture
What micrograd is, what it teaches, and why building from scalars makes everything transparent.
- [ ] What Micrograd Is
- Identify the two components of micrograd: the autograd engine and the neural network library
- Explain how micrograd relates to production frameworks like PyTorch
- Describe why scalar operations make automatic differentiation transparent
- [ ] Derivatives
- Compute the derivative of a simple function numerically using finite differences
- Explain what a derivative measures in the context of optimization
- Apply the chain rule to a two-step function composition
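The finite-difference idea above can be sketched in a few lines. The function and evaluation point below are illustrative choices, not from the course itself:

```python
# Numerical derivative via finite differences: nudge the input by a tiny h
# and measure how much the output moves. Example function chosen for
# illustration: f(x) = 3x^2 - 4x + 5, whose derivative is 6x - 4.

def f(x):
    return 3 * x**2 - 4 * x + 5

h = 1e-6
x = 3.0
numeric = (f(x + h) - f(x)) / h   # slope of f near x
analytic = 6 * x - 4              # exact derivative at x = 3 is 14

print(numeric, analytic)
```

The numerical estimate lands within a small error of the exact value 14, which is also the trick used later to sanity-check backpropagation.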
The Value Class
Wrapping scalars in objects that track their history, building a computation graph through operator overloading.
- [ ] Wrapping Scalars
- Explain what the Value class stores and why each field exists
- Describe how operator overloading builds a computation graph
- Trace forward computation through a simple Value expression
- [ ] The Backward Pass
- Implement the backward() method using topological sort
- Explain why gradient accumulation uses +=
- Trace gradient flow through a multi-step computation graph
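The ideas in this lesson can be sketched as a minimal Value class in the spirit of micrograd's, with just `+` and `*`. The variable names and the test expression are illustrative:

```python
class Value:
    """A scalar that records the ops that produced it, so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad    # += so a node used twice accumulates both paths
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort: a node's grad is final before its inputs are visited.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a = Value(2.0)
b = a * a + a          # a is reused, so its gradient must accumulate
b.backward()
print(b.data, a.grad)  # d/da (a^2 + a) = 2a + 1 = 5 at a = 2
```

Because `a` feeds into the graph twice, `a.grad` ends up as 5.0 only because the backward closures use `+=`; with plain assignment one path would overwrite the other.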
- [ ] More Operations
- Implement tanh as a Value operation with the correct backward closure
- Explain why activation functions like tanh and relu need their own backward rules
- Verify gradient correctness using numerical differentiation
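The backward rule for tanh, and the numerical check, can be shown standalone. The closure-returning helper below is a simplified stand-in for a full Value operation:

```python
import math

# d/dx tanh(x) = 1 - tanh(x)^2. In a Value-style op, the backward closure
# multiplies this local derivative by the upstream gradient (chain rule).

def tanh_with_backward(x, upstream=1.0):
    t = math.tanh(x)
    def _backward():
        return (1 - t**2) * upstream   # local derivative times upstream grad
    return t, _backward

x = 0.5
t, back = tanh_with_backward(x)
analytic = back()

# Cross-check against a central finite difference.
h = 1e-6
numeric = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)
print(abs(analytic - numeric) < 1e-6)
```

Activation functions need their own rules precisely because they are not built from `+` and `*`: the engine cannot decompose `tanh` further, so the closure must supply its derivative directly.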
Building a Neural Network
From a single neuron to a multi-layer network — the building blocks that learn.
- [ ] The Neuron
- Describe the mathematical model of a single neuron: weighted sum + bias + activation
- Implement a Neuron class using Value objects
- Explain what the parameters of a neuron represent
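The neuron's forward math can be sketched as follows. Plain floats are used here for brevity; the course version wraps everything in Value objects so gradients can flow. The input vector and seed are illustrative:

```python
import math
import random

class Neuron:
    """Weighted sum + bias + activation, shown with plain floats."""
    def __init__(self, nin):
        self.w = [random.uniform(-1, 1) for _ in range(nin)]  # one weight per input
        self.b = 0.0                                          # bias

    def __call__(self, x):
        act = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return math.tanh(act)                                 # squash into (-1, 1)

    def parameters(self):
        return self.w + [self.b]

random.seed(0)
n = Neuron(3)
out = n([1.0, -2.0, 3.0])
print(out, len(n.parameters()))  # output in (-1, 1); 4 params: 3 weights + 1 bias
```

The parameters are the knobs training will turn: each weight scales one input's influence, and the bias shifts the activation threshold.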
- [ ] Layers and MLPs
- Build a Layer as a collection of Neurons
- Compose Layers into an MLP (Multi-Layer Perceptron)
- Trace a forward pass through a 2-layer network
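The Layer/MLP composition can be sketched like this, again on plain floats for brevity (the course builds it on Value objects). The layer sizes and input are illustrative:

```python
import math
import random

class Neuron:
    def __init__(self, nin):
        self.w = [random.uniform(-1, 1) for _ in range(nin)]
        self.b = 0.0
    def __call__(self, x):
        return math.tanh(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

class Layer:
    """A list of neurons that all see the same input."""
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]
    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs) == 1 else outs

class MLP:
    """Layers chained so each layer's output feeds the next."""
    def __init__(self, nin, nouts):
        sizes = [nin] + nouts
        self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(nouts))]
    def __call__(self, x):
        for layer in self.layers:   # the forward pass, layer by layer
            x = layer(x)
        return x

random.seed(0)
net = MLP(2, [4, 1])                # 2 inputs -> 4 hidden neurons -> 1 output
y = net([0.5, -1.0])
print(y)
```

Tracing a forward pass is just following `x` through the loop in `MLP.__call__`: the input list becomes a 4-element hidden activation, which collapses to a single output score.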
- [ ] The Module Pattern
- Explain the Module base class and the parameters() pattern
- Describe how zero_grad() resets parameter gradients to zero before each training step
- Connect the Module pattern to PyTorch's nn.Module
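The Module pattern can be sketched with a tiny stand-in parameter type that has only a `.grad` field; the class names below mirror micrograd's, but the `Linear` example and its values are illustrative:

```python
class Param:
    """Stand-in for a Value: just enough to show the gradient bookkeeping."""
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

class Module:
    def parameters(self):
        return []                    # subclasses override to expose their params

    def zero_grad(self):
        for p in self.parameters():  # clear accumulated grads before the next step
            p.grad = 0.0

class Linear(Module):
    def __init__(self):
        self.w = [Param(0.5), Param(-0.3)]
        self.b = Param(0.0)
    def parameters(self):
        return self.w + [self.b]

m = Linear()
for p in m.parameters():
    p.grad = 1.23                    # pretend a backward pass accumulated gradients
m.zero_grad()
print(all(p.grad == 0.0 for p in m.parameters()))
```

This is the same contract PyTorch's `nn.Module` provides: one method that enumerates every trainable parameter, so optimizers and `zero_grad()` never need to know the model's internals.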
Training
The loss function, gradient descent, and the training loop that teaches the network to classify.
- [ ] The Loss Function
- Explain what a loss function measures and why training needs one
- Implement an SVM max-margin loss (hinge loss) for binary classification
- Describe the role of L2 regularization in preventing overfitting
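The hinge loss plus L2 regularization can be sketched on plain floats. The scores, labels, and regularization strength `alpha` below are illustrative values:

```python
# SVM max-margin (hinge) loss for binary classification with +1/-1 labels:
# a prediction is penalty-free only once label * score >= 1 (a margin of 1).

def hinge_loss(scores, labels, params, alpha=1e-4):
    data_loss = sum(max(0.0, 1 - y * s) for s, y in zip(scores, labels)) / len(scores)
    reg_loss = alpha * sum(p * p for p in params)  # L2: discourage large weights
    return data_loss + reg_loss

scores = [0.9, -1.2, 0.3]   # raw network outputs
labels = [1, -1, -1]        # ground truth
params = [0.5, -0.3]        # model weights, for the regularization term
loss = hinge_loss(scores, labels, params)
print(loss)
```

Only the first and third examples contribute to the data loss (the second is already beyond the margin), and the L2 term adds a small penalty that grows with the weights, nudging the network toward simpler decision boundaries.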
- [ ] Gradient Descent
- Implement a gradient descent update step
- Explain how learning rate controls the step size
- Describe the zero_grad → forward → loss → backward → update cycle
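The update step itself is one line. The toy loss below, f(w) = (w - 3)^2, is an illustrative choice with a known gradient:

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2(w - 3).
# The learning rate scales how far each step moves against the gradient.

w = 0.0
lr = 0.1                    # learning rate: the step-size multiplier
for step in range(50):
    grad = 2 * (w - 3)      # gradient of the loss at the current w
    w -= lr * grad          # update: move opposite the gradient
print(w)                    # approaches the minimum at w = 3
```

A larger `lr` reaches the minimum in fewer steps but can overshoot and oscillate; a smaller one is stable but slow. That trade-off is the whole art of choosing a learning rate.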
- [ ] The Training Loop
- Assemble the complete training loop from components built in previous lessons
- Trace how loss decreases over training steps
- Explain what each line in the training loop does and why it is in that order
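The zero_grad → forward → loss → backward → update cycle can be shown on a toy problem. Here the gradient is written by hand so the sketch stays self-contained; in the course, `loss.backward()` computes it through the Value graph. The dataset (fit y = 2x with one weight) is illustrative:

```python
# The training loop skeleton: each step clears gradients, runs the model
# forward, measures the loss, accumulates gradients, then updates weights.

data = [(x, 2 * x) for x in [-2.0, -1.0, 1.0, 2.0]]
w = 0.0
lr = 0.02
for step in range(100):
    grad = 0.0                        # zero_grad: start each step from a clean slate
    loss = 0.0
    for x, y in data:
        pred = w * x                  # forward pass
        loss += (pred - y) ** 2       # loss: squared error, summed over the batch
        grad += 2 * (pred - y) * x    # backward: d(loss)/dw, accumulated per example
    w -= lr * grad                    # update: gradient descent step
print(w)                              # converges toward the true weight, 2.0
```

The order matters: updating before backward would use stale gradients, and skipping the zeroing step would mix this step's gradients with the last step's.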
The Full Picture
Train on a real dataset, see what the network learned, and connect micrograd to production frameworks.
- [ ] The Moon Dataset
- Generate and understand the moon dataset for binary classification
- Train an MLP to classify moon data with high accuracy
- Interpret what the trained network has learned
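Since the course uses zero dependencies, the moon dataset can be generated in pure Python as two interleaving half-circles. The point count, noise level, and geometry below are illustrative choices, not the course's exact generator:

```python
import math
import random

def make_moons(n=100, noise=0.1, seed=0):
    """Two interleaving noisy half-circles, labeled +1 and -1."""
    rng = random.Random(seed)
    pts, labels = [], []
    for i in range(n):
        t = math.pi * i / (n - 1)                    # sweep a half circle
        # upper moon: label +1
        pts.append((math.cos(t) + rng.gauss(0, noise),
                    math.sin(t) + rng.gauss(0, noise)))
        labels.append(1)
        # lower moon: shifted and flipped, label -1
        pts.append((1 - math.cos(t) + rng.gauss(0, noise),
                    0.5 - math.sin(t) + rng.gauss(0, noise)))
        labels.append(-1)
    return pts, labels

X, y = make_moons()
print(len(X), set(y))
```

The crescents interlock, so no straight line separates them; that is exactly why a hidden layer with a nonlinear activation is needed, and why the trained MLP's decision boundary ends up curved.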
- [ ] Micrograd and PyTorch
- Map micrograd concepts to their PyTorch equivalents
- Explain what micrograd leaves out and why production frameworks add those features
- Describe the path from micrograd to microGPT to full-scale models