
Understanding Micrograd

Build a neural network from scratch — Karpathy's autograd engine, neurons, layers, and training — in pure Python with zero dependencies.


Download Courseware

Prefer to inspect a complete implementation? Download pre-completed courseware for this course.

  • Pre-completed project
    A complete reference implementation of micrograd in pure Python — the autograd engine, neural network library, and moon dataset trainer in a single file.

The Big Picture

What micrograd is, what it teaches, and why building from scalars makes everything transparent.

  • [ ] What Is Micrograd? [text] free
    • Identify the two components of micrograd: the autograd engine and the neural network library
    • Explain how micrograd relates to production frameworks like PyTorch
    • Describe why scalar operations make automatic differentiation transparent
    • Compute the derivative of a simple function numerically using finite differences
    • Explain what a derivative measures in the context of optimization
    • Apply the chain rule to a two-step function composition
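The last three objectives can be sketched in a few lines of plain Python: a finite-difference estimate of a derivative, and a numerical check of the chain rule on a two-step composition. The functions here are our own toy examples, not the course's reference code.

```python
def f(x):
    return 3 * x**2 - 4 * x + 5

# Finite differences: the slope of f over a tiny interval h approximates f'(x).
h = 1e-6
x = 3.0
numeric = (f(x + h) - f(x)) / h        # f'(x) = 6x - 4, so ~14 at x = 3

# Chain rule on a two-step composition k(g(x)):
def g(x): return x ** 2                # inner: g'(x) = 2x
def k(u): return 3 * u + 1             # outer: k'(u) = 3

x = 2.0
analytic = 3 * (2 * x)                 # k'(g(x)) * g'(x) = 3 * 2x = 12
numeric2 = (k(g(x + h)) - k(g(x))) / h # should closely match analytic
```

The derivative measures how much the output nudges when the input nudges, which is exactly the signal optimization needs: which direction, and how strongly, to move each parameter.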

The Value Class

Wrapping scalars in objects that track their history, building a computation graph through operator overloading.

  • [ ] Wrapping Scalars [text]
    • Explain what the Value class stores and why each field exists
    • Describe how operator overloading builds a computation graph
    • Trace forward computation through a simple Value expression
    • Implement the backward() method using topological sort
    • Explain why gradient accumulation uses +=
    • Trace gradient flow through a multi-step computation graph
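The objectives above can be condensed into a stripped-down Value class, shown here as a sketch with only add and multiply (our own abbreviation, not the full reference implementation):

```python
class Value:
    """A scalar that remembers how it was computed."""
    def __init__(self, data, _children=()):
        self.data = data                    # the scalar value (forward pass)
        self.grad = 0.0                     # dL/d(this value), set by backward()
        self._backward = lambda: None       # closure routing grad to inputs
        self._prev = set(_children)         # nodes that produced this one

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # += so a node used in two places accumulates grad from both paths
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort: visit inputs before outputs, then walk in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0                     # d(output)/d(output) = 1
        for v in reversed(topo):
            v._backward()

a = Value(2.0)
b = Value(-3.0)
c = a * b + a      # a appears twice: dc/da = b + 1 = -2, dc/db = a = 2
c.backward()
```

Note how `a` feeds both the multiply and the add: without `+=`, the second backward call would overwrite the first path's gradient instead of summing it.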
  • [ ] More Operations [text]
    • Implement tanh as a Value operation with the correct backward closure
    • Explain why activation functions like tanh and relu need their own backward rules
    • Verify gradient correctness using numerical differentiation
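As a sketch of the pattern above, here is tanh as a Value-style operation whose output carries a backward closure, checked against a central finite difference. `Value` here is a deliberately stripped stand-in (data, grad, closure only), not the full class:

```python
import math

class Value:
    def __init__(self, data):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t)
        def _backward():
            # local derivative of tanh is 1 - tanh(x)^2, scaled by upstream grad
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

x = Value(0.7)
y = x.tanh()
y.grad = 1.0
y._backward()

# Numerical check: central difference around x = 0.7.
h = 1e-6
numeric = (math.tanh(0.7 + h) - math.tanh(0.7 - h)) / (2 * h)
```

Activations need their own backward rules because, unlike + and *, their local derivative is not a simple rearrangement of the inputs; each nonlinearity contributes its own term to the chain rule.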

Building a Neural Network

From a single neuron to a multi-layer network — the building blocks that learn.

  • [ ] The Neuron [text]
    • Describe the mathematical model of a single neuron: weighted sum + bias + activation
    • Implement a Neuron class using Value objects
    • Explain what the parameters of a neuron represent
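A minimal sketch of the neuron's math, with plain floats standing in for Value objects so it runs on its own: out = tanh(sum(w_i * x_i) + b).

```python
import math, random

class Neuron:
    def __init__(self, nin):
        # one weight per input, plus a single bias
        self.w = [random.uniform(-1, 1) for _ in range(nin)]
        self.b = 0.0

    def __call__(self, x):
        # weighted sum + bias, squashed by the tanh activation
        act = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return math.tanh(act)

    def parameters(self):
        # the knobs training will adjust: all weights and the bias
        return self.w + [self.b]

random.seed(0)
n = Neuron(3)
out = n([1.0, 2.0, 3.0])
```

The parameters are the neuron's only memory: training changes nothing but these nin + 1 numbers.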
  • [ ] Layers and MLPs [text]
    • Build a Layer as a collection of Neurons
    • Compose Layers into an MLP (Multi-Layer Perceptron)
    • Trace a forward pass through a 2-layer network
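The composition above can be sketched with plain-float neurons (Value objects swapped for floats so the structure stands alone):

```python
import math, random

class Neuron:
    def __init__(self, nin):
        self.w = [random.uniform(-1, 1) for _ in range(nin)]
        self.b = 0.0
    def __call__(self, x):
        return math.tanh(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

class Layer:
    def __init__(self, nin, nout):
        # nout independent neurons, all reading the same nin inputs
        self.neurons = [Neuron(nin) for _ in range(nout)]
    def __call__(self, x):
        return [n(x) for n in self.neurons]

class MLP:
    def __init__(self, nin, nouts):
        sizes = [nin] + nouts
        self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(nouts))]
    def __call__(self, x):
        for layer in self.layers:   # each layer's output feeds the next
            x = layer(x)
        return x

random.seed(0)
net = MLP(2, [4, 1])    # 2 inputs -> 4 hidden neurons -> 1 output
y = net([1.0, -1.0])
```

A forward pass through this 2-layer network is just the loop in `MLP.__call__`: the 2-vector input becomes a 4-vector of hidden activations, then a single output.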
  • [ ] The Module Pattern [text]
    • Explain the Module base class and the parameters() pattern
    • Describe how zero_grad() resets accumulated gradients before each new training step
    • Connect the Module pattern to PyTorch's nn.Module
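A sketch of the Module pattern on its own: a base class that gives every component parameters() and zero_grad(). `Param` here is a hypothetical stand-in for a Value with a .grad field, just to keep the sketch self-contained.

```python
class Param:
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

class Module:
    def parameters(self):
        return []                     # subclasses override to expose their Params

    def zero_grad(self):
        for p in self.parameters():
            p.grad = 0.0              # clear stale grads before the next backward()

class Neuron(Module):
    def __init__(self, nin):
        self.w = [Param(0.1) for _ in range(nin)]
        self.b = Param(0.0)
    def parameters(self):
        return self.w + [self.b]

n = Neuron(3)
for p in n.parameters():
    p.grad = 5.0                      # pretend a backward pass already ran
n.zero_grad()
```

This is the same contract as PyTorch's nn.Module: one method that enumerates every trainable parameter, so optimizers and zero_grad() never need to know the model's internal structure.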

Training

The loss function, gradient descent, and the training loop that teaches the network to classify.

  • [ ] The Loss Function [text]
    • Explain what a loss function measures and why training needs one
    • Implement an SVM max-margin loss (hinge loss) for binary classification
    • Describe the role of L2 regularization in preventing overfitting
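The hinge loss plus L2 regularization can be sketched on plain floats (labels are +1/-1, scores are raw network outputs; `alpha` is an illustrative regularization strength, not a value from the course):

```python
def hinge_loss(scores, labels, params, alpha=1e-4):
    # max(0, 1 - y * score): zero once a prediction clears the margin,
    # positive (and pushing the network) whenever it falls short.
    data_loss = sum(max(0.0, 1 - y * s) for s, y in zip(scores, labels)) / len(scores)
    # L2 penalty shrinks weights toward zero to discourage overfitting.
    reg_loss = alpha * sum(p * p for p in params)
    return data_loss + reg_loss

# One confident correct score (2.0, label +1) and one under-margin
# score (-0.5, label -1): only the second contributes data loss.
loss = hinge_loss([2.0, -0.5], [1, -1], [0.5, -0.5])
```

Training needs this single number because gradient descent can only follow one scalar downhill; the loss collapses "how wrong is the whole network" into that scalar.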
  • [ ] Gradient Descent [text]
    • Implement a gradient descent update step
    • Explain how learning rate controls the step size
    • Describe the zero_grad → forward → loss → backward → update cycle
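A one-parameter sketch of the update step, using a toy quadratic f(w) = (w - 3)^2 whose gradient 2(w - 3) we can write by hand:

```python
w = 0.0
lr = 0.1                       # learning rate: scales the size of each step
for _ in range(100):
    grad = 2 * (w - 3)         # "backward": gradient of the loss w.r.t. w
    w -= lr * grad             # "update": step against the gradient
# w converges toward the minimum at w = 3
```

A larger lr covers ground faster but can overshoot the minimum; a smaller one is stable but slow. That trade-off is the entire role of the learning rate.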
  • [ ] The Training Loop [text]
    • Assemble the complete training loop from components built in previous lessons
    • Trace how loss decreases over training steps
    • Explain what each line in the training loop does and why it is in that order
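The loop's skeleton can be sketched with a one-parameter model (y = w * x) and a hand-derived gradient, so the order of steps is visible without the full autograd engine. This is our own reduction, not the course's trainer:

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # targets satisfy y = 2x, so w should reach 2
w, lr = 0.0, 0.05

for step in range(200):
    grad = 0.0                                  # zero_grad: clear old gradient
    loss = 0.0
    for x, y in zip(xs, ys):
        pred = w * x                            # forward
        loss += (pred - y) ** 2 / len(xs)       # loss: mean squared error
        grad += 2 * (pred - y) * x / len(xs)    # backward: chain rule by hand
    w -= lr * grad                              # update: step against gradient
```

The order matters: zeroing after the update (not before) would erase the gradient you just computed, and updating before backward would step in a stale direction. Each pass shrinks the loss, which is the decrease you trace over training steps.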

The Full Picture

Train on a real dataset, see what the network learned, and connect micrograd to production frameworks.

  • [ ] Training on Moons [text]
    • Generate and understand the moon dataset for binary classification
    • Train an MLP to classify moon data with high accuracy
    • Interpret what the trained network has learned
    • Map micrograd concepts to their PyTorch equivalents
    • Explain what micrograd leaves out and why production frameworks add those features
    • Describe the path from micrograd to microGPT to full-scale models
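The moon dataset itself needs no dependencies: two interleaved half-circles with a little noise, labeled +1/-1. This sketch mirrors the shape of scikit-learn's make_moons, but the exact offsets and noise model here are our own choices:

```python
import math, random

def make_moons(n=100, noise=0.1, seed=0):
    rng = random.Random(seed)
    pts, labels = [], []
    for i in range(n):
        t = math.pi * rng.random()       # angle along one half-circle
        if i % 2 == 0:
            # upper moon, label +1
            x, y, label = math.cos(t), math.sin(t), 1
        else:
            # lower moon, shifted right and down so the crescents interleave
            x, y, label = 1 - math.cos(t), 0.5 - math.sin(t), -1
        # Gaussian jitter keeps the two classes from being trivially separable
        pts.append((x + rng.gauss(0, noise), y + rng.gauss(0, noise)))
        labels.append(label)
    return pts, labels

X, y = make_moons(100)
```

Because the crescents hook into each other, no straight line separates them; a trained MLP has to learn a curved decision boundary, which is exactly what makes this a good final test of the whole stack.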