Completing the chain: backprop through the first linear layer and the embedding table, then training with manual gradients.