Running multiple attention heads in parallel to capture different types of relationships between tokens.
This lesson requires an active subscription.