Normalizing activations at each layer — the technique that made training deep networks practical, with learnable scale and shift parameters.
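The core idea can be sketched in a few lines of NumPy: normalize each feature over the batch to zero mean and unit variance, then apply a learnable scale (gamma) and shift (beta). This is a minimal illustrative sketch, not the lesson's implementation; the names `batch_norm`, `gamma`, and `beta` are assumptions.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize x of shape (batch, features)."""
    # Per-feature statistics over the batch dimension
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Normalize to zero mean, unit variance (eps avoids division by zero)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learnable scale and shift restore representational capacity
    return gamma * x_hat + beta

# Example: a batch of 4 samples with 3 features, badly scaled
x = np.random.randn(4, 3) * 10 + 5
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# Each column of y now has mean ~0 and standard deviation ~1
```

With `gamma=1` and `beta=0` the output is purely normalized; during training these parameters are learned, so the network can undo the normalization where that helps.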