Abstract:
While deep learning has achieved great success in practice, its theory remains mysterious. Empirically, neural networks equipped with normalization layers (e.g., Batch Normalization, Layer Normalization) perform much better. From a theoretical point of view, these layers make the network scale-invariant, a property that is helpful for understanding the optimization dynamics, especially the role of the learning rate. Using scale invariance and the resulting effective learning rate, we can partly explain several phenomena observed in deep learning.
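As a minimal sketch of the idea (the notation L, w, and \eta below is ours, not the speaker's): if the loss is invariant to rescaling a weight vector, then gradient descent on that vector behaves as if it used a learning rate rescaled by the squared weight norm.

% Scale invariance of the loss L in a weight vector w:
\[
  L(cw) = L(w) \;\; \forall c>0
  \;\Longrightarrow\;
  \nabla L(cw) = \tfrac{1}{c}\,\nabla L(w), \qquad \langle w, \nabla L(w)\rangle = 0,
\]
% so a gradient step w_{t+1} = w_t - \eta \nabla L(w_t) changes only the direction
% w/\|w\|, as if it used the effective learning rate
\[
  \eta_{\mathrm{eff}} = \frac{\eta}{\lVert w_t \rVert^{2}}.
\]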
In this talk, we will briefly introduce scale-invariant architectures and the effective learning rate they induce. We will then explain how this perspective helps account for several phenomena in deep learning. Some works built on the effective learning rate itself will also be discussed.