Path-SGD: Path-Normalized Optimization in Deep Neural Networks
1. Summary
The authors introduce a new regularizer, the ℓ_p-path regularizer, and a corresponding SGD variant, Path-SGD. They show that both the regularizer and Path-SGD are rescaling-free, i.e., invariant to node-wise rescaling of the weights, which matches the same invariance of the function computed by a feed-forward neural network. They also show that standard SGD lacks this property and therefore does not respect the geometry of the parameter space of deep networks.
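The rescaling property summarized above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a two-layer ReLU network without biases, a squared-error loss, and a path regularizer of the form (sum over input-output paths of the product of |w|^p along the path)^(1/p). All function names, shapes, and constants are hypothetical and chosen only for illustration.

```python
import numpy as np

def forward(W1, W2, x):
    # Two-layer ReLU network without biases (an assumption of this sketch).
    return W2 @ np.maximum(W1 @ x, 0.0)

def path_norm(W1, W2, p=2):
    # Sum over all input -> hidden -> output paths of the product of |w|^p,
    # followed by the 1/p root.
    return float(np.sum((np.abs(W2) ** p) @ (np.abs(W1) ** p)) ** (1.0 / p))

def rescale_unit(W1, W2, j, c):
    # Multiply the incoming weights of hidden unit j by c > 0 and divide
    # its outgoing weights by c.
    W1s, W2s = W1.copy(), W2.copy()
    W1s[j, :] *= c
    W2s[:, j] /= c
    return W1s, W2s

def sgd_step(W1, W2, x, y, lr=0.01):
    # One plain gradient step on the loss 0.5 * ||f(x) - y||^2.
    z = W1 @ x
    h = np.maximum(z, 0.0)
    err = W2 @ h - y
    grad_W2 = np.outer(err, h)
    grad_W1 = np.outer((W2.T @ err) * (z > 0), x)
    return W1 - lr * grad_W1, W2 - lr * grad_W2

rng = np.random.default_rng(0)
# Positive first-layer weights and inputs keep every hidden unit active,
# so the rescaled unit actually receives a gradient.
W1 = np.abs(rng.normal(size=(4, 3)))
W2 = rng.normal(size=(2, 4))
x = np.abs(rng.normal(size=3))
y = rng.normal(size=2)

W1s, W2s = rescale_unit(W1, W2, j=1, c=10.0)

# The network function and the path regularizer do not change under rescaling.
same_output = bool(np.allclose(forward(W1, W2, x), forward(W1s, W2s, x)))
same_path_norm = bool(np.isclose(path_norm(W1, W2), path_norm(W1s, W2s)))

# One plain SGD step from each parameterization leads to different functions.
out_a = forward(*sgd_step(W1, W2, x, y), x)
out_b = forward(*sgd_step(W1s, W2s, x, y), x)
same_after_sgd = bool(np.allclose(out_a, out_b))

print(same_output, same_path_norm, same_after_sgd)
```

In this sketch the two parameterizations compute the same function and have the same path norm, yet a single plain gradient step sends them to different functions, which is the mismatch the paper attributes to standard SGD.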