Deep Learning Guide
0. Introduction
In deep learning practice, what should you do when a model's test results turn out to be poor?
As the saying goes, "details decide success or failure": the cause may be nothing more than a small, overlooked issue.
This guide collects the key knowledge points of deep learning and can be used as a day-to-day checklist for diagnosing DNN problems.
1. Convolutional Networks (CNN)
1.1 Evolution of the fundamental CNN architectures:
General-purpose CNN architectures that grew out of image classification in computer vision (CV).
- 2012, AlexNet, ImageNet Classification with Deep Convolutional Neural Networks [original] [archive]
- 2014, GoogLeNet, Going Deeper with Convolutions [original] [archive]
- 2015, VGG, Very Deep Convolutional Networks for Large-Scale Image Recognition [original] [archive]
- 2015, Inception-v3, Rethinking the Inception Architecture for Computer Vision [original] [archive]
- 2015, ResNet, Deep Residual Learning for Image Recognition [original] [archive] (see the residual-block sketch after this list)
- 2015, U-Net, U-Net: Convolutional Networks for Biomedical Image Segmentation [original] [archive]
- 2016, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning [original] [archive]
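The key idea behind ResNet above is the residual (skip) connection: each block learns a residual F(x) that is added back to its input. A minimal sketch of a basic residual block in PyTorch; the class name BasicBlock and the channel/spatial sizes are illustrative assumptions, not the paper's exact configuration:

import torch
from torch import nn

class BasicBlock(nn.Module):
    """Minimal residual block: out = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # the skip connection adds the input back

x = torch.randn(1, 64, 56, 56)
y = BasicBlock(64)(x)               # same shape as x: (1, 64, 56, 56)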
1.2 Special Convolution
Convolution operations designed and optimized for special kinds of data or objectives.
1.3 Optimized Networks for Mobile Devices
Efficient CNN designs optimized for embedded devices (phones, cars, smart devices, etc.).
- 2016, SqueezeNet, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size [original] [archive]
- 2017, MobileNet, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [original] [archive] (see the depthwise separable convolution sketch after this list)
- 2018, MobileNetV2, MobileNetV2: Inverted Residuals and Linear Bottlenecks [original] [archive]
- 2019, MobileNetV3, Searching for MobileNetV3 [original] [archive]
- 2019, EfficientNet, EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [original] [archive]
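The core building block of the MobileNet family above is the depthwise separable convolution: a per-channel (depthwise) 3x3 convolution followed by a 1x1 pointwise convolution, which costs far fewer multiply-adds than a standard convolution. A minimal sketch in PyTorch; the channel counts and input shape are illustrative assumptions:

import torch
from torch import nn

def depthwise_separable_conv(in_ch, out_ch):
    """Depthwise 3x3 conv (groups=in_ch) followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

block = depthwise_separable_conv(32, 64)
y = block(torch.randn(1, 32, 112, 112))   # -> shape (1, 64, 112, 112)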
2. Normalization, Standardization and Regularization
What are normalization, standardization, and regularization? [original] [archive]
2.1 Normalization Techniques in Deep Neural Networks
- 2015, Batch Normalization, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [original] [archive]
- 2016, Weight Normalization, Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks [original] [archive]
- 2016, Layer Normalization, Layer Normalization [original] [archive]
- 2016, Instance Normalization, Instance Normalization: The Missing Ingredient for Fast Stylization [original] [archive]
- 2018, Group Normalization, Group Normalization [original] [archive]
- 2018, Batch-Instance Normalization, Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks [original] [archive]
- 2018, Switchable Normalization, Differentiable Learning-to-Normalize via Switchable Normalization [original] [archive]
- 2018, Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct? [original] [archive]
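Several of the layers listed above are available as drop-in PyTorch modules; they differ mainly in which dimensions the statistics are computed over (per mini-batch, per sample, per channel, or per channel group). A minimal sketch; the tensor shape and group count are illustrative assumptions:

import torch
from torch import nn

x = torch.randn(8, 32, 28, 28)             # (batch, channels, height, width)

bn = nn.BatchNorm2d(32)                     # statistics over (batch, H, W) per channel
ln = nn.LayerNorm([32, 28, 28])             # statistics over (C, H, W) per sample
inorm = nn.InstanceNorm2d(32)               # statistics over (H, W) per sample and channel
gn = nn.GroupNorm(num_groups=4, num_channels=32)  # statistics over channel groups

for layer in (bn, ln, inorm, gn):
    assert layer(x).shape == x.shape        # all preserve the input shape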
3. Weight Initialization
Training a neural network is a complex undertaking: there is no complete theoretical guidance, and training methods and tricks are mostly found by trial and error.
In practice, a good weight initialization can speed up training and improve the final test performance.
- 2010, Xavier Initialization, Understanding the difficulty of training deep feedforward neural networks [original] [archive]
  Implemented in PyTorch: torch.nn.init.xavier_uniform_() and torch.nn.init.xavier_normal_()
- 2010, Sparse Initialization, Deep learning via Hessian-free optimization [original] [archive]
  Implemented in PyTorch: torch.nn.init.sparse_()
- 2013, Orthogonal Initialization, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks [original] [archive]
  Implemented in PyTorch: torch.nn.init.orthogonal_()
- 2015, Kaiming Initialization, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification [original] [archive]
  Implemented in PyTorch: torch.nn.init.kaiming_uniform_() and torch.nn.init.kaiming_normal_()
- 2017, On weight initialization in deep neural networks [original] [archive]
  Analyzes the Xavier and Kaiming methods theoretically and derives more precise initialization parameters.
- 2018, How to Start Training: The Effect of Initialization and Architecture [original] [archive]
- 2019, How to Initialize your Network? Robust Initialization for WeightNorm & ResNets [original] [archive]
- 2020, Revisiting Initialization of Neural Networks [original] [archive]
  Theoretically revisits (and argues against) the Xavier and Kaiming methods and derives alternative initialization parameters; its conclusions differ from Xavier and Kaiming.
A minimal example of applying a custom weight initialization to every Conv2d layer with model.apply():

import torch
from torch import nn

# define the network
class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

# define the weight initialization method
def initialize_parameters(m):
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight.data, a=0.25, nonlinearity='leaky_relu')
        nn.init.constant_(m.bias.data, 0)

model = MyNet()

# weight initialization
with torch.no_grad():
    model.apply(initialize_parameters)

# start training
4. Optimizer
Neural networks are almost universally trained with gradient descent to optimize the model parameters (weights). Starting from plain GD and SGD, successive refinements have produced a whole family of optimizers, and choosing a suitable one is a key step in training a DNN.
So which one should you choose? The short answer: Adam or NAdam (see the sketch at the end of this section).
4.1 How to choose an optimizer?
4.2 Gradient Descent Based Optimizers
- 1951, the theoretical foundation of SGD, A Stochastic Approximation Method [original] [archive]
  PyTorch: torch.optim.SGD()
- 1983, Nesterov, A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2) [original] [archive]
- 1999, Momentum, On the momentum term in gradient descent learning algorithms [original] [archive]
- 2011, Adagrad, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization [original] [archive]
  PyTorch: torch.optim.Adagrad()
- 2012, Adadelta, ADADELTA: An Adaptive Learning Rate Method [original] [archive]
  PyTorch: torch.optim.Adadelta()
- 2013, RMSProp, Generating Sequences With Recurrent Neural Networks [original] [archive]
  PyTorch: torch.optim.RMSprop()
- 2014, Adam, Adam: A Method for Stochastic Optimization [original] [archive]
- 2016, NAdam, Incorporating Nesterov Momentum into Adam [original] [archive]
- L-BFGS algorithm
  PyTorch: torch.optim.LBFGS()
- Averaged Stochastic Gradient Descent (ASGD)
  1992, Acceleration of Stochastic Approximation by Averaging [original] [archive]
  PyTorch: torch.optim.ASGD()
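As a starting point for the "Adam or NAdam" advice above, constructing an optimizer in PyTorch only needs the model parameters and a few hyper-parameters. A minimal sketch; the model, learning rate, and weight decay values are illustrative assumptions (NAdam is available as torch.optim.NAdam in recent PyTorch releases):

import torch
from torch import nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# A reasonable default: Adam (or NAdam) with a modest learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
# optimizer = torch.optim.NAdam(model.parameters(), lr=1e-3)   # alternative

# One training step: forward, backward, update.
x, target = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()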
5. Learning Rate Scheduler
The optimizer's learning rate (LR) is the most important hyper-parameter in deep learning.
Adjusting the LR during training is essential; a poor scheduling strategy (learning rate scheduler) hurts both the final model quality and the training time.
Commonly used schedulers include (a usage sketch follows this list):
- Multiplicative scaling by a user-defined factor: torch.optim.lr_scheduler.MultiplicativeLR()
- Step decay: torch.optim.lr_scheduler.StepLR()
- Step decay at specified epochs: torch.optim.lr_scheduler.MultiStepLR()
- Exponential decay: torch.optim.lr_scheduler.ExponentialLR()
- Cosine annealing: torch.optim.lr_scheduler.CosineAnnealingLR()
- Cosine annealing with periodic warm restarts: torch.optim.lr_scheduler.CosineAnnealingWarmRestarts()
- Reduce the LR when a monitored metric (e.g., the loss) stops improving: torch.optim.lr_scheduler.ReduceLROnPlateau()
- Cyclical LR: torch.optim.lr_scheduler.CyclicLR()
- One-cycle policy: torch.optim.lr_scheduler.OneCycleLR()
- Stochastic Weight Averaging: torch.optim.swa_utils.SWALR()
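All of these schedulers wrap an optimizer and are advanced with scheduler.step(), typically once per epoch (OneCycleLR and CyclicLR are usually stepped once per batch). A minimal sketch with StepLR; the model, epoch count, step_size, and gamma values are illustrative assumptions:

import torch
from torch import nn

model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Multiply the LR by 0.1 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

loader = [(torch.randn(8, 16), torch.randint(0, 2, (8,)))]   # stand-in for a DataLoader
for epoch in range(90):
    for x, target in loader:
        loss = nn.functional.cross_entropy(model(x), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                        # advance the LR schedule once per epoch
    # current LR: scheduler.get_last_lr()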
- 2015, CLR, Cyclical Learning Rates for Training Neural Networks [original] [archive]
  PyTorch: torch.optim.lr_scheduler.CyclicLR()
- 2017, SGDR, SGDR: Stochastic Gradient Descent with Warm Restarts [original] [archive]
  PyTorch: torch.optim.lr_scheduler.CosineAnnealingWarmRestarts()
- 2017, Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates [original] [archive]
  PyTorch: torch.optim.lr_scheduler.OneCycleLR()
- 2018, SWA, Averaging Weights Leads to Wider Optima and Better Generalization [original] [archive] (see the SWA sketch after this list)
  PyTorch: torch.optim.swa_utils.SWALR()
- 2019, An Exponential Learning Rate Schedule for Deep Learning [original] [archive]
- 2019, Combining Learning Rate Decay and Weight Decay with Complexity Gradient Descent - Part I [original] [archive]
- 2019, The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares [original] [archive]
- 2020, k-decay, k-decay: A New Method For Learning Rate Schedule [original] [archive]
- 2020, LQA, Automatic, Dynamic, and Nearly Optimal Learning Rate Specification by Local Quadratic Approximation [original] [archive]
- 2020, Stochastic Gradient Descent with Large Learning Rate [original] [archive]
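The SWA entry above has direct support in torch.optim.swa_utils: an AveragedModel keeps a running average of the weights, SWALR holds the LR at a constant SWA value, and after training the averaged model's BatchNorm statistics must be recomputed with update_bn(). A minimal sketch; the model, epoch counts, and swa_lr value are illustrative assumptions, loosely following the pattern in the PyTorch documentation:

import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

swa_model = torch.optim.swa_utils.AveragedModel(model)    # running average of the weights
swa_scheduler = torch.optim.swa_utils.SWALR(optimizer, swa_lr=0.05)
swa_start = 75                                             # begin averaging late in training

loader = [(torch.randn(8, 16), torch.randint(0, 2, (8,)))]  # stand-in for a DataLoader
for epoch in range(100):
    for x, target in loader:
        loss = nn.functional.cross_entropy(model(x), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)    # fold the current weights into the average
        swa_scheduler.step()
    else:
        scheduler.step()

# recompute BatchNorm statistics for the averaged weights (a no-op here: no BN layers)
torch.optim.swa_utils.update_bn(loader, swa_model)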