A Detailed Guide to Common Parameter Initialization Methods in PyTorch

张三 2022-03-08 20:56:19

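1. Uniform distribution initialization

    torch.nn.init.uniform_(tensor, a=0, b=1)

Fills the tensor with values sampled from the uniform distribution $U(a, b)$.

Parameters:

- tensor - the tensor to fill
- a - the lower bound of the uniform distribution
- b - the upper bound of the uniform distribution

Example: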

    import torch
    import torch.nn as nn

    w = torch.empty(3, 5)
    nn.init.uniform_(w)
    # Sample output (values differ from run to run):
    # tensor([[0.2116, 0.3085, 0.5448, 0.6113, 0.7697],
    #         [0.8300, 0.2938, 0.4597, 0.4698, 0.0624],
    #         [0.5034, 0.1166, 0.3133, 0.3615, 0.3757]])

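The uniform distribution in detail:

If $x$ follows a uniform distribution, i.e. $x \sim U(a, b)$, its probability density function (which describes the relative likelihood of the random variable taking values near each point) is

$f(x)=\left\{\begin{array}{ll}\frac{1}{b-a}, & a<x<b \\ 0, & \text{otherwise}\end{array}\right.$

Its expectation and variance are

$\begin{array}{c}E(x)=\int_{-\infty}^{\infty} x f(x)\, dx=\frac{1}{2}(a+b) \\ D(x)=E\left(x^{2}\right)-[E(x)]^{2}=\frac{(b-a)^{2}}{12}\end{array}$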

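2. Normal (Gaussian) distribution initialization

    torch.nn.init.normal_(tensor, mean=0.0, std=1.0)

Fills the tensor with values drawn from the normal distribution $N(\text{mean}, \text{std}^{2})$ with the given mean and standard deviation.

Parameters:

- tensor - the tensor to fill
- mean - the mean of the normal distribution
- std - the standard deviation of the normal distribution

Example: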

    import torch

    w = torch.empty(3, 5)
    torch.nn.init.normal_(w, mean=0, std=1)
    # Sample output (values differ from run to run):
    # tensor([[-1.3903,  0.4045,  0.3048,  0.7537, -0.5189],
    #         [-0.7672,  0.1891, -0.2226,  0.2913,  0.1295],
    #         [ 1.4719, -0.3049,  0.3144, -1.0047, -0.5424]])

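The normal distribution in detail:

If the random variable $x$ follows a normal distribution, i.e. $x \sim N\left(\mu, \sigma^{2}\right)$, its probability density function is

$f(x)=\frac{1}{\sigma \sqrt{2 \pi}} \exp \left(-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\right)$

Some notable probabilities under the normal density:

- 68.268949% of the area lies within one standard deviation $\sigma$ of the mean ($\mu \pm \sigma$)
- 95.449974% of the area lies within two standard deviations $2 \sigma$ of the mean ($\mu \pm 2 \sigma$)
- 99.730020% of the area lies within three standard deviations $3 \sigma$ of the mean ($\mu \pm 3 \sigma$)
- 99.993666% of the area lies within four standard deviations $4 \sigma$ of the mean ($\mu \pm 4 \sigma$)

The normal distribution with $\mu=0$ and $\sigma=1$ is the standard normal distribution.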
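The coverage percentages listed above can be approximated empirically from a tensor filled by `normal_`. This is only a sketch: the values `mean = 1.0` and `std = 2.0`, the sample size, and the seed are assumptions picked for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

mean, std = 1.0, 2.0          # illustrative values, not from the original text
w = torch.empty(1000, 1000)   # one million samples
nn.init.normal_(w, mean=mean, std=std)

fractions = {}
for k in (1, 2, 3):
    # Fraction of samples that fall inside mean +/- k * std
    inside = (w - mean).abs() <= k * std
    fractions[k] = inside.float().mean().item()
    print(f"within {k} sigma: {fractions[k]:.4%}")
```

The printed fractions should be close to 68.27%, 95.45%, and 99.73%, with small deviations due to sampling.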

3. Xavier initialization

3.1 Xavier uniform initialization

    torch.nn.init.xavier_uniform_(tensor, gain=1.0)

Also known as Glorot initialization. Following the method described by Glorot, X. & Bengio, Y. (2010) in the paper Understanding the difficulty of training deep feedforward neural networks, it fills the input tensor with values sampled from the uniform distribution $U(-a, a)$, where $a$ is given by:

$a=\text{gain} \times \sqrt{\frac{6}{\text{fan\_in}+\text{fan\_out}}}$

Example:

    import torch
    import torch.nn as nn

    w = torch.empty(3, 5)
    nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
    # Sample output (values differ from run to run):
    # tensor([[ 0.7695, -0.7687, -0.2561, -0.5307,  0.5195],
    #         [-0.6187,  0.4913,  0.3037, -0.6374,  0.9725],
    #         [-0.2658, -0.4051, -1.1006, -1.1264, -0.1310]])
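To make the bound concrete, the sketch below recomputes $a$ by hand for the same 3 x 5 tensor and confirms that no sampled value exceeds it. It assumes the usual convention for a 2-D weight, where the first dimension is fan_out and the second is fan_in; the seed is arbitrary.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

w = torch.empty(3, 5)
fan_out, fan_in = w.shape                 # 2-D weight: (fan_out, fan_in) = (3, 5)
gain = nn.init.calculate_gain('relu')     # sqrt(2) for ReLU

# a = gain * sqrt(6 / (fan_in + fan_out))
a = gain * math.sqrt(6.0 / (fan_in + fan_out))
nn.init.xavier_uniform_(w, gain=gain)

max_abs = w.abs().max().item()
print(f"bound a = {a:.4f}")
print(f"max |w| = {max_abs:.4f}")
```

For these shapes the bound works out to about 1.2247, which is consistent with the sample output above, where the largest magnitude is 1.1264.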

3.2 Xavier normal initialization

    torch.nn.init.xavier_normal_(tensor, gain=1.0)

Also known as Glorot initialization. Following the method described by Glorot, X. & Bengio, Y. (2010) in the paper Understanding the difficulty of training deep feedforward neural networks, it fills the input tensor with values sampled from the normal distribution $N\left(0, \text{std}^{2}\right)$, where std is given by:

$\text{std}=\text{gain} \times \sqrt{\frac{2}{\text{fan\_in}+\text{fan\_out}}}$

Parameters:

- tensor - the tensor to initialize
- gain - an optional scaling factor

Example:

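    import torch

    # The starting values 0..9 are overwritten in place by the initializer.
    w = torch.arange(10, dtype=torch.float32).view(2, -1)
    torch.nn.init.xavier_normal_(w)
    # Sample output (values differ from run to run):
    # tensor([[-0.3139, -0.3557,  0.1285, -0.9556,  0.3255],
    #         [-0.6212,  0.3405, -0.4150, -1.3227, -0.0069]])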
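A 2 x 5 tensor is too small to see the target standard deviation in the samples. The sketch below uses a larger tensor and compares the empirical standard deviation with the formula; the shape 256 x 512 and the seed are assumptions made only for this illustration.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

w = torch.empty(256, 512)      # illustrative shape: (fan_out, fan_in)
fan_out, fan_in = w.shape
gain = 1.0                     # default gain

# std = gain * sqrt(2 / (fan_in + fan_out))
expected_std = gain * math.sqrt(2.0 / (fan_in + fan_out))
nn.init.xavier_normal_(w, gain=gain)

sample_mean = w.mean().item()
sample_std = w.std().item()
print(f"expected std = {expected_std:.5f}")
print(f"sample std   = {sample_std:.5f}, sample mean = {sample_mean:.5f}")
```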

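4. Kaiming initialization

4.1 Kaiming uniform initialization

    torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

Also known as He initialization. Following the method described by He, K. et al. (2015) in the paper Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, it fills the input tensor with values sampled from the uniform distribution $U(-\text{bound}, \text{bound})$, where bound is given by:

$\text{bound}=\text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}$

Parameters:

- tensor - the tensor to initialize;
- a - the negative slope of the rectifier used after this layer, used to compute $\text{gain}=\sqrt{\frac{2}{1+a^{2}}}$ (this parameter only takes effect when nonlinearity is 'leaky_relu');
- mode - either "fan_in" (default) or "fan_out". "fan_in" preserves the magnitude of the variance of the weights in the forward pass; "fan_out" preserves the magnitude of the variance in the backward pass;
- nonlinearity - the non-linear function (the function name in nn.functional); PyTorch recommends using it only with "relu" or "leaky_relu" (default).

Example: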

    import torch

    w = torch.empty(3, 5)
    torch.nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
    # Sample output (values differ from run to run):
    # tensor([[-0.4362, -0.8177, -0.7034,  0.7306, -0.6457],
    #         [-0.5749, -0.6480, -0.8016, -0.1434,  0.0785],
    #         [ 1.0369, -0.0676,  0.7430, -0.2484, -0.0895]])
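The effect of `a` and `mode` on the bound can be sketched as follows. The tensor shape 64 x 128, the slope `0.2`, and the seed are assumptions chosen for illustration; the gain is computed by hand from the formula in the parameter list rather than taken from the library.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

w = torch.empty(64, 128)       # illustrative 2-D weight: (fan_out, fan_in) = (64, 128)
negative_slope = 0.2           # illustrative value for the parameter `a`
gain = math.sqrt(2.0 / (1.0 + negative_slope ** 2))

results = {}
for mode, fan in (('fan_in', 128), ('fan_out', 64)):
    # bound = gain * sqrt(3 / fan_mode)
    bound = gain * math.sqrt(3.0 / fan)
    nn.init.kaiming_uniform_(w, a=negative_slope, mode=mode,
                             nonlinearity='leaky_relu')
    results[mode] = (bound, w.abs().max().item())
    print(f"{mode}: bound={bound:.4f}, max |w|={results[mode][1]:.4f}")
```

Because fan_out is smaller than fan_in here, the "fan_out" bound comes out larger than the "fan_in" one.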

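4.2 Kaiming normal initialization

    torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

Also known as He initialization. Following the method described by He, K. et al. (2015) in the paper Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, it fills the input tensor with values sampled from the normal distribution $N\left(0, \text{std}^{2}\right)$, where std is given by:

$\text{std}=\frac{\text{gain}}{\sqrt{\text{fan\_mode}}}$

Parameters:

- tensor - the tensor to initialize;
- a - the negative slope of the rectifier used after this layer, used to compute $\text{gain}=\sqrt{\frac{2}{1+a^{2}}}$ (this parameter only takes effect when nonlinearity is 'leaky_relu');
- mode - either "fan_in" (default) or "fan_out". "fan_in" preserves the magnitude of the variance of the weights in the forward pass; "fan_out" preserves the magnitude of the variance in the backward pass;
- nonlinearity - the non-linear function (the function name in nn.functional); PyTorch recommends using it only with "relu" or "leaky_relu" (default).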
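This subsection has no example of its own, so here is a minimal sketch. It relies on the standard deviation that the PyTorch documentation gives for `kaiming_normal_`, namely gain / sqrt(fan_mode); the shape 256 x 512 and the seed are assumptions chosen for illustration.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

w = torch.empty(256, 512)                 # illustrative shape: (fan_out, fan_in)
fan_in = w.shape[1]
gain = nn.init.calculate_gain('relu')     # sqrt(2) for ReLU

# std = gain / sqrt(fan_mode), with mode='fan_in'
expected_std = gain / math.sqrt(fan_in)
nn.init.kaiming_normal_(w, mode='fan_in', nonlinearity='relu')

sample_std = w.std().item()
print(f"expected std = {expected_std:.5f}")
print(f"sample std   = {sample_std:.5f}")
```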

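5. Orthogonal matrix initialization

    torch.nn.init.orthogonal_(tensor, gain=1)

Fills the input tensor with a (semi-)orthogonal matrix; see Saxe, A. et al. (2013) - Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. The input tensor must have at least 2 dimensions; for tensors with more than 2 dimensions, the trailing dimensions are flattened.

Orthogonal initialization can make convolution kernels more compact and removes correlation between them, which can make it easier for the model to learn effective parameters.

Parameters:

- tensor - the tensor to initialize
- gain - an optional scaling factor

Example: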

    import torch

    w = torch.empty(3, 5)
    torch.nn.init.orthogonal_(w)
    # Sample output (values differ from run to run):
    # tensor([[ 0.7395, -0.1503,  0.4474,  0.4321, -0.2090],
    #         [-0.2625,  0.0112,  0.6515, -0.4770, -0.5282],
    #         [ 0.4554,  0.6548,  0.0970, -0.4851,  0.3453]])
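The "(semi-)orthogonal" property can be checked directly: for a matrix with fewer rows than columns, the rows are orthonormal, so the product of the matrix with its transpose is the identity. The sketch below also covers a 4-D tensor, whose trailing dimensions are flattened first; the convolution-like shape (8, 3, 3, 3) is a hypothetical choice for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

# 2-D case: 3 rows, 5 columns, so the 3 rows are orthonormal
w = torch.empty(3, 5)
nn.init.orthogonal_(w)
gram = w @ w.T                      # should be close to the 3 x 3 identity
print(gram)

# 4-D case: trailing dimensions are flattened, giving an 8 x 27 matrix
w4 = torch.empty(8, 3, 3, 3)        # hypothetical conv weight shape
nn.init.orthogonal_(w4)
flat = w4.flatten(1)                # shape (8, 27)
gram4 = flat @ flat.T               # should be close to the 8 x 8 identity
```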

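6. Sparse matrix initialization

    torch.nn.init.sparse_(tensor, sparsity, std=0.01)

Fills the 2-D input tensor as a sparse matrix, where the non-zero elements are drawn from the normal distribution $N\left(0, \text{std}^{2}\right)$, which is $N\left(0, 0.01^{2}\right)$ with the default std. See Martens, J. (2010), Deep learning via Hessian-free optimization.

Parameters:

- tensor - the tensor to fill
- sparsity - the fraction of elements in each column to set to zero
- std - the standard deviation of the normal distribution used to generate the non-zero values

Example: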

    import torch

    w = torch.empty(3, 5)
    torch.nn.init.sparse_(w, sparsity=0.1)
    # Sample output (values differ from run to run):
    # tensor([[-0.0026,  0.0000,  0.0100,  0.0046,  0.0048],
    #         [ 0.0106, -0.0046,  0.0000,  0.0000,  0.0000],
    #         [ 0.0000, -0.0005,  0.0150, -0.0097, -0.0100]])
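Since sparsity is applied per column, counting the zeros in each column is a quick way to see what the parameter does. In this sketch the shape 10 x 4, `sparsity=0.5`, and the seed are assumptions chosen so that the expected count is a whole number.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

w = torch.empty(10, 4)                       # illustrative shape: 10 rows, 4 columns
nn.init.sparse_(w, sparsity=0.5, std=0.01)

# Half of the 10 entries in every column are set to zero
zeros_per_column = (w == 0).sum(dim=0)
print(zeros_per_column)                      # tensor([5, 5, 5, 5])

# The remaining entries are small values drawn with std=0.01
nonzero_max = w.abs().max().item()
print(f"largest magnitude: {nonzero_max:.4f}")
```

Which rows are zeroed differs from column to column and from run to run; only the count per column is fixed.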

7. Constant initialization

    torch.nn.init.constant_(tensor, val)

Fills the tensor with the constant value val.

Example:

    import torch
    import torch.nn as nn

    w = torch.empty(3, 5)
    nn.init.constant_(w, 1.2)
    # Output:
    # tensor([[1.2000, 1.2000, 1.2000, 1.2000, 1.2000],
    #         [1.2000, 1.2000, 1.2000, 1.2000, 1.2000],
    #         [1.2000, 1.2000, 1.2000, 1.2000, 1.2000]])

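8. Identity matrix initialization

    torch.nn.init.eye_(tensor)

Fills the 2-D tensor with the identity matrix. For a non-square tensor, the main diagonal is set to one and every other entry to zero.

Example: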

    import torch
    import torch.nn as nn

    w = torch.empty(3, 5)
    nn.init.eye_(w)
    # Output:
    # tensor([[1., 0., 0., 0., 0.],
    #         [0., 1., 0., 0., 0.],
    #         [0., 0., 1., 0., 0.]])

9. Zero initialization

    torch.nn.init.zeros_(tensor)

Fills the tensor with zeros.

Example:

    import torch
    import torch.nn as nn

    w = torch.empty(3, 5)
    nn.init.zeros_(w)
    # Output:
    # tensor([[0., 0., 0., 0., 0.],
    #         [0., 0., 0., 0., 0.],
    #         [0., 0., 0., 0., 0.]])

10. Applications

Example:

    import torch.nn as nn

    # `model` is assumed to be an nn.Module built earlier; its definition is not shown here.
    print('module-----------')
    print(model)
    print('setup-----------')
    for m in model.modules():
        if isinstance(m, nn.Linear):
            # Xavier uniform with the gain recommended for ReLU
            nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('relu'))
    # Output:
    # module-----------
    # Sequential(
    #   (flatten): FlattenLayer()
    #   (linear1): Linear(in_features=784, out_features=512, bias=True)
    #   (activation): ReLU()
    #   (linear2): Linear(in_features=512, out_features=256, bias=True)
    #   (linear3): Linear(in_features=256, out_features=10, bias=True)
    # )
    # setup-----------

Example:

    # Fill every parameter of `model` (weights and biases alike) from U(0, 1).
    for param in model.parameters():
        nn.init.uniform_(param)

Example:

    def weights_init(m):
        # Xavier-normal weights and zero biases for Conv2d and Linear layers.
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.xavier_normal_(m.weight)
            if m.bias is not None:  # layers created with bias=False have no bias tensor
                nn.init.constant_(m.bias, 0.0)

    # apply() recursively visits every submodule of the network and calls
    # the given function on each of them.
    model.apply(weights_init)
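The snippets in this section all rely on a `model` that is defined elsewhere. The following self-contained sketch puts the pieces together. The model is a hypothetical stand-in whose layer sizes mirror the Sequential printed earlier, with `nn.Flatten` used in place of the custom FlattenLayer; the function name `init_weights`, the batch shape, and the seed are likewise assumptions.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

# Hypothetical model; nn.Flatten stands in for the custom FlattenLayer.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.Linear(256, 10),
)

def init_weights(m):
    """Xavier-uniform weights (ReLU gain) and zero biases for Linear layers."""
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('relu'))
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_weights)  # visits every submodule recursively

# Quick sanity check on a dummy batch of 32 single-channel 28 x 28 inputs
x = torch.randn(32, 1, 28, 28)
with torch.no_grad():
    out = model(x)
print(out.shape)

# Bound of the first Linear layer: gain * sqrt(6 / (fan_in + fan_out))
first = model[1]
bound = math.sqrt(2.0) * math.sqrt(6.0 / (784 + 512))
print(f"max |w| = {first.weight.abs().max().item():.4f}, bound = {bound:.4f}")
```

Passing a function to `model.apply` keeps the initialization logic in one place, and the `isinstance` check leaves parameter-free layers such as `nn.ReLU` and `nn.Flatten` untouched.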

posted @ 2022-03-08 20:09 Learner-
