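SGDW is a variant of stochastic gradient descent with momentum that decouples weight decay from the gradient-based update. With parameters $\theta$, learning rate $\alpha$, momentum factor $\beta_{1}$, weight decay factor $\lambda$, and schedule multiplier $\eta_{t}$, each step computes: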
$$ g_{t} = \nabla{f_{t}}\left(\theta_{t-1}\right)$$
$$ m_{t} = \beta_{1}m_{t-1} + \eta_{t}\alpha{g}_{t}$$
$$ \theta_{t} = \theta_{t-1} - m_{t} - \eta_{t}\lambda\theta_{t-1}$$
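The update above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `sgdw_step`, its argument names, and the default hyperparameter values are all assumptions made for the example. Here `grad` is taken to be the gradient of the loss alone, with weight decay applied only in the parameter update, which is the decoupled form.

```python
def sgdw_step(theta, m, grad, alpha=0.1, beta1=0.9, lam=0.01, eta=1.0):
    """One SGDW step on lists of floats; returns (new_theta, new_m).

    theta: current parameters (theta_{t-1})
    m:     momentum buffer (m_{t-1})
    grad:  gradient of the loss at theta, without any weight decay term
    alpha, beta1, lam, eta: learning rate, momentum factor,
        weight decay factor, schedule multiplier (illustrative defaults)
    """
    # m_t = beta1 * m_{t-1} + eta_t * alpha * g_t
    new_m = [beta1 * m_i + eta * alpha * g_i for m_i, g_i in zip(m, grad)]
    # theta_t = theta_{t-1} - m_t - eta_t * lambda * theta_{t-1}
    new_theta = [t - nm - eta * lam * t for t, nm in zip(theta, new_m)]
    return new_theta, new_m


if __name__ == "__main__":
    # Toy problem (hypothetical): f(theta) = 0.5 * theta^2, so grad = theta.
    theta, m = [1.0], [0.0]
    for _ in range(5):
        grad = list(theta)
        theta, m = sgdw_step(theta, m, grad)
    print(theta)
```

With a zero gradient and an empty momentum buffer, the step reduces to multiplying each parameter by $(1 - \eta_{t}\lambda)$, which shows that the decay acts on the weights directly rather than passing through the momentum term.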
Source: Decoupled Weight Decay Regularization