SGDW

Introduced by Loshchilov et al. in Decoupled Weight Decay Regularization

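SGDW is a variant of stochastic gradient descent with momentum that decouples weight decay from the gradient update: the decay term $\eta_{t}\lambda\theta_{t-1}$ is subtracted from the parameters directly instead of being added to the gradient as an L2 penalty. With learning rate $\alpha$, momentum factor $\beta_{1}$, weight decay factor $\lambda$, and schedule multiplier $\eta_{t}$, the update is: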

$$ g_{t} = \nabla{f_{t}}\left(\theta_{t-1}\right)$$

$$ m_{t} = \beta_{1}m_{t-1} + \eta_{t}\alpha{g}_{t}$$

$$ \theta_{t} = \theta_{t-1} - m_{t} - \eta_{t}\lambda\theta_{t-1}$$

Source: Decoupled Weight Decay Regularization
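
The update can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' reference code: the function name `sgdw_step` and the argument names `lr` (for $\alpha$), `momentum` (for $\beta_{1}$), `weight_decay` (for $\lambda$), and `schedule` (for $\eta_{t}$) are hypothetical, and the default values and the toy quadratic loss are chosen only for demonstration.

```python
import numpy as np


def sgdw_step(theta, m, grad, lr=0.1, momentum=0.9, weight_decay=0.01, schedule=1.0):
    """Apply one SGDW update and return the new (theta, m).

    Argument names are illustrative: lr = alpha, momentum = beta_1,
    weight_decay = lambda, schedule = eta_t.
    """
    # The momentum buffer accumulates only the loss gradient (no L2 term).
    m_new = momentum * m + schedule * lr * grad
    # Weight decay acts on the parameters directly, outside the momentum buffer.
    theta_new = theta - m_new - schedule * weight_decay * theta
    return theta_new, m_new


# Toy usage (assumed example): minimize f(theta) = 0.5 * ||theta - target||^2.
target = np.array([1.0, -1.0])
theta = np.zeros(2)
m = np.zeros(2)
for _ in range(200):
    grad = theta - target  # gradient of the toy quadratic loss
    theta, m = sgdw_step(theta, m, grad)
```

Because the decay term never enters `m_new`, the amount of shrinkage applied to `theta` at each step depends only on `schedule * weight_decay`, not on the learning rate or the momentum history.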


Task	Papers	Share
Image Classification	1	100.00%
