Information
- Gradient vanishing refers to a problem that occurs during the training of deep neural networks, where the gradients used to update the model's parameters become extremely small as they propagate backward through the layers of the network.
- This happens because backpropagation computes gradients with the chain rule of differentiation: the gradient at each layer is a product of the local derivatives of all the layers above it, and when many of those factors are smaller than one, the product shrinks roughly exponentially with depth (see the numerical sketch after this list).
- When the gradients become very small, the updates to the parameters of the early layers become negligible, so those layers effectively stop learning from the data.
- Gradient vanishing is more likely to occur in deep neural networks with many layers, especially when the layers use activation functions that saturate (i.e., have regions where the derivative is close to zero, such as the tails of sigmoid and tanh).
- Some common solutions to gradient vanishing include using activation functions that do not saturate, such as ReLU and its variants, and using weight initialization schemes (e.g., Xavier/Glorot or He initialization) that keep the magnitude of activations and gradients roughly constant across layers.
- Another approach is to change the architecture, for example with residual (skip) connections or attention mechanisms, which give gradients shorter paths that bypass many layers and reduce the chance of vanishing (see the residual-block sketch after this list).
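
Below is a minimal numerical sketch of the effect described above, assuming a toy chain of fully connected sigmoid layers; the depth, width, and weight scale are made-up values chosen only to make the decay visible. Because the sigmoid derivative is at most 0.25, the backpropagated gradient norm shrinks layer by layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy deep network: a chain of fully connected sigmoid layers
# (depth, width, and weight scale are arbitrary illustrative values).
depth, width = 30, 64
weights = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
           for _ in range(depth)]

# Forward pass, keeping the activations for the backward pass.
activations = [rng.normal(size=width)]
for W in weights:
    activations.append(sigmoid(W @ activations[-1]))

# Backward pass: each step multiplies by the sigmoid derivative (<= 0.25)
# and the transposed weight matrix, so the gradient norm decays with depth.
grad = np.ones(width)  # gradient of a dummy loss w.r.t. the final activation
for i in reversed(range(depth)):
    a = activations[i + 1]
    grad = weights[i].T @ (grad * a * (1.0 - a))  # sigmoid'(z) = a * (1 - a)
    if i % 5 == 0:
        print(f"layer {i:2d}: ||dL/dh|| = {np.linalg.norm(grad):.3e}")
```

Swapping the sigmoid for ReLU or rescaling the weights changes how quickly this norm decays, which is exactly what the remedies listed above aim at.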
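
And here is a minimal sketch of a residual connection, written with PyTorch as an assumed framework (the post does not specify one); `ResidualBlock`, `width`, and `depth` are hypothetical names. The point is only that the identity path in `x + F(x)` lets the gradient flow past the inner layers during backpropagation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = x + F(x). During backprop the
    identity path carries the gradient straight through the block."""
    def __init__(self, width: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(width, width),
            nn.ReLU(),
            nn.Linear(width, width),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # the skip connection bypasses the inner layers

# Stack many blocks; the skips keep a short gradient path back to the input.
width, depth = 64, 30
model = nn.Sequential(*[ResidualBlock(width) for _ in range(depth)])
x = torch.randn(8, width, requires_grad=True)
model(x).sum().backward()
print(f"||dL/dx|| = {x.grad.norm().item():.3e}")  # typically stays far from zero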