
SELU (Scaled Exponential Linear Unit)

Computer-Nerd 2023. 2. 24.

Information

  • SELU (Scaled Exponential Linear Unit) is an activation function for neural networks introduced in 2017 by Klambauer et al. in the paper "Self-Normalizing Neural Networks".
  • SELU is a self-normalizing activation function: it drives the activations of each layer toward zero mean and unit variance, which keeps the signal stable across layers and mitigates the vanishing/exploding gradients problem.
  • SELU is defined as a piecewise function, similar to ReLU but with a scaled exponential branch for negative inputs (see the NumPy sketch after this list):
    • SELU(x) = scale * x for x >= 0
    • SELU(x) = scale * alpha * (exp(x) - 1) for x < 0
    • where alpha ≈ 1.67326 and scale ≈ 1.0507 are fixed constants derived analytically (not tuned hyperparameters) so that the self-normalization property holds.
  • SELU was designed for deep feed-forward (fully connected) networks, where stacking many layers would otherwise make training unstable without explicit normalization layers such as batch normalization.
  • SELU assumes that the inputs to each layer are approximately standardized (zero mean, unit variance); in practice this means standardizing the input features and initializing the weights with LeCun normal initialization so that the self-normalizing property holds.
  • SELU can be combined with regularization, but standard dropout breaks the self-normalizing property; the authors therefore proposed Alpha Dropout, which preserves zero mean and unit variance, and weight decay can be used as usual (see the PyTorch sketch at the end of this note).
  • In the original paper, self-normalizing networks built with SELU outperformed other feed-forward networks on benchmarks such as the 121 UCI classification tasks and the Tox21 drug-discovery dataset, and were shown to be more robust to network depth and weight initialization than ReLU and Leaky ReLU.
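
The piecewise definition above maps directly to a few lines of code. Below is a minimal NumPy sketch (using the rounded constants quoted above rather than the full-precision values from the paper); it also checks numerically that standard-normal inputs come out with roughly zero mean and unit variance, which is the fixed point the self-normalization argument relies on.

```python
import numpy as np

# Constants from Klambauer et al. (2017), rounded as in the note above.
ALPHA = 1.67326
SCALE = 1.0507

def selu(x: np.ndarray) -> np.ndarray:
    """SELU elementwise: scale * x for x >= 0, scale * alpha * (exp(x) - 1) for x < 0."""
    return SCALE * np.where(x >= 0, x, ALPHA * (np.exp(x) - 1))

# Standard-normal inputs stay close to zero mean and unit variance after SELU.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = selu(x)
print(round(y.mean(), 3), round(y.std(), 3))  # both close to 0 and 1, respectively
```

In practice you would use a built-in implementation (e.g. torch.nn.SELU in PyTorch or the 'selu' activation in Keras) rather than rolling your own.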

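To show how the pieces fit together (standardized inputs, LeCun normal initialization, SELU, Alpha Dropout), here is a small PyTorch sketch of a self-normalizing feed-forward network. The layer sizes, depth, and dropout rate are arbitrary choices for illustration, not values from the paper.

```python
import math
import torch
from torch import nn

def make_snn(in_dim: int, hidden: int, out_dim: int, depth: int = 8) -> nn.Sequential:
    """Build a self-normalizing feed-forward stack: Linear -> SELU -> AlphaDropout."""
    layers = []
    dims = [in_dim] + [hidden] * depth
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        linear = nn.Linear(d_in, d_out)
        # LeCun normal initialization: std = 1 / sqrt(fan_in), zero bias.
        nn.init.normal_(linear.weight, mean=0.0, std=1.0 / math.sqrt(d_in))
        nn.init.zeros_(linear.bias)
        # AlphaDropout (from the SELU paper) keeps zero mean / unit variance;
        # plain Dropout would break the self-normalizing property.
        layers += [linear, nn.SELU(), nn.AlphaDropout(p=0.05)]
    layers.append(nn.Linear(hidden, out_dim))
    return nn.Sequential(*layers)

# Standardized (zero mean, unit variance) inputs keep hidden activations stable.
model = make_snn(in_dim=32, hidden=64, out_dim=10)
x = torch.randn(256, 32)
print(model(x).shape)  # torch.Size([256, 10])
```

Swapping AlphaDropout for plain Dropout, or the LeCun initialization for a default one, is exactly the kind of change that tends to break the self-normalizing behavior in deep stacks.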