
FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

Computer-Nerd 2023. 2. 17.
Authors Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, Rong Jin
Title FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting
Publication Proceedings of the 39th International Conference on Machine Learning (ICML 2022)
Volume 162
Issue x
Pages 27268-27286
Year 2022
DOI x

Introduction

Background

Previous Research

  • Despite the impressive results achieved by RNN (Recurrent Neural Network)-type methods, they often suffer from the problem of gradient vanishing or exploding, significantly limiting their performance.
  • Transformer has been introduced to capture long-term dependencies in time series forecasting and shows promising results.
  • Numerous studies are devoted to reducing the computational cost of Transformer, as high computational complexity and memory requirement make it difficult for Transformer to be applied to long sequence modeling.

Proposed Model

Significance

  • The proposed model improves the performance of state-of-the-art methods by 14.8% and 22.6% for multivariate forecasting and univariate forecasting, respectively.
  • Extensive experiments were conducted over 6 benchmark datasets across multiple domains (energy, traffic, economics, weather and disease).
  • The effectiveness of the Fourier component selection method is verified both theoretically and empirically.

Proposed Model

FEDformer Structure

  • Long-term time-series forecasting is a sequence-to-sequence problem.
  • The input length is denoted as I and the output length as O, while D denotes the dimension of the hidden state.
  • The encoder input is an I × D matrix, and the decoder input is an (I/2 + O) × D matrix.
  • The FEDformer structure is a deep decomposition architecture built from the FEB (Frequency Enhanced Block), the FEA (Frequency Enhanced Attention) block, and the MOEDecomp (Mixture Of Experts Decomposition) block, with a multilayer encoder operating on seasonal components and a multilayer decoder operating on both seasonal and trend components.
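The MOEDecomp block mentioned above separates a series into seasonal and trend parts by mixing several moving-average filters. A minimal numpy sketch of that idea follows; note that in FEDformer the mixing weights are learned and data-dependent, whereas here they are fixed uniform weights, and the kernel sizes are illustrative choices:

```python
import numpy as np

def moving_avg(x, kernel):
    # Centered moving average with edge padding: a simple trend extractor.
    pad = kernel // 2
    xp = np.pad(x, (pad, kernel - 1 - pad), mode="edge")
    return np.convolve(xp, np.ones(kernel) / kernel, mode="valid")

def moe_decomp(x, kernels=(4, 8, 16), logits=None):
    # Mixture-of-experts decomposition (simplified): combine several
    # moving-average trends with softmax weights. FEDformer learns
    # data-dependent weights; here the logits default to uniform.
    trends = np.stack([moving_avg(x, k) for k in kernels])
    if logits is None:
        logits = np.zeros(len(kernels))
    w = np.exp(logits) / np.exp(logits).sum()  # softmax over experts
    trend = (w[:, None] * trends).sum(axis=0)
    seasonal = x - trend                       # residual is the seasonal part
    return seasonal, trend
```

By construction the two components always sum back to the input series, which is what lets the encoder and decoder process them separately.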

FEB-f structure

FEA-f structure

  • The FEA module has two versions (FEA-f and FEA-w), implemented through DFT and DWT projections respectively; combined with an attention design, it can replace the cross-attention block.
  • The final prediction is the sum of the two refined decomposed components, W_S · X_de^M + T_de^M, where W_S projects the deep-transformed seasonal component X_de^M to the target dimension.
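The final-prediction step above is a linear projection of the seasonal component plus the accumulated trend. A small numpy sketch, with hypothetical shapes (L = decoder length, D = hidden dimension, d_out = target dimension) and random stand-ins for the learned tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
L, D, d_out = 24, 16, 1  # illustrative sizes, not from the paper

X_de = rng.normal(size=(L, D))      # deep-transformed seasonal component X_de^M
T_de = rng.normal(size=(L, d_out))  # accumulated trend component T_de^M
W_S = rng.normal(size=(D, d_out))   # seasonal projection (learned in practice)

# Final prediction: W_S · X_de^M + T_de^M
prediction = X_de @ W_S + T_de
```

The projection maps the hidden seasonal representation to the target dimension so it can be added element-wise to the trend, which is already kept in the target dimension.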

Fourier blocks

  • The Fourier Enhanced Structures use DFT.
  • DFT is defined as X_l = Σ_{n=0}^{N−1} x_n e^{−iωln}, where i is the imaginary unit and X_l, l = 1, 2, …, L, is a sequence of complex numbers in the frequency domain.
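The DFT described above is the backbone of the frequency-enhanced blocks, whose key trick is keeping only a random subset of Fourier modes. A simplified numpy sketch of that mode selection; in FEDformer the retained modes are additionally multiplied by learned complex kernels, while here the unselected modes are simply zeroed out:

```python
import numpy as np

def select_fourier_modes(x, n_modes=4, seed=0):
    # Simplified frequency-enhanced block: transform to the frequency
    # domain, keep a random subset of modes, and transform back.
    N = len(x)
    X = np.fft.rfft(x)  # DFT of the real-valued input series
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(X), size=min(n_modes, len(X)), replace=False)
    mask = np.zeros(len(X), dtype=bool)
    mask[keep] = True
    X[~mask] = 0        # discard all non-selected frequency modes
    return np.fft.irfft(X, n=N)  # back to the time domain
```

Randomly sampling a fixed number of modes is what reduces the frequency-domain representation to a compact size, giving the linear complexity claimed for the model.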

Experiment

Multivariate long-term series forecasting results
Univariate long-term series forecasting results
Ablation studies
