
Domain Adaptation for Time Series Forecasting via Attention Sharing

Computer-Nerd 2023. 2. 21.
Authors: Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, Yuyang Wang
Title: Domain Adaptation for Time Series Forecasting via Attention Sharing
Publication: Proceedings of the 39th International Conference on Machine Learning (ICML 2022)
Volume: 162
Pages: 10280-10297
Year: 2022

Introduction

Background

  • Time series forecasting has recently benefited from the development of deep neural networks (DNNs), much like other fields with predictive tasks.
  • Deep learning (DL)-based forecasting models excel at capturing complex temporal dynamics when trained on large datasets, but collecting enough data is often challenging in practice.
  • A common solution to this data scarcity problem is to introduce another dataset with abundant samples from a related domain, referred to as the source domain, to aid forecasting in the data-scarce domain of interest, the target domain.

Previous Research

  • Existing DA (Domain Adaptation) methods attempt to mitigate the harmful effect of domain shift by aligning features extracted across source and target domains.
  • Current approaches mainly focus on classification tasks, where a classifier learns, from source data, a mapping from a learned domain-invariant latent space to a fixed label space.
  • Directly applying these DA methods to time series forecasting faces two main challenges: temporal patterns evolve over time, and forecast values are continuous and domain-specific rather than drawn from a fixed label space.

Proposed Model

  • The proposed solution is DAF (Domain Adaptation Forecaster), a novel method that addresses data scarcity in time series forecasting by applying domain adaptation via attention sharing.
  • DAF uses an attention-based model equipped with domain adaptation to resolve the two challenges.
  • To handle evolving patterns, attention models make dynamic forecasts from values weighted by time-dependent query-key alignments.
  • The alignments in an attention module are independent of specific patterns, so queries and keys can be induced to be domain-invariant while values can stay domain-specific.
  • DAF combines domain-invariant and domain-specific features to make multi-horizon forecasts for source and target domains through a shared attention module.
  • DAF provides the first end-to-end DA solution designed specifically for multi-horizon forecasting, trained adversarially; a schematic of the resulting objective follows this list.
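
The following is a schematic of the adversarial objective implied by the bullets above. It is a minimal sketch, not the paper's exact formulation: the symbols G_S, G_T (per-domain sequence generators), D (domain discriminator), L_seq (forecasting/reconstruction error), L_dom (domain-classification loss on the shared queries and keys), and the trade-off weight λ are illustrative names.

```latex
% Generators minimize forecast error while fooling the discriminator;
% the discriminator maximizes its ability to tell the domains apart.
\min_{G_S,\,G_T}\;\max_{D}\;\;
\mathcal{L}_{\mathrm{seq}}(G_S;\,\mathcal{D}_S)
+ \mathcal{L}_{\mathrm{seq}}(G_T;\,\mathcal{D}_T)
\;-\; \lambda\,\mathcal{L}_{\mathrm{dom}}(D;\,Q_S, K_S, Q_T, K_T)
```

Intuitively, the discriminator pressure is what forces the queries and keys produced by the shared attention module to become domain-invariant, while the values, which never pass through the discriminator, remain domain-specific.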

Significance

  • In extensive experiments on synthetic and real-world datasets covering cold-start and few-shot forecasting problems, DAF outperforms state-of-the-art single-domain forecasting and domain adaptation baselines in terms of accuracy in the data-scarce target domain.
  • Extensive ablation studies show the importance of both the domain-invariant features induced by the discriminator and the retained domain-specific features in the DAF model.
  • The designed sharing strategy, combined with the discriminator, performs better than other potential variants.

Proposed Model

(Figure) DAF architecture
(Figure) Shared attention module process

  • The proposed solution, the Domain Adaptation Forecaster (DAF), employs a sequence generator to process time series from each domain.
  • Each sequence generator consists of an encoder, an attention module, and a decoder.
  • Private encoders transform the raw input into the pattern embedding and value embedding.
  • M independent temporal convolutions with various kernel sizes extract short-term patterns at different scales.
  • The multi-scale pattern embedding is built by concatenating the per-scale outputs p_j (a code sketch of this encoder follows the list).
  • The extracted pattern and value are fed into the shared attention module.
  • The attention module is shared by both domains and generates domain-invariant queries Q and keys K from pattern embeddings P.
  • The attention score α is computed as the normalized alignment between the query q_t and the keys k_t′ at neighborhood positions t′ ∈ N(t), using a positive semi-definite kernel.
  • A representation o_t is produced as the average of the values v_μ(t′) weighted by the attention scores α over the neighborhood N(t), followed by an MLP (see the equations after this list).
  • The choice of N(t) and μ(t) depends on whether the generator is in interpolation mode (reconstruction, t ≤ T) or extrapolation mode (forecasting, t > T).
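
In symbols, the attention step described above can be written as follows, using the notation from the bullets, where k(·,·) denotes the positive semi-definite kernel:

```latex
\alpha_{t,t'} \;=\; \frac{k(q_t,\,k_{t'})}{\sum_{\tau \in \mathcal{N}(t)} k(q_t,\,k_{\tau})},
\quad t' \in \mathcal{N}(t),
\qquad
o_t \;=\; \mathrm{MLP}\!\left(\sum_{t' \in \mathcal{N}(t)} \alpha_{t,t'}\, v_{\mu(t')}\right)
```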
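
Below is a minimal PyTorch sketch of the private encoder and shared attention module described above. It is an illustrative reconstruction, not the authors' code: the dimensions, kernel sizes, the softmax (exponential-kernel) attention, and the full-neighborhood choice of N(t) are simplifying assumptions, and the class and argument names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrivateEncoder(nn.Module):
    """Per-domain (private) encoder: M temporal convolutions with
    different kernel sizes extract patterns at multiple scales; their
    outputs p_j are concatenated into the pattern embedding P, and a
    pointwise convolution yields the value embedding V."""
    def __init__(self, d_in=1, d_model=32, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(d_in, d_model, k, padding=k // 2)   # same-length output
            for k in kernel_sizes                          # M = len(kernel_sizes)
        ])
        self.value_proj = nn.Conv1d(d_in, d_model, 1)

    def forward(self, x):                                  # x: (batch, d_in, T)
        p = torch.cat([conv(x) for conv in self.convs], dim=1)  # (B, M*d_model, T)
        v = self.value_proj(x)                                   # (B, d_model, T)
        return p, v

class SharedAttention(nn.Module):
    """Shared across domains: queries/keys come from pattern embeddings
    (pushed toward domain invariance by the discriminator), while the
    attended values stay domain-specific."""
    def __init__(self, d_pattern, d_model=32):
        super().__init__()
        self.q_proj = nn.Linear(d_pattern, d_model)
        self.k_proj = nn.Linear(d_pattern, d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, p, v):                # p: (B, d_pattern, T), v: (B, d_model, T)
        p, v = p.transpose(1, 2), v.transpose(1, 2)        # -> (B, T, .)
        q, k = self.q_proj(p), self.k_proj(p)
        # Softmax over scaled dot products acts as a normalized exponential
        # kernel; N(t) is taken to be all positions here (interpolation mode);
        # extrapolation mode would restrict N(t) to past positions.
        alpha = F.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        o = self.mlp(alpha @ v)             # kernel-weighted values, then MLP
        return o, q, k                      # q, k also feed the discriminator

# Usage: one private encoder per domain, one shared attention module.
enc_src, enc_tgt = PrivateEncoder(), PrivateEncoder()
attn = SharedAttention(d_pattern=3 * 32)    # M * d_model
x = torch.randn(8, 1, 24)                   # (batch, features, time)
o, q, k = attn(*enc_src(x))                 # o: (8, 24, 32)
```

In a full model, each domain would keep its own PrivateEncoder and decoder while both domains share the single SharedAttention instance, and the returned q, k would feed the domain discriminator in the objective sketched earlier.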

Experiment

(Figure) Performance comparison on synthetic datasets
(Figure) Performance comparison on real-world benchmark datasets

  • In both the cold-start and few-shot forecasting scenarios, the performance of DAF is better than or on par with the baselines in all experiments.
  • Cross-domain forecasters that are jointly trained end-to-end on both source and target data are overall more accurate than single-domain forecasters, indicating that source data helps forecasting in the target domain.
  • Among the cross-domain forecasters, DATSING is outperformed by RDA and DAF, indicating the importance of joint training on both domains.
  • In the majority of experiments, the attention-based DAF model is more accurate than, or competitive with, the RNN-based DA (RDA) method.
  • DAF's improvement over the baselines grows as the number of target training samples shrinks.
  • The accuracy improvement by DAF over the baselines is more significant in real-world experiments than in synthetic experiments.
