Authors | Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, Yuyang Wang |
Title | Domain adaptation for time series forecasting via attention sharing |
Publication | Proceedings of the 39th International Conference on Machine Learning (ICML 2022) |
Volume | 162 |
Issue | x |
Pages | 10280-10297 |
Year | 2022 |
DOI | x |
Introduction
Background
- Time series forecasting, like other fields with predictive tasks, has recently benefited from the development of deep neural networks (DNNs).
- Deep-learning-based forecasting models excel at capturing complex temporal dynamics from large datasets, but in many applications it is challenging to collect enough data.
- A common solution to the data scarcity problem is to introduce another dataset with abundant samples from a related domain, referred to as the source domain, to aid forecasting in the data-scarce target domain.
Previous Research
- Existing DA (Domain Adaptation) methods attempt to mitigate the harmful effect of domain shift by aligning features extracted across source and target domains.
- Current approaches mainly focus on classification tasks, where a classifier learns a mapping from a learned domain-invariant latent space to a fixed label space using source data.
- Directly applying existing DA methods to time series forecasting faces two main challenges: unlike classification, forecasting has no fixed label space shared across domains, and temporal patterns within each domain evolve over time.
Proposed Model
- The proposed solution is the DAF (Domain Adaptation Forecaster), a novel method that effectively solves the data scarcity issue in time series forecasting by applying domain adaptation techniques via attention sharing.
- DAF uses an attention-based model equipped with domain adaptation to resolve the two challenges.
- For evolving patterns, attention models make dynamic forecasts based on a combination of values weighted by time-dependent query-key alignments.
- The alignments in an attention module are independent of specific patterns, so queries and keys can be induced to be domain-invariant while values can stay domain-specific.
- DAF combines domain-invariant and domain-specific features to make multi-horizon forecasts for source and target domains through a shared attention module.
- DAF provides the first end-to-end DA solution specific for multi-horizon forecasting tasks with adversarial training.
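The attention-sharing idea above can be sketched in PyTorch as follows. The module below is an illustrative assumption, not the paper's exact implementation: queries and keys are derived from the pattern embeddings (which adversarial training pushes to be domain-invariant), while values come from the domain-specific value embeddings, and one such module is shared between both domains.

```python
import torch
import torch.nn as nn

class SharedAttention(nn.Module):
    """Attention module shared across domains (illustrative sketch).

    Queries/keys are computed from pattern embeddings, values from
    value embeddings; layer names and dimensions are assumptions.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # MLP applied to the attention output
        self.out = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, pattern_emb: torch.Tensor, value_emb: torch.Tensor):
        # pattern_emb, value_emb: (batch, time, d_model)
        q = self.w_q(pattern_emb)   # queries: to be domain-invariant
        k = self.w_k(pattern_emb)   # keys: to be domain-invariant
        v = self.w_v(value_emb)     # values: stay domain-specific
        # time-dependent query-key alignments, normalized by softmax
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        alpha = scores.softmax(dim=-1)
        return self.out(alpha @ v)  # weighted average of values, then MLP
```

Because only `pattern_emb` feeds the queries and keys, a discriminator applied to q and k can align them across domains without forcing the value pathway to lose domain-specific information.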
Significance
- Extensive synthetic and real-world experiments on cold-start and few-shot forecasting problems show that DAF outperforms state-of-the-art single-domain forecasting and domain adaptation baselines in terms of accuracy in a data-scarce target domain.
- Extensive ablation studies show the importance of both the domain-invariant features induced by a discriminator and the retained domain-specific features in the DAF model.
- The designed sharing strategies with the discriminator result in better performance than other potential variants.
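The discriminator mentioned above can be sketched as a standard adversarial pair: a binary classifier is trained to tell whether a query/key feature came from the source or target domain, while the generators are trained to fool it, thereby inducing domain-invariant queries and keys. This is a minimal sketch of that minimax objective, not the paper's exact architecture or loss weighting.

```python
import torch
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    """Classifies query/key features as source vs. target (sketch)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 1),  # logit: source (1) vs. target (0)
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)

def adversarial_losses(disc, src_feat, tgt_feat):
    """Discriminator loss (tell domains apart) and generator loss
    (make target features indistinguishable from source)."""
    bce = nn.functional.binary_cross_entropy_with_logits
    src_logits, tgt_logits = disc(src_feat), disc(tgt_feat)
    d_loss = (bce(src_logits, torch.ones_like(src_logits))
              + bce(tgt_logits, torch.zeros_like(tgt_logits)))
    # generator flips the target labels to fool the discriminator;
    # in practice the discriminator step would use detached features
    g_loss = bce(tgt_logits, torch.ones_like(tgt_logits))
    return d_loss, g_loss
```

Alternating these two losses during training is what drives the shared queries and keys toward a domain-invariant representation.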
Proposed Model
- The proposed solution, the Domain Adaptation Forecaster (DAF), employs a sequence generator to process time series from each domain.
- Each sequence generator consists of an encoder, an attention module, and a decoder.
- Private encoders transform the raw input into the pattern embedding and value embedding.
- M independent temporal convolutions with various kernel sizes extract short-term patterns at different scales.
- The multi-scale pattern embedding is built by concatenating each p_j, the output of the j-th convolution.
- The extracted pattern and value are fed into the shared attention module.
- The attention module is shared by both domains and generates domain-invariant queries Q and keys K from pattern embeddings P.
- The attention score α_{t,t′} is computed as the normalized alignment between the query q_t and the keys k_{t′} at neighborhood positions t′ ∈ N(t) using a positive semi-definite kernel.
- A representation o_t is produced as the average of the values v_{μ(t′)} weighted by the attention scores α_{t,t′} over the neighborhood N(t), followed by an MLP.
- The choice of N(t) and μ(t) depends on whether G is in interpolation mode for reconstruction when t ≤ T or extrapolation mode for forecasting when t > T.
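The private encoder described above can be sketched as follows; kernel sizes, dimensions, and the value projection below are assumptions for illustration. M temporal convolutions with different kernel sizes each produce a pattern embedding p_j, and their concatenation forms the multi-scale pattern embedding fed to the shared attention module.

```python
import torch
import torch.nn as nn

class PrivateEncoder(nn.Module):
    """Per-domain encoder (sketch): M temporal convolutions with
    different kernel sizes extract short-term patterns at multiple
    scales; their outputs are concatenated into the pattern embedding.
    """
    def __init__(self, d_in: int, d_hidden: int,
                 kernel_sizes=(3, 5, 7)):
        super().__init__()
        # odd kernel sizes with padding k // 2 preserve sequence length
        self.convs = nn.ModuleList(
            nn.Conv1d(d_in, d_hidden, k, padding=k // 2)
            for k in kernel_sizes
        )
        self.value_proj = nn.Linear(d_in, d_hidden)

    def forward(self, x: torch.Tensor):
        # x: (batch, time, d_in)
        xc = x.transpose(1, 2)  # Conv1d expects (batch, d_in, time)
        # each conv yields one p_j; concatenate along the feature dim
        p = torch.cat([conv(xc) for conv in self.convs], dim=1)
        p = p.transpose(1, 2)        # pattern embedding (batch, time, M*d_hidden)
        v = self.value_proj(x)       # value embedding (batch, time, d_hidden)
        return p, v
```

In interpolation mode (t ≤ T) the neighborhood N(t) can include future positions within the observed context for reconstruction, whereas in extrapolation mode (t > T) it is restricted to past positions so forecasts remain causal.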
Experiment
- In both the cold-start and few-shot forecasting scenarios, the performance of DAF is better than or on par with the baselines in all experiments.
- Cross-domain forecasters that are jointly trained end-to-end using both source and target data are overall more accurate than the single-domain forecasters. This finding indicates that source data is helpful in forecasting the target data.
- Among the cross-domain forecasters, DATSING is outperformed by RDA and DAF, indicating the importance of joint training on both domains.
- In the majority of experiments, the attention-based DAF model is more accurate than, or competitive with, the RNN-based DA method (RDA).
- DAF's improvement over the baselines grows as the number of target training samples becomes smaller.
- The accuracy improvement by DAF over the baselines is more significant in real-world experiments than in synthetic experiments.