Authors | Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, Yuyang Wang |
Title | Domain adaptation for time series forecasting via attention sharing |
Publication | Proceedings of the 39th International Conference on Machine Learning (ICML 2022) |
Volume | 162 |
Issue | x |
Pages | 10280-10297 |
Year | 2022 |
DOI | x |
Introduction
Background
- Time series forecasting, like other fields with predictive tasks, has recently benefited from the development of deep neural networks (DNNs).
- Deep-learning-based forecasting models excel at capturing complex temporal dynamics from large datasets, but in many applications it is challenging to collect enough data.
- A common solution to the data scarcity problem is to introduce another dataset with abundant samples from a related domain, referred to as the source domain, to aid forecasting in the data-scarce target domain.
Previous Research
- Existing DA (Domain Adaptation) methods attempt to mitigate the harmful effect of domain shift by aligning features extracted across source and target domains.
- Current approaches mainly focus on classification tasks, where a classifier learns a mapping from a learned domain-invariant latent space to a fixed label space using source data.
- Directly applying existing DA methods to time series forecasting faces two main challenges: unlike classification, forecasting has no fixed label space shared across domains, and temporal patterns within each domain evolve over time.
Proposed Model
- The proposed solution is the DAF (Domain Adaptation Forecaster), a novel method that effectively solves the data scarcity issue in time series forecasting by applying domain adaptation techniques via attention sharing.
- DAF uses an attention-based model equipped with domain adaptation to resolve the two challenges.
- For evolving patterns, attention models make dynamic forecasts based on a combination of values weighted by time-dependent query-key alignments.
- The alignments in an attention module are independent of specific patterns, so queries and keys can be induced to be domain-invariant while values can stay domain-specific.
- DAF combines domain-invariant and domain-specific features to make multi-horizon forecasts for source and target domains through a shared attention module.
- DAF provides the first end-to-end DA solution specific for multi-horizon forecasting tasks with adversarial training.
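The attention-sharing idea above can be sketched in PyTorch as follows. The module below is an illustrative assumption, not the paper's exact implementation: queries and keys are derived from the pattern embeddings (which adversarial training pushes to be domain-invariant), while values come from the domain-specific value embeddings, and one such module is shared between both domains.

```python
import torch
import torch.nn as nn

class SharedAttention(nn.Module):
    """Attention module shared across domains (illustrative sketch).

    Queries/keys are computed from pattern embeddings, values from
    value embeddings; layer names and dimensions are assumptions.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # MLP applied to the attention output
        self.out = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, pattern_emb: torch.Tensor, value_emb: torch.Tensor):
        # pattern_emb, value_emb: (batch, time, d_model)
        q = self.w_q(pattern_emb)   # queries: to be domain-invariant
        k = self.w_k(pattern_emb)   # keys: to be domain-invariant
        v = self.w_v(value_emb)     # values: stay domain-specific
        # time-dependent query-key alignments, normalized by softmax
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        alpha = scores.softmax(dim=-1)
        return self.out(alpha @ v)  # weighted average of values, then MLP
```

Because only `pattern_emb` feeds the queries and keys, a discriminator applied to q and k can align them across domains without forcing the value pathway to lose domain-specific information.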
Significance
- Extensive synthetic and real-world experiments on cold-start and few-shot forecasting problems show that DAF outperforms state-of-the-art single-domain forecasting and domain adaptation baselines in terms of accuracy in a data-scarce target domain.
- Extensive ablation studies show the importance of both the domain-invariant features induced by a discriminator and the retained domain-specific features in the DAF model.
- The designed sharing strategies with the discriminator result in better performance than other potential variants.
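The discriminator mentioned above can be sketched as a standard adversarial pair: a binary classifier is trained to tell whether a query/key feature came from the source or target domain, while the generators are trained to fool it, thereby inducing domain-invariant queries and keys. This is a minimal sketch of that minimax objective, not the paper's exact architecture or loss weighting.

```python
import torch
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    """Classifies query/key features as source vs. target (sketch)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 1),  # logit: source (1) vs. target (0)
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)

def adversarial_losses(disc, src_feat, tgt_feat):
    """Discriminator loss (tell domains apart) and generator loss
    (make target features indistinguishable from source)."""
    bce = nn.functional.binary_cross_entropy_with_logits
    src_logits, tgt_logits = disc(src_feat), disc(tgt_feat)
    d_loss = (bce(src_logits, torch.ones_like(src_logits))
              + bce(tgt_logits, torch.zeros_like(tgt_logits)))
    # generator flips the target labels to fool the discriminator;
    # in practice the discriminator step would use detached features
    g_loss = bce(tgt_logits, torch.ones_like(tgt_logits))
    return d_loss, g_loss
```

Alternating these two losses during training is what drives the shared queries and keys toward a domain-invariant representation.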
Proposed Model
- The proposed solution, the Domain Adaptation Forecaster (DAF), employs a sequence generator to process time series from each domain.
- Each sequence generator consists of an encoder, an attention module, and a decoder.
- Private encoders transform the raw input into the pattern embedding and value embedding.
- M independent temporal convolutions with various kernel sizes extract short-term patterns at different scales.
- The multi-scale pattern embedding is built by concatenating each p_j, the output of the j-th convolution.
- The extracted pattern and value are fed into the shared attention module.
- The attention module is shared by both domains and generates domain-invariant queries Q and keys K from pattern embeddings P.
- The attention score α_{t,t′} is computed as the normalized alignment between the query q_t and the keys k_{t′} at neighborhood positions t′ ∈ N(t) using a positive semi-definite kernel.
- A representation o_t is produced as the average of the values v_{μ(t′)} weighted by the attention scores α_{t,t′} over the neighborhood N(t), followed by an MLP.
- The choice of N(t) and μ(t) depends on whether G is in interpolation mode for reconstruction when t ≤ T or extrapolation mode for forecasting when t > T.
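The private encoder described above can be sketched as follows; kernel sizes, dimensions, and the value projection below are assumptions for illustration. M temporal convolutions with different kernel sizes each produce a pattern embedding p_j, and their concatenation forms the multi-scale pattern embedding fed to the shared attention module.

```python
import torch
import torch.nn as nn

class PrivateEncoder(nn.Module):
    """Per-domain encoder (sketch): M temporal convolutions with
    different kernel sizes extract short-term patterns at multiple
    scales; their outputs are concatenated into the pattern embedding.
    """
    def __init__(self, d_in: int, d_hidden: int,
                 kernel_sizes=(3, 5, 7)):
        super().__init__()
        # odd kernel sizes with padding k // 2 preserve sequence length
        self.convs = nn.ModuleList(
            nn.Conv1d(d_in, d_hidden, k, padding=k // 2)
            for k in kernel_sizes
        )
        self.value_proj = nn.Linear(d_in, d_hidden)

    def forward(self, x: torch.Tensor):
        # x: (batch, time, d_in)
        xc = x.transpose(1, 2)  # Conv1d expects (batch, d_in, time)
        # each conv yields one p_j; concatenate along the feature dim
        p = torch.cat([conv(xc) for conv in self.convs], dim=1)
        p = p.transpose(1, 2)        # pattern embedding (batch, time, M*d_hidden)
        v = self.value_proj(x)       # value embedding (batch, time, d_hidden)
        return p, v
```

In interpolation mode (t ≤ T) the neighborhood N(t) can include future positions within the observed context for reconstruction, whereas in extrapolation mode (t > T) it is restricted to past positions so forecasts remain causal.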
Experiment
- In both the cold-start and few-shot forecasting scenarios, the performance of DAF is better than or on par with the baselines in all experiments.
- Cross-domain forecasters that are jointly trained end-to-end using both source and target data are overall more accurate than the single-domain forecasters. This finding indicates that source data is helpful in forecasting the target data.
- Among the cross-domain forecasters, DATSING is outperformed by RDA and DAF, indicating the importance of joint training on both domains.
- In the majority of experiments, the attention-based DAF model is more accurate than, or competitive with, the RNN-based DA method (RDA).
- DAF's improvement over the baselines grows as the number of target training samples becomes smaller.
- The accuracy improvement by DAF over the baselines is more significant in real-world experiments than in synthetic experiments.