Are Transformers Effective for Time Series Forecasting?
A. Zeng, M. Chen, L. Zhang, and Q. Xu. (2022). arXiv:2205.13504. Comment: Code is available at https://github.com/cure-lab/LTSF-Linear.
Abstract
Recently, there has been a surge of Transformer-based solutions for the
long-term time series forecasting (LTSF) task. Despite their growing performance
over the past few years, we question the validity of this line of research in
this work. Specifically, the Transformer is arguably the most successful solution
for extracting semantic correlations among the elements of a long sequence.
In time series modeling, however, the goal is to extract the temporal relations
in an ordered set of continuous points. While positional encodings and the use
of tokens to embed sub-series in Transformers preserve some ordering
information, the permutation-invariant nature of the self-attention mechanism
inevitably results in temporal information loss. To
validate our claim, we introduce a set of embarrassingly simple one-layer
linear models named LTSF-Linear for comparison. Experimental results on nine
real-life datasets show that LTSF-Linear surprisingly outperforms existing
sophisticated Transformer-based LTSF models in all cases, and often by a large
margin. Moreover, we conduct comprehensive empirical studies to explore the
impacts of various design elements of LTSF models on their temporal relation
extraction capability. We hope this surprising finding opens up new research
directions for the LTSF task. We also advocate revisiting the validity of
Transformer-based solutions for other time series analysis tasks (e.g., anomaly
detection) in the future. Code is available at:
https://github.com/cure-lab/LTSF-Linear.
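
Below is a minimal sketch of the kind of one-layer linear forecaster the abstract describes, to make the "embarrassingly simple" baseline concrete. It is not the authors' code from the linked repository; the class name, the per-channel handling, and the 336-step look-back / 96-step horizon are illustrative assumptions. The model is simply a single linear map from the look-back window to the forecast horizon.

import torch
import torch.nn as nn

class SimpleLinearForecaster(nn.Module):  # illustrative name, not from the paper
    def __init__(self, lookback_len: int, horizon_len: int):
        super().__init__()
        # One weight matrix of shape (horizon_len, lookback_len): no attention and
        # no positional encoding; temporal order lives directly in the weights.
        self.proj = nn.Linear(lookback_len, horizon_len)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback_len, channels) -> (batch, horizon_len, channels),
        # with the same linear map applied independently to every channel.
        return self.proj(x.transpose(1, 2)).transpose(1, 2)

# Usage sketch: forecast 96 future steps from a 336-step history of 7 variables.
model = SimpleLinearForecaster(lookback_len=336, horizon_len=96)
history = torch.randn(8, 336, 7)   # (batch, look-back window, channels)
forecast = model(history)          # (batch, horizon, channels) == (8, 96, 7)

Because the forecast is a fixed linear function of the ordered look-back window, permuting the input time steps changes the output, which is the sense in which such a model encodes temporal order while permutation-invariant self-attention does not.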
Description
Are Transformers Effective for Time Series Forecasting?
%0 Generic
%1 zeng2022transformers
%A Zeng, Ailing
%A Chen, Muxi
%A Zhang, Lei
%A Xu, Qiang
%D 2022
%K time_series transformer
%T Are Transformers Effective for Time Series Forecasting?
%U http://arxiv.org/abs/2205.13504
@misc{zeng2022transformers,
added-at = {2023-04-17T01:36:22.000+0200},
author = {Zeng, Ailing and Chen, Muxi and Zhang, Lei and Xu, Qiang},
biburl = {https://www.bibsonomy.org/bibtex/2737b9242558c5a755f3cf9133aa28d9a/qilinw},
description = {Are Transformers Effective for Time Series Forecasting?},
interhash = {4c213b232c66c29f1923998db53f2e36},
intrahash = {737b9242558c5a755f3cf9133aa28d9a},
keywords = {time_series transformer},
note = {arXiv:2205.13504. Comment: Code is available at https://github.com/cure-lab/LTSF-Linear},
timestamp = {2023-04-17T01:36:22.000+0200},
title = {Are Transformers Effective for Time Series Forecasting?},
url = {http://arxiv.org/abs/2205.13504},
year = 2022
}