Datasets Submodule

LTSF Datasets

LTSF (long-term time series forecasting) is a collection of datasets commonly used to benchmark forecasting algorithms. Performance is typically reported as the mean squared error (MSE) and mean absolute error (MAE) over multiple forecasting horizons: 96, 192, 336, and 720 time steps.
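The reporting convention described above can be sketched in plain Python. This is a minimal illustration, not torchcast code: the last-value baseline, the synthetic sine series, the helper name `evaluate_last_value`, and its `input_length` and `stride` parameters are all assumptions made for the example.

```python
import math

HORIZONS = (96, 192, 336, 720)


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def evaluate_last_value(series, input_length=336, horizons=HORIZONS, stride=24):
    """Score a last-value baseline at each horizon, averaged over windows.

    Hypothetical helper. Returns {horizon: (mse, mae)}; horizons that do not
    fit in the series are skipped.
    """
    results = {}
    for horizon in horizons:
        mse_scores, mae_scores = [], []
        last_start = len(series) - input_length - horizon
        for start in range(0, last_start + 1, stride):
            split = start + input_length
            target = series[split:split + horizon]
            pred = [series[split - 1]] * horizon  # repeat the last observed value
            mse_scores.append(mse(pred, target))
            mae_scores.append(mae(pred, target))
        if mse_scores:
            results[horizon] = (
                sum(mse_scores) / len(mse_scores),
                sum(mae_scores) / len(mae_scores),
            )
    return results


# Synthetic daily-cycle series standing in for a real, scaled dataset.
series = [math.sin(2 * math.pi * t / 24) for t in range(1500)]
for horizon, (mse_value, mae_value) in evaluate_last_value(series).items():
    print(f"horizon={horizon:4d}  MSE={mse_value:.4f}  MAE={mae_value:.4f}")
```

A real benchmark run would replace the baseline with a trained model and the synthetic series with one of the datasets below, but the per-horizon averaging stays the same.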

class torchcast.datasets.ElectricityLoadDataset(path: str | None = None, split: str = 'all', download: str | bool = True, scale: bool = True, columns_as_channels: bool = True, transform: Callable | None = None, input_margin: int | None = 336, return_length: int | None = None)

Electricity Load dataset, obtained from:

This is derived from:

The data has, however, been subsetted and pre-processed. The dataset is sometimes abbreviated as ECL.
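The signature above does not spell out what `scale` and `input_margin` do. In LTSF benchmark code, scaling usually means standardizing with training-split statistics, and the validation and test splits usually keep a look-back margin from the preceding data so that the first forecast window has inputs. The sketch below illustrates that convention on a single channel. Whether torchcast implements the parameters this way is an assumption, and `split_with_margin`, `standardize`, and the 70/10/20 split are hypothetical choices for the example, not part of the library.

```python
def split_with_margin(series, input_margin=336, percents=(70, 10, 20)):
    """Chronological train/val/test split (hypothetical helper).

    The val and test splits are extended backwards by `input_margin` steps
    so that a model has a full input window before their first target.
    """
    n = len(series)
    n_train = n * percents[0] // 100
    n_test = n * percents[2] // 100
    val_end = n - n_test
    margin = input_margin or 0  # None means no margin
    train = series[:n_train]
    val = series[max(0, n_train - margin):val_end]
    test = series[max(0, val_end - margin):]
    return train, val, test


def standardize(values, reference):
    """Shift and scale `values` by the mean and std of `reference`."""
    mean = sum(reference) / len(reference)
    var = sum((x - mean) ** 2 for x in reference) / len(reference)
    std = var ** 0.5 or 1.0  # guard against a constant reference series
    return [(x - mean) / std for x in values]


series = [float(t) for t in range(1000)]
train, val, test = split_with_margin(series)
# Statistics come from the training split only, to avoid leaking test data.
val_scaled = standardize(val, train)
print(len(train), len(val), len(test))
```

Computing the statistics on the training split alone keeps information from the evaluation splits out of the model's inputs.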

class torchcast.datasets.ElectricityTransformerDataset(path: str | None = None, task: str = '15min', split: str = 'all', download: bool | str = True, scale: bool = True, columns_as_channels: bool = True, transform: Callable | None = None, input_margin: int | None = 336, return_length: int | None = None)

This is the Zhou et al. electricity transformer dataset, obtained from:

This is sometimes abbreviated as the ETT dataset.

class torchcast.datasets.ExchangeRateDataset(path: str | None = None, split: str = 'all', download: bool | str = True, scale: bool = True, columns_as_channels: bool = True, transform: Callable | None = None, input_margin: int | None = None, return_length: int | None = None)

This is a record of currency exchange rates, taken from:

class torchcast.datasets.GermanWeatherDataset(path: str | None = None, year: int | Iterable[int] = 2020, site: str | Iterable[str] = 'beutenberg', split: str = 'all', download: bool | str = True, scale: bool = True, columns_as_channels: bool = True, transform: Callable | None = None, input_margin: int | None = 336, return_length: int | None = None)

This is a dataset of weather data from Germany, obtained from:

This is provided because it was used in the paper:

That paper used only the data from Beutenberg in 2020.

class torchcast.datasets.ILIDataset(path: str, split: str = 'all', scale: bool = True, columns_as_channels: bool = True, transform: Callable | None = None, input_margin: int | None = 336, return_length: int | None = None)

This dataset describes both the raw number of patients with influenza-like symptoms and the ratio of those patients to the total number of patients in the US, as reported by the CDC. It must be downloaded manually from:

To download this dataset, click “Download Data”. Then unselect “WHO/NREVSS”, select the desired seasons, and click “Download Data” again.

class torchcast.datasets.SanFranciscoTrafficDataset(path: str | None = None, split: str = 'all', download: str | bool = True, scale: bool = True, columns_as_channels: bool = True, transform: Callable | None = None, input_margin: int | None = None, return_length: int | None = None)

San Francisco traffic dataset, taken from:

Monash Archive Datasets

The Monash archive is a collection of time series forecasting datasets in a standard format.

class torchcast.datasets.MonashArchiveDataset(task: str, split: str = 'train', path: str | None = None, download: str | bool = True, transform: Callable | None = None, return_length: int | None = None)

This provides access to all Monash forecasting archive datasets:

https://forecastingdata.org

Godahewa et al., 2021. “Monash Time Series Forecasting Archive.” Neural Information Processing Systems 2021.
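To give a sense of what a standard format looks like, the sketch below parses a small file in the style of the archive's `.tsf` files: `@`-prefixed header lines, then one colon-separated line per series after `@data`, with `?` marking a missing value. The file layout is stated here from general knowledge of the archive rather than from this documentation, the sample text is made up, and `parse_tsf` is a hypothetical helper, not the loader torchcast uses.

```python
SAMPLE_TSF = """\
# Illustrative file, not taken from the archive
@relation example
@attribute series_name string
@attribute start_timestamp date
@frequency yearly
@horizon 4
@missing true
@equallength false
@data
T1:1979-01-01 00-00-00:1.0,2.5,?
T2:1980-01-01 00-00-00:4.0,5.0,6.0,7.0
"""


def parse_tsf(text):
    """Parse .tsf-style text into (metadata dict, list of series dicts)."""
    meta, attribute_names, series = {}, [], []
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not in_data:
            if line.lower() == "@data":
                in_data = True
            elif line.startswith("@attribute"):
                _, name, _kind = line.split(maxsplit=2)
                attribute_names.append(name)
            elif line.startswith("@"):
                key, _, value = line[1:].partition(" ")
                meta[key] = value.strip()
            continue
        # Data line: one field per declared attribute, then the values.
        fields = line.split(":")
        record = dict(zip(attribute_names, fields[:-1]))
        record["values"] = [
            None if v == "?" else float(v) for v in fields[-1].split(",")
        ]
        series.append(record)
    return meta, series


meta, series = parse_tsf(SAMPLE_TSF)
print(meta["frequency"], [len(s["values"]) for s in series])
```

Because series in one file may differ in length, the values are kept as per-series lists rather than stacked into one array.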

UCR/UEA Datasets

The UCR/UEA archive is a collection of time series classification datasets in a standard format. The UCR archive provides univariate time series, while the UEA archive provides multivariate time series.

class torchcast.datasets.UCRDataset(task: str, split: str = 'train', path: str | None = None, download: bool | str = True, transform: Callable | None = None, return_length: int | None = None)

This is the UCR dataset for univariate time series classification, found at:

class torchcast.datasets.UEADataset(task: str, split: str = 'train', path: str | None = None, download: bool | str = True, transform: Callable | None = None, return_length: int | None = None)

This is the UEA dataset for multivariate time series classification, found at:

Other Datasets

class torchcast.datasets.AirQualityDataset(path: str | None = None, download: bool | str = True, transform: Callable | None = None, return_length: int | None = None)

This is the De Vito et al. air quality dataset.