tsgm.utils.datasets
Module Contents¶
- gen_sine_dataset(N: int, T: int, D: int, max_value: int = 10) numpy.typing.NDArray [source]¶
Generates a dataset of sinusoidal waves with random parameters.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
- Returns:
Generated dataset with shape (N, T, D).
- Return type:
numpy.ndarray
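A minimal usage sketch, assuming the function is imported from tsgm.utils.datasets (the module documented here); the argument values are illustrative:

```python
from tsgm.utils.datasets import gen_sine_dataset

# 100 sinusoidal samples, each 64 steps long with 3 dimensions
X = gen_sine_dataset(N=100, T=64, D=3, max_value=10)
print(X.shape)  # expected: (100, 64, 3)
```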
- gen_sine_const_switch_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0, frequency_switch: float = 0.1) Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike] [source]¶
Generates a dataset with alternating constant and sinusoidal sequences.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Value indicating whether the sequence is constant or sinusoidal. Defaults to 0.
frequency_switch (float, optional) – Probability of switching between constant and sinusoidal sequences. Defaults to 0.1.
- Returns:
Tuple containing input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
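A usage sketch for the switching dataset (argument values are illustrative):

```python
from tsgm.utils.datasets import gen_sine_const_switch_dataset

# Sequences whose segments switch between constant and sinusoidal behaviour
X, y = gen_sine_const_switch_dataset(N=50, T=100, D=2, frequency_switch=0.1)
```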
- gen_sine_vs_const_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0) Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike] [source]¶
Generates a dataset with alternating sinusoidal and constant sequences.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Maximum value for the constant sequence. Defaults to 0.
- Returns:
Tuple containing input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
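A sketch producing a labeled sine-vs-constant dataset, e.g. for classification-style experiments (values are illustrative):

```python
from tsgm.utils.datasets import gen_sine_vs_const_dataset

X, y = gen_sine_vs_const_dataset(N=200, T=64, D=1, max_value=10, const=5)
# y labels each sample as sinusoidal or constant
```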
- class UCRDataManager(path: str = default_path, ds: str = 'gunpoint')[source]¶
A manager for the UCR collection of time series datasets. If you find these datasets useful, please cite:
@misc{UCRArchive2018,
  title = {The UCR Time Series Classification Archive},
  author = {Dau, Hoang Anh and Keogh, Eamonn and Kamgar, Kaveh and Yeh, Chin-Chia Michael and Zhu, Yan and Gharghabi, Shaghayegh and Ratanamahatana, Chotirat Ann and Yanping and Hu, Bing and Begum, Nurjahan and Bagnall, Anthony and Mueen, Abdullah and Batista, Gustavo and Hexagon-ML},
  year = {2018},
  month = {October},
  note = {\url{https://www.cs.ucr.edu/~eamonn/time_series_data_2018/}}
}
- Parameters:
path (str) – a relative path to the stored UCR dataset.
ds (str) – Name of the dataset. The list of names is available at https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (case sensitive!).
- Raises:
ValueError – When there is no stored UCR archive, or the name of the dataset is incorrect.
- get() Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike] [source]¶
Returns a tuple containing training and testing data.
- Returns:
A tuple (X_train, y_train, X_test, y_test).
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike]
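A usage sketch for the manager; the path below is a hypothetical location of a locally stored UCR archive, and the dataset name must match the archive's (case-sensitive) naming:

```python
from tsgm.utils.datasets import UCRDataManager

# Hypothetical local path to the downloaded UCR archive
ucr = UCRDataManager(path="./data/UCRArchive_2018", ds="gunpoint")
X_train, y_train, X_test, y_test = ucr.get()
```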
- get_mauna_loa() Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike] [source]¶
Loads the Mauna Loa CO2 dataset.
This function loads the Mauna Loa CO2 dataset, which contains measurements of atmospheric CO2 concentrations at the Mauna Loa Observatory in Hawaii.
- Returns:
A tuple containing the input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
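A minimal sketch of loading the Mauna Loa CO2 series:

```python
from tsgm.utils.datasets import get_mauna_loa

X, y = get_mauna_loa()  # atmospheric CO2 measurements from the Mauna Loa Observatory
```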
- split_dataset_into_objects(X: tensorflow.python.types.core.TensorLike, y: tensorflow.python.types.core.TensorLike, step: int = 10) Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike] [source]¶
Splits the dataset into objects of fixed length.
This function splits the input dataset into objects of fixed length (step) along the first dimension, zero-padding the last object if necessary.
- Parameters:
X (TensorLike) – Input data.
y (TensorLike) – Target labels.
step (int, optional) – Length of each object. Defaults to 10.
- Returns:
A tuple containing the split input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
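A sketch of windowing a flat dataset into fixed-length objects; the shapes are indicative, based on the description above:

```python
import numpy as np

from tsgm.utils.datasets import split_dataset_into_objects

X = np.random.random((1000, 5))          # 1000 timestamps, 5 features
y = np.random.randint(0, 2, size=1000)
# Objects of length 10, zero-padded if the length is not a multiple of step
X_obj, y_obj = split_dataset_into_objects(X, y, step=10)
```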
- load_arff(path: str) pandas.DataFrame [source]¶
Loads data from an ARFF (Attribute-Relation File Format) file.
This function reads data from an ARFF file located at the specified path and returns it as a pandas DataFrame.
- Parameters:
path (str) – Path to the ARFF file.
- Returns:
DataFrame containing the loaded data.
- Return type:
pandas.DataFrame
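A sketch of loading an ARFF file (the path is hypothetical):

```python
from tsgm.utils.datasets import load_arff

df = load_arff("./data/example.arff")  # hypothetical path to an ARFF file
print(df.head())
```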
- get_eeg() Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike] [source]¶
Loads the EEG Eye State dataset.
This function downloads the EEG Eye State dataset from the UCI Machine Learning Repository and returns the input features (X) and target labels (y).
- Returns:
A tuple containing the input features (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
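A minimal sketch; the dataset is downloaded from the UCI repository on the first call:

```python
from tsgm.utils.datasets import get_eeg

X, y = get_eeg()  # EEG Eye State features and eye-state labels
```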
- get_synchronized_brainwave_dataset() Tuple[pandas.DataFrame, pandas.DataFrame] [source]¶
Loads the EEG Synchronized Brainwave dataset.
This function downloads the EEG Synchronized Brainwave dataset from Dropbox and returns the input features (X) and target labels (y).
- Returns:
A tuple containing the input features (X) and target labels (y).
- Return type:
tuple[pd.DataFrame, pd.DataFrame]
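A minimal sketch; note that this loader returns pandas DataFrames rather than arrays:

```python
from tsgm.utils.datasets import get_synchronized_brainwave_dataset

X_df, y_df = get_synchronized_brainwave_dataset()
print(X_df.shape, y_df.shape)
```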
- get_power_consumption() numpy.typing.NDArray [source]¶
Retrieves the household power consumption dataset.
This function downloads and loads the household power consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.
- Returns:
Household power consumption dataset.
- Return type:
numpy.ndarray
- get_stock_data(stock_name: str) numpy.typing.NDArray [source]¶
Downloads historical stock data for the specified stock ticker.
This function downloads historical stock data for the specified stock ticker using the Yahoo Finance API. It returns the stock data as a NumPy array with an additional axis representing the batch dimension.
- Parameters:
stock_name (str) – Ticker symbol of the stock.
- Returns:
Historical stock data.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the provided stock ticker is invalid or no data is available.
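A sketch of downloading stock data; the ticker is an example, any symbol valid for the Yahoo Finance API can be used, and an invalid one raises ValueError as noted above:

```python
from tsgm.utils.datasets import get_stock_data

data = get_stock_data("AAPL")
print(data.shape)  # leading axis is the batch dimension, e.g. (1, n_timestamps, n_features)
```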
- get_energy_data() numpy.typing.NDArray [source]¶
Retrieves the energy consumption dataset.
This function downloads and loads the energy consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.
- Returns:
Energy consumption dataset.
- Return type:
numpy.ndarray
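A sketch covering the two UCI loaders above, both of which download their data and return NumPy arrays:

```python
from tsgm.utils.datasets import get_energy_data, get_power_consumption

power = get_power_consumption()  # household power consumption measurements
energy = get_energy_data()       # energy consumption measurements
print(power.shape, energy.shape)
```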
- get_mnist_data() Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike] [source]¶
Retrieves the MNIST dataset.
This function loads the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits, and returns the training and testing data along with their corresponding labels.
- Returns:
A tuple containing the training data, training labels, testing data, and testing labels.
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike]
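A minimal sketch:

```python
from tsgm.utils.datasets import get_mnist_data

X_train, y_train, X_test, y_test = get_mnist_data()
print(X_train.shape)  # 28x28 grayscale digit images
```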
- _exponential_quadratic(x: numpy.typing.NDArray, y: numpy.typing.NDArray) float [source]¶
This function calculates the exponential quadratic kernel matrix between two sets of points, given by matrices x and y.
- Parameters:
x (numpy.ndarray) – First set of points.
y (numpy.ndarray) – Second set of points.
- Returns:
Exponential quadratic kernel matrix.
- Return type:
numpy.ndarray
- get_gp_samples_data(num_samples: int, max_time: int, covar_func: Callable = _exponential_quadratic) numpy.typing.NDArray [source]¶
Generates samples from a Gaussian process.
This function generates samples from a Gaussian process using the specified covariance function. It returns the generated samples as a NumPy array.
- Parameters:
num_samples (int) – Number of samples to generate.
max_time (int) – Maximum time value for the samples.
covar_func (Callable, optional) – Covariance function to use. Defaults to _exponential_quadratic.
- Returns:
Generated samples from the Gaussian process.
- Return type:
numpy.ndarray
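A sketch of drawing Gaussian-process sample paths with the default covariance function (values are illustrative):

```python
from tsgm.utils.datasets import get_gp_samples_data

# 10 sample paths over 100 time points, using the default exponential quadratic kernel
samples = get_gp_samples_data(num_samples=10, max_time=100)
print(samples.shape)
```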
- get_physionet2012() Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike] [source]¶
Retrieves the Physionet 2012 dataset.
This function downloads and retrieves the Physionet 2012 dataset, which consists of physiological data and corresponding outcomes. It returns the training, testing, and validation datasets along with their labels.
- Returns:
A tuple containing the training, testing, and validation datasets along with their labels. (train_X, train_y, test_X, test_y, val_X, val_y)
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike, TensorLike, TensorLike]
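A sketch of retrieving the Physionet 2012 splits; per the description above, the data are downloaded if not already present:

```python
from tsgm.utils.datasets import get_physionet2012

# Downloads the data on first use and returns train / test / validation splits
train_X, train_y, test_X, test_y, val_X, val_y = get_physionet2012()
```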
- download_physionet2012() None [source]¶
Downloads the Physionet 2012 dataset files from the Physionet website and extracts them into the local folder ‘physionet2012’.
- _get_physionet_X_dataframe(dataset_path: str) pandas.DataFrame [source]¶
Reads txt files from folder ‘dataset_path’ and returns a dataframe (X) with the Physionet dataset.
- Parameters:
dataset_path (str) – Path to the dataset folder.
- Returns:
The features (X) dataframe.
- Return type:
pd.DataFrame
- _get_physionet_y_dataframe(file_path: str) pandas.DataFrame [source]¶
Reads the outcomes txt file at ‘file_path’ and returns a dataframe (y) with the Physionet data.
- Parameters:
file_path (str) – Path to the outcomes file.
- Returns:
The target (y) dataframe.
- Return type:
pd.DataFrame
- get_covid_19() Tuple[tensorflow.python.types.core.TensorLike, Tuple, List] [source]¶
Loads the Covid-19 dataset with additional graph information.
The dataset is based on data from The New York Times, compiled from reports by state and local health agencies [1], and was adapted to the graph setting in [2].
[1] The New York Times. (2021). Coronavirus (Covid-19) Data in the United States. Retrieved [Insert Date Here], from https://github.com/nytimes/covid-19-data.
[2] Alexander V. Nikitin, St John, Arno Solin, Samuel Kaski. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10640-10660, 2022.
- Returns:
A tuple whose first element is the time series data (n_nodes x n_timestamps x n_features); each timestamp contains the number of deaths, the number of cases, deaths normalized by the population, and cases normalized by the population. The second element is the graph tuple (nodes, edges). The third element is the order of states.
- Return type:
tuple
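A sketch of unpacking the Covid-19 dataset and its graph structure:

```python
from tsgm.utils.datasets import get_covid_19

X, graph, states = get_covid_19()
nodes, edges = graph
# X has shape (n_nodes, n_timestamps, n_features); `states` gives the node order
```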