TSGM¶
Datasets¶
- class UCRDataManager(path: str = '/home/docs/checkouts/readthedocs.org/user_builds/tsgm/envs/latest/lib/python3.8/site-packages/tsgm-0.0.7-py3.8.egg/tsgm/utils/../../data', ds: str = 'gunpoint')[source]¶
A manager for the UCR collection of time series datasets. If you find these datasets useful, please cite:
@misc{UCRArchive2018,
  title  = {The UCR Time Series Classification Archive},
  author = {Dau, Hoang Anh and Keogh, Eamonn and Kamgar, Kaveh and Yeh, Chin-Chia Michael and Zhu, Yan and Gharghabi, Shaghayegh and Ratanamahatana, Chotirat Ann and Yanping and Hu, Bing and Begum, Nurjahan and Bagnall, Anthony and Mueen, Abdullah and Batista, Gustavo and Hexagon-ML},
  year   = {2018},
  month  = {October},
  note   = {\url{https://www.cs.ucr.edu/~eamonn/time_series_data_2018/}}
}
- Parameters:
path (str) – a relative path to the stored UCR dataset.
ds (str) – Name of the dataset. The list of names is available at https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (case sensitive!).
- Raises:
ValueError – When there is no stored UCR archive, or the name of the dataset is incorrect.
- default_path = '/home/docs/checkouts/readthedocs.org/user_builds/tsgm/envs/latest/lib/python3.8/site-packages/tsgm-0.0.7-py3.8.egg/tsgm/utils/../../data'¶
- get() Tuple[TensorLike, TensorLike, TensorLike, TensorLike] [source]¶
Returns a tuple containing training and testing data.
- Returns:
A tuple (X_train, y_train, X_test, y_test).
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike]
- get_classes_distribution() Dict [source]¶
Returns a dictionary with the fraction of occurrences for each class.
- key = 'someone'¶
- mirrors = ['https://www.cs.ucr.edu/~eamonn/time_series_data_2018/']¶
- resources = [('UCRArchive_2018.zip', 0)]¶
- y_all: Collection[Hashable] | None¶
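Example (an illustrative sketch; the tsgm.utils.UCRDataManager import path and the printed values are assumptions, not guaranteed by this page):
import tsgm

ucr = tsgm.utils.UCRDataManager(ds="gunpoint")   # module path assumed; raises ValueError if the archive or name is wrong
X_train, y_train, X_test, y_test = ucr.get()
print(ucr.get_classes_distribution())            # e.g. {1.0: 0.5, 2.0: 0.5} (illustrative output)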
- download_physionet2012() None [source]¶
Downloads the Physionet 2012 dataset files from the Physionet website and extracts them into the local folder ‘physionet2012’.
- gen_sine_const_switch_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0, frequency_switch: float = 0.1) Tuple[TensorLike, TensorLike] [source]¶
Generates a dataset with alternating constant and sinusoidal sequences.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Value indicating whether the sequence is constant or sinusoidal. Defaults to 0.
frequency_switch (float, optional) – Probability of switching between constant and sinusoidal sequences. Defaults to 0.1.
- Returns:
Tuple containing input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- gen_sine_dataset(N: int, T: int, D: int, max_value: int = 10) np.ndarray [source]¶
Generates a dataset of sinusoidal waves with random parameters.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
- Returns:
Generated dataset with shape (N, T, D).
- Return type:
np.ndarray
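A minimal usage sketch for the toy generators above (the tsgm.utils module path is an assumption):
import tsgm

X_sine = tsgm.utils.gen_sine_dataset(N=128, T=64, D=2, max_value=10)      # shape (128, 64, 2); module path assumed
X_sw, y_sw = tsgm.utils.gen_sine_const_switch_dataset(N=128, T=64, D=2)   # labeled switching sequences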
- gen_sine_vs_const_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0) Tuple[TensorLike, TensorLike] [source]¶
Generates a dataset with alternating sinusoidal and constant sequences.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Maximum value for the constant sequence. Defaults to 0.
- Returns:
Tuple containing input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- get_covid_19() Tuple[TensorLike, Tuple, List] [source]¶
Loads the Covid-19 dataset with additional graph information. The dataset is based on data from The New York Times, compiled from reports by state and local health agencies [1], and was adapted to the graph setting in [2].
[1] The New York Times. (2021). Coronavirus (Covid-19) Data in the United States. Retrieved [Insert Date Here], from https://github.com/nytimes/covid-19-data.
[2] Alexander V. Nikitin, St John, Arno Solin, Samuel Kaski. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10640-10660, 2022.
- Returns:
A tuple whose first element is time series data (n_nodes x n_timestamps x n_features); each timestamp consists of the number of deaths, cases, deaths normalized by the population, and cases normalized by the population. The second element is the graph tuple (nodes, edges). The third element is the order of states.
- Return type:
tuple
- get_eeg() Tuple[TensorLike, TensorLike] [source]¶
Loads the EEG Eye State dataset.
This function downloads the EEG Eye State dataset from the UCI Machine Learning Repository and returns the input features (X) and target labels (y).
- Returns:
A tuple containing the input features (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- get_energy_data() np.ndarray [source]¶
Retrieves the energy consumption dataset.
This function downloads and loads the energy consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.
- Returns:
Energy consumption dataset.
- Return type:
np.ndarray
- get_gp_samples_data(num_samples: int, max_time: int, covar_func: Callable = _exponential_quadratic) np.ndarray [source]¶
Generates samples from a Gaussian process.
This function generates samples from a Gaussian process using the specified covariance function. It returns the generated samples as a NumPy array.
- Parameters:
num_samples (int) – Number of samples to generate.
max_time (int) – Maximum time value for the samples.
covar_func (Callable, optional) – Covariance function to use. Defaults to _exponential_quadratic.
- Returns:
Generated samples from the Gaussian process.
- Return type:
np.ndarray
- get_mauna_loa() Tuple[TensorLike, TensorLike] [source]¶
Loads the Mauna Loa CO2 dataset.
This function loads the Mauna Loa CO2 dataset, which contains measurements of atmospheric CO2 concentrations at the Mauna Loa Observatory in Hawaii.
- Returns:
A tuple containing the input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- get_mnist_data() Tuple[TensorLike, TensorLike, TensorLike, TensorLike] [source]¶
Retrieves the MNIST dataset.
This function loads the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits, and returns the training and testing data along with their corresponding labels.
- Returns:
A tuple containing the training data, training labels, testing data, and testing labels.
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike]
- get_physionet2012() Tuple[TensorLike, TensorLike, TensorLike, TensorLike, TensorLike, TensorLike] [source]¶
Retrieves the Physionet 2012 dataset.
This function downloads and retrieves the Physionet 2012 dataset, which consists of physiological data and corresponding outcomes. It returns the training, testing, and validation datasets along with their labels.
- Returns:
A tuple containing the training, testing, and validation datasets along with their labels. (train_X, train_y, test_X, test_y, val_X, val_y)
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike, TensorLike, TensorLike]
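Illustrative usage (assuming the helper is exposed via tsgm.utils; the raw files are downloaded if they are not already present):
import tsgm

# module path assumed
train_X, train_y, test_X, test_y, val_X, val_y = tsgm.utils.get_physionet2012()
print(train_X.shape, val_X.shape)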
- get_power_consumption() np.ndarray [source]¶
Retrieves the household power consumption dataset.
This function downloads and loads the household power consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.
- Returns:
Household power consumption dataset.
- Return type:
np.ndarray
- get_stock_data(stock_name: str) np.ndarray [source]¶
Downloads historical stock data for the specified stock ticker.
This function downloads historical stock data for the specified stock ticker using the Yahoo Finance API. It returns the stock data as a NumPy array with an additional axis representing the batch dimension.
- Parameters:
stock_name (str) – Ticker symbol of the stock.
- Returns:
Historical stock data.
- Return type:
np.ndarray
- Raises:
ValueError – If the provided stock ticker is invalid or no data is available.
- get_synchronized_brainwave_dataset() Tuple[DataFrame, DataFrame] [source]¶
Loads the EEG Synchronized Brainwave dataset.
This function downloads the EEG Synchronized Brainwave dataset from dropbox and returns the input features (X) and target labels (y).
- Returns:
A tuple containing the input features (X) and target labels (y).
- Return type:
tuple[pd.DataFrame, pd.DataFrame]
- load_arff(path: str) DataFrame [source]¶
Loads data from an ARFF (Attribute-Relation File Format) file.
This function reads data from an ARFF file located at the specified path and returns it as a pandas DataFrame.
- Parameters:
path (str) – Path to the ARFF file.
- Returns:
DataFrame containing the loaded data.
- Return type:
pandas.DataFrame
- split_dataset_into_objects(X: TensorLike, y: TensorLike, step: int = 10) Tuple[TensorLike, TensorLike] [source]¶
Splits the dataset into objects of fixed length.
This function splits the input dataset into objects of fixed length along the first dimension, 0-padding if necessary.
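A small sketch of the splitting helper (the tsgm.utils module path and the input shapes are assumptions; a single labeled series is cut into windows of length 10, zero-padded at the end):
import numpy as np
import tsgm

X = np.random.normal(size=(95, 3))            # 95 timesteps, 3 features (illustrative shape)
y = np.random.randint(0, 2, size=95)
X_obj, y_obj = tsgm.utils.split_dataset_into_objects(X, y, step=10)   # module path assumed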
Augmentations¶
- class BaseAugmenter(per_feature: bool)[source]¶
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
- class BaseCompose(augmentations: List[BaseAugmenter])[source]¶
- class DTWBarycentricAveraging[source]¶
DTW Barycenter Averaging (DBA) [1] method, estimated through an Expectation-Maximization algorithm [2], as implemented in https://github.com/tslearn-team/tslearn/.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1, num_initial_samples: int | None = None, initial_timeseries: List[TensorLike] | None = None, initial_labels: List[int] | None = None, **kwargs) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Parameters¶
X : TensorLike, the time series dataset.
y : TensorLike or None, the classes.
n_samples : int, number of samples to generate (per class, if y is given).
num_initial_samples : int or None (default: None)
The number of time series to draw (per class) from the dataset before computing DTW_BA. If None, use the entire set (per class).
initial_timeseries : array or None (default: None)
Initial time series to start the optimization process from, with shape (original_size, d). In case y is given, the shape of initial_timeseries is assumed to be (n_classes, original_size, d).
initial_labels : array or None (default: None)
Labels for the samples from initial_timeseries.
Returns¶
np.array of shape (n_samples, original_size, d) if y is None, or (n_classes * n_samples, original_size, d) otherwise, together with an np.array of labels (or None).
- class GaussianNoise(per_feature: bool = True)[source]¶
Apply Gaussian noise to the input time series.
- Args:
variance ((float, float) or float): Variance range for the noise. If variance is a single float, the range will be (0, variance). Default: (10.0, 50.0).
mean (float): Mean of the noise. Default: 0.
per_feature (bool): If set to True, noise will be sampled for each feature independently. Otherwise, the noise will be sampled once for all features. Default: True.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1, mean: float = 0, variance: float = 1.0) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generate synthetic data with Gaussian noise.
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned
n_samples (int) – Number of augmented samples to generate. Default is 1.
mean (float) – The mean of the noise. Default is 0.
variance (float) – The variance of the noise. Default is 1.0.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
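Usage sketch (the module path tsgm.models.augmentations is an assumption):
import numpy as np
import tsgm

X = np.random.normal(size=(16, 64, 2))        # (n_data, n_timesteps, n_features)
y = np.random.randint(0, 2, size=16)

aug = tsgm.models.augmentations.GaussianNoise()                    # module path assumed
X_aug = aug.generate(X=X, n_samples=32, mean=0.0, variance=0.5)    # 32 noisy samples
X_aug_l, y_aug_l = aug.generate(X=X, y=y, n_samples=32)            # labels are returned when y is given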
- class MagnitudeWarping[source]¶
Magnitude warping changes the magnitude of each sample by convolving the data window with a smooth curve varying around one. See https://dl.acm.org/doi/pdf/10.1145/3136755.3136817.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1, sigma: float = 0.2, n_knots: int = 4) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generates augmented samples via MagnitudeWarping for (X, y)
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned
n_samples (int) – Number of augmented samples to generate. Default is 1.
sigma (float) – Standard deviation for the random warping. Default is 0.2.
n_knots (int) – Number of knots used for warping curve. Default is 4.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
- class Shuffle[source]¶
Shuffles time series features. Shuffling is beneficial when each feature corresponds to interchangeable sensors.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generate synthetic data using Shuffle strategy. Features are randomly shuffled to generate novel samples.
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned
n_samples (int) – Number of augmented samples to generate. Default is 1.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
- class SliceAndShuffle(per_feature: bool = False)[source]¶
Slices the time series into k pieces and creates a new time series by shuffling the slices.
- Args:
per_feature (bool): If set to True, each time series is sliced independently. Otherwise, all features are sliced in the same way. Default: False.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1, n_segments: int = 2) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generate synthetic data using the slice-and-shuffle strategy. Slices are randomly selected.
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned.
n_samples (int) – Number of augmented samples to generate. Default is 1.
n_segments (int) – Number of slices each time series is cut into. Default is 2.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
- class WindowWarping[source]¶
https://halshs.archives-ouvertes.fr/halshs-01357973/document
- generate(X: TensorLike, y: TensorLike | None = None, window_ratio: float = 0.2, scales: Tuple = (0.25, 1.0), n_samples: int = 1) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generates augmented samples via WindowWarping for (X, y).
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned
window_ratio (float) – The ratio of the window size relative to the total number of timesteps. Default is 0.2.
scales (tuple) – A tuple specifying the scale range for warping. Default is (0.25, 1.0).
n_samples (int) – Number of augmented samples to generate. Default is 1.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
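A combined sketch of the warping and slicing augmenters (the module path tsgm.models.augmentations is an assumption):
import numpy as np
import tsgm

X = np.random.normal(size=(16, 64, 2))

ww = tsgm.models.augmentations.WindowWarping()                     # module path assumed
X_ww = ww.generate(X=X, n_samples=8, window_ratio=0.2, scales=(0.25, 1.0))

sas = tsgm.models.augmentations.SliceAndShuffle()
X_ss = sas.generate(X=X, n_samples=8, n_segments=4)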
Metrics¶
- class ConsistencyMetric(evaluators: List)[source]¶
Predictive consistency metric measures whether a set of evaluators yields consistent results on real and synthetic data.
- Parameters:
evaluators (list) – A list of evaluators (each item should implement the method .evaluate(D)).
- class DemographicParityMetric[source]¶
Measuring demographic parity between two datasets.
This metric assesses the difference in the distributions of a target variable among different groups in two datasets. By default, it uses the Kolmogorov-Smirnov statistic to quantify the maximum vertical deviation between the cumulative distribution functions of the target variable for the historical and synthetic data within each group.
- Args:
d_hist (tsgm.dataset.DatasetOrTensor): The historical input dataset or tensor.
groups_hist (TensorLike): The group assignments for the historical data.
d_synth (tsgm.dataset.DatasetOrTensor): The synthetic input dataset or tensor.
groups_synth (TensorLike): The group assignments for the synthetic data.
metric (callable, optional): The metric used to compare the target variable distributions within each group. Default is the Kolmogorov-Smirnov statistic.
- Returns:
dict: A dictionary mapping each group to the computed demographic parity metric.
- Example:
>>> metric = DemographicParityMetric()
>>> dataset_hist = tsgm.dataset.Dataset(...)
>>> dataset_synth = tsgm.dataset.Dataset(...)
>>> groups_hist = [0, 1, 0, 1, 1, 0]
>>> groups_synth = [1, 1, 0, 0, 0, 1]
>>> result = metric(dataset_hist, groups_hist, dataset_synth, groups_synth)
>>> print(result)
- class DiscriminativeMetric[source]¶
The DiscriminativeMetric measures the discriminative performance of a model in distinguishing between synthetic and real datasets.
This metric evaluates a discriminative model by training it on a combination of synthetic and real datasets and assessing its performance on a test set.
- Parameters:
d_hist (tsgm.dataset.DatasetOrTensor) – Real dataset.
d_syn (tsgm.dataset.DatasetOrTensor) – Synthetic dataset.
model (T.Callable) – Discriminative model to be evaluated.
test_size (T.Union[float, int]) – Proportion of the dataset to include in the test split or the absolute number of test samples.
n_epochs (int) – Number of training epochs for the model.
metric (T.Optional[T.Callable]) – Optional evaluation metric to use (default: accuracy).
random_seed (T.Optional[int]) – Optional random seed for reproducibility.
- Returns:
Discriminative performance metric.
- Return type:
- Example:
>>> from my_module import DiscriminativeMetric, MyDiscriminativeModel
>>> import tsgm.dataset
>>> import numpy as np
>>> import sklearn
>>>
>>> # Create real and synthetic datasets
>>> real_dataset = tsgm.dataset.Dataset(...)  # Replace ... with appropriate arguments
>>> synthetic_dataset = tsgm.dataset.Dataset(...)  # Replace ... with appropriate arguments
>>>
>>> # Create a discriminative model
>>> model = MyDiscriminativeModel()  # Replace with the actual discriminative model class
>>>
>>> # Create and use the DiscriminativeMetric
>>> metric = DiscriminativeMetric()
>>> result = metric(real_dataset, synthetic_dataset, model, test_size=0.2, n_epochs=10)
>>> print(result)
- class DistanceMetric(statistics: list, discrepancy: Callable)[source]¶
Metric that measures similarity between synthetic and real time series.
- Parameters:
statistics (list) – A list of summary statistics (callables).
discrepancy (Callable) – Discrepancy measure function applied to pairs of summary-statistics vectors.
- discrepancy(stats1: Tensor | np.ndarray, stats2: Tensor | np.ndarray) float [source]¶
- Parameters:
stats1 (tsgm.types.Tensor.) – A vector of summary statistics.
stats2 (tsgm.types.Tensor.) – A vector of summary statistics.
- Returns:
the distance between two vectors calculated by self._discrepancy.
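A minimal sketch with hand-rolled statistics and discrepancy; the call signature (real data first, synthetic data second) and the tsgm.metrics module path are assumptions:
import numpy as np
import tsgm

X_real = np.random.normal(size=(100, 64, 2))
X_synth = np.random.normal(loc=0.1, size=(100, 64, 2))

statistics = [lambda X: np.mean(X), lambda X: np.std(X)]     # summary statistics of a dataset
discrepancy = lambda s1, s2: float(np.linalg.norm(np.asarray(s1) - np.asarray(s2)))

metric = tsgm.metrics.DistanceMetric(statistics=statistics, discrepancy=discrepancy)   # module path assumed
print(metric(X_real, X_synth))                               # call signature assumed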
- class DownstreamPerformanceMetric(evaluator: BaseDownstreamEvaluator)[source]¶
The downstream performance metric evaluates the performance of a model on a downstream task. It returns performance gains achieved with the addition of synthetic data.
- Parameters:
evaluator (BaseDownstreamEvaluator) – An evaluator; it should implement the method .evaluate(D).
- class EntropyMetric[source]¶
Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies.
- Args:
d (tsgm.dataset.DatasetOrTensor): The input dataset or tensor.
- Returns:
float: The computed spectral entropy.
- Example:
>>> metric = EntropyMetric()
>>> dataset = tsgm.dataset.Dataset(...)
>>> result = metric(dataset)
>>> print(result)
- class MMDMetric(kernel: Callable = exp_quad_kernel)[source]¶
This metric calculates the maximum mean discrepancy (MMD) between real and synthetic samples.
- Args:
d (tsgm.dataset.DatasetOrTensor): The input dataset or tensor.
- Returns:
float: The computed MMD value.
- Example:
>>> metric = MMDMetric(kernel)
>>> dataset, synth_dataset = tsgm.dataset.Dataset(...), tsgm.dataset.Dataset(...)
>>> result = metric(dataset, synth_dataset)
>>> print(result)
- class PairwiseDistanceMetric[source]¶
Measures pairwise distances in a set of time series.
- pairwise_euclidean_distances(ts: TensorLike) TensorLike [source]¶
Computes the pairwise Euclidean distances for a set of time series.
- Parameters:
ts (numpy.ndarray) – A 2D array where each row represents a time series.
- Returns:
A 2D array representing the pairwise Euclidean distance matrix.
- Return type:
numpy.ndarray
- class PredictiveParityMetric[source]¶
Measuring predictive parity between two datasets.
This metric assesses the discrepancy in the predictive performance of a model among different groups in two datasets. By default, it uses precision to quantify the predictive performance of the model within each group.
- Args:
y_true_hist (TensorLike): The true target values for the historical data.
y_pred_hist (TensorLike): The predicted target values for the historical data.
groups_hist (TensorLike): The group assignments for the historical data.
y_true_synth (TensorLike): The true target values for the synthetic data.
y_pred_synth (TensorLike): The predicted target values for the synthetic data.
groups_synth (TensorLike): The group assignments for the synthetic data.
metric (callable, optional): The metric used to compare the predictive performance within each group. Default is the precision score.
- Returns:
dict: A dictionary mapping each group to the computed predictive parity metric.
- Example:
>>> metric = PredictiveParityMetric()
>>> y_true_hist = [0, 1, 0, 1, 1, 0]
>>> y_pred_hist = [0, 1, 0, 0, 1, 1]
>>> groups_hist = [0, 1, 0, 1, 1, 0]
>>> y_true_synth = [1, 0, 1, 0, 0, 1]
>>> y_pred_synth = [1, 0, 1, 1, 0, 0]
>>> groups_synth = [1, 1, 0, 0, 0, 1]
>>> result = metric(y_true_hist, y_pred_hist, groups_hist, y_true_synth, y_pred_synth, groups_synth)
>>> print(result)
- class PrivacyMembershipInferenceMetric(attacker: Any, metric: Callable | None = None)[source]¶
The metric measures the possibility of membership inference attacks.
- Parameters:
attacker (Callable) – An attacker, a one-class classifier (OCC) that implements the methods .fit and .predict.
metric – Measures the quality of the attacker (precision by default).
GANs¶
- class ConditionalGAN(*args, **kwargs)[source]¶
Conditional GAN implementation for labeled and temporally labeled time series.
- Parameters:
discriminator (keras.Model) – A discriminator model which takes a time series as input and checks whether the sample is real or fake.
generator (keras.Model) – Takes as input a random noise vector of latent_dim length and returns a simulated time series.
latent_dim (int) – The size of the noise vector.
temporal (bool) – Indicates whether the time series is temporally labeled or not.
- compile(d_optimizer: OptimizerV2, g_optimizer: OptimizerV2, loss_fn: Callable) None [source]¶
Compiles the generator and discriminator models.
- Parameters:
d_optimizer (keras.Model) – An optimizer for the GAN’s discriminator.
g_optimizer – An optimizer for the GAN’s generator.
loss_fn (keras.losses.Loss) – Loss function.
- generate(labels: Tensor | ndarray[Any, dtype[ScalarType]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates new data from the model.
- Parameters:
labels (tsgm.types.Tensor) – Conditioning labels for the samples to be generated (one sample per label).
- Returns:
generated samples
- Return type:
tsgm.types.Tensor
- property metrics: List[source]¶
- Returns:
A list of metrics trackers (e.g., generator’s loss and discriminator’s loss).
- Return type:
T.List
- class GAN(*args, **kwargs)[source]¶
GAN implementation for unlabeled time series.
- Parameters:
discriminator (keras.Model) – A discriminator model which takes a time series as input and checks whether the sample is real or fake.
generator (keras.Model) – Takes as input a random noise vector of latent_dim length and returns a simulated time series.
latent_dim (int) – The size of the noise vector.
use_wgan (bool) – Whether to use Wasserstein GAN with gradient penalty.
- compile(d_optimizer: OptimizerV2, g_optimizer: OptimizerV2, loss_fn: Loss) None [source]¶
Compiles the generator and discriminator models.
- Parameters:
d_optimizer (keras.Model) – An optimizer for the GAN’s discriminator.
g_optimizer – An optimizer for the GAN’s generator.
loss_fn (keras.losses.Loss) – Loss function.
- generate(num: int) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates new data from the model.
- Parameters:
num (int) – the number of samples to be generated.
- Returns:
Generated samples
- Return type:
tsgm.types.Tensor
- property metrics: List[source]¶
- Returns:
A list of metrics trackers (e.g., generator’s loss and discriminator’s loss).
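A self-contained training sketch for the unlabeled GAN. The module path tsgm.models.cgan.GAN, the keras-style fit() call, and the toy architectures are assumptions; real experiments would typically use architectures from the Zoo below.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
import tsgm

latent_dim, seq_len, feat_dim = 16, 64, 2

# toy generator: noise vector -> (seq_len, feat_dim) series
z = keras.Input((latent_dim,))
g = layers.Dense(seq_len * feat_dim, activation="tanh")(z)
generator = keras.Model(z, layers.Reshape((seq_len, feat_dim))(g))

# toy discriminator: series -> real/fake logit
inp = keras.Input((seq_len, feat_dim))
d = layers.Dense(1)(layers.Flatten()(inp))
discriminator = keras.Model(inp, d)

gan = tsgm.models.cgan.GAN(discriminator=discriminator, generator=generator, latent_dim=latent_dim)   # module path assumed
gan.compile(
    d_optimizer=keras.optimizers.Adam(1e-4),
    g_optimizer=keras.optimizers.Adam(1e-4),
    loss_fn=keras.losses.BinaryCrossentropy(from_logits=True),
)
X = np.random.normal(size=(128, seq_len, feat_dim)).astype("float32")
gan.fit(X, epochs=1, batch_size=32)       # training call is assumed to follow the keras API
samples = gan.generate(10)                # 10 synthetic series of shape (64, 2)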
VAEs¶
- class BetaVAE(*args, **kwargs)[source]¶
beta-VAE implementation for unlabeled time series.
- Parameters:
encoder (keras.Model) – An encoder model which maps an input time series into the latent space.
decoder (keras.Model) – Takes as input a random noise vector of latent_dim length and returns a simulated time series.
latent_dim (int) – The size of the noise vector.
- call(X: Tensor | ndarray[Any, dtype[ScalarType]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Encodes and decodes time series dataset X.
- Parameters:
X (tsgm.types.Tensor) – Input time series dataset.
- Returns:
Reconstructed samples
- Return type:
tsgm.types.Tensor
- generate(n: int) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates new data from the model.
- Parameters:
n (int) – the number of samples to be generated.
- Returns:
A tensor with generated samples.
- Return type:
tsgm.types.Tensor
- class cBetaVAE(*args, **kwargs)[source]¶
- call(data: Tensor | ndarray[Any, dtype[ScalarType]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Encodes and decodes time series dataset X.
- Parameters:
data (tsgm.types.Tensor) – Input time series dataset.
- Returns:
Reconstructed samples
- Return type:
tsgm.types.Tensor
- generate(labels: Tensor | ndarray[Any, dtype[ScalarType]]) Tuple[Tensor | ndarray[Any, dtype[ScalarType]], Tensor | ndarray[Any, dtype[ScalarType]]] [source]¶
Generates new data from the model.
- Parameters:
labels (tsgm.types.Tensor) – Conditioning labels for the samples to be generated (one sample per label).
- Returns:
a tuple of synthetically generated data and labels.
- Return type:
T.Tuple[tsgm.types.Tensor, tsgm.types.Tensor]
- property metrics: List[source]¶
Returns the list of loss trackers: [loss, reconstruction_loss, kl_loss].
ABC¶
- class RejectionSampler(simulator: ModelBasedSimulator, data: Dataset, statistics: List, epsilon: float, discrepancy: Callable, priors: Dict | None = None, **kwargs)[source]¶
Rejection sampling algorithm for approximate Bayesian computation.
- Parameters:
simulator (class tsgm.simulator.ModelBasedSimulator) – A model-based simulator.
data (class tsgm.dataset.Dataset) – Historical dataset storage.
statistics (list) – A list of summary statistics.
epsilon (float) – Tolerance of synthetically generated data to the set of summary statistics.
discrepancy (Callable) – Discrepancy measure function.
priors – Set of priors for each of the simulator parameters; defaults to DEFAULT_PRIOR.
- prior_samples(priors: Dict, params: List) Dict [source]¶
Generate prior samples for the specified parameters.
- Parameters:
priors (T.Dict) – A dictionary containing probability distributions for each parameter. Keys are parameter names, and values are instances of probability distribution classes. If a parameter is not present in the dictionary, a default prior distribution is used.
params (T.List) – A list of parameter names for which prior samples are to be generated.
- Returns:
A dictionary where keys are parameter names and values are samples drawn from their respective prior distributions.
- Return type:
T.Dict
Example:
priors = {'mean': NormalDistribution(0, 1), 'std_dev': UniformDistribution(0, 2)}
params = ['mean', 'std_dev']
samples = prior_samples(priors, params)
STS¶
- class STS(model: StructuralTimeSeries | None = None)[source]¶
Class for training and generating from a structural time series model.
Initializes a new instance of the STS class.
- Parameters:
model (tfp.sts.StructuralTimeSeriesModel or None) – Structural time series model to use. If None, default model is used.
- elbo_loss() float [source]¶
Returns the evidence lower bound (ELBO) loss from training.
- Returns:
The value of the ELBO loss.
- Return type:
float
- generate(num_samples: int) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates samples from the trained model.
- Parameters:
num_samples (int) – Number of samples to generate.
- Returns:
Generated samples.
- Return type:
tsgm.types.Tensor
- train(ds: Dataset, num_variational_steps: int = 200, steps_forw: int = 10) None [source]¶
Trains the structural time series model.
- Parameters:
ds (tsgm.dataset.Dataset) – Dataset containing time series data.
num_variational_steps (int) – Number of variational optimization steps, defaults to 200.
steps_forw (int) – Number of steps to forecast, defaults to 10.
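Illustrative end-to-end sketch (the tsgm.models.sts.STS path and the tsgm.dataset.Dataset construction are assumptions):
import numpy as np
import tsgm

X = np.cumsum(np.random.normal(size=(1, 100, 1)), axis=1)   # one random-walk series (illustrative)
ds = tsgm.dataset.Dataset(X, y=None)                         # Dataset construction assumed

sts = tsgm.models.sts.STS()                                  # default structural model; module path assumed
sts.train(ds, num_variational_steps=200, steps_forw=10)
print(sts.elbo_loss())
synthetic = sts.generate(num_samples=5)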
Visualization¶
- visualize_dataset(dataset: Dataset | Tensor | ndarray[Any, dtype[ScalarType]], obj_id: int = 0, palette: dict = {'gen': 'blue', 'hist': 'red'}, path: str = '/tmp/generated_data.pdf') None [source]¶
Visualizes a time series dataset with target values.
- Parameters:
dataset (tsgm.dataset.DatasetOrTensor.) – A time series dataset.
- visualize_original_and_reconst_ts(original: Tensor | ndarray[Any, dtype[ScalarType]], reconst: Tensor | ndarray[Any, dtype[ScalarType]], num: int = 5, vmin: int = 0, vmax: int = 1) None [source]¶
Visualizes original and reconstructed time series data.
This function generates side-by-side visualizations of the original and reconstructed time series data. It randomly selects a specified number of samples from the input tensors original and reconst and displays them as images using imshow.
- Parameters:
original (tsgm.types.Tensor) – Original time series data tensor.
reconst (tsgm.types.Tensor) – Reconstructed time series data tensor.
num (int, optional) – Number of samples to visualize, defaults to 5.
vmin (int, optional) – Minimum value for colormap normalization, defaults to 0.
vmax (int, optional) – Maximum value for colormap normalization, defaults to 1.
- visualize_training_loss(loss_vector: Tensor | ndarray[Any, dtype[ScalarType]], labels: tuple = (), path: str = '/tmp/training_loss.pdf') None [source]¶
Plots training losses as a function of the epochs.
- Parameters:
loss_vector – np.array of shape (number of metrics, number of epochs).
labels – list of strings.
path – str, where to save the plot.
- visualize_ts(ts: Tensor | ndarray[Any, dtype[ScalarType]], num: int = 5) None [source]¶
Visualizes time series tensor.
This function generates a plot to visualize time series data. It displays a specified number of time series from the input tensor.
- Parameters:
ts (tsgm.types.Tensor) – The time series data tensor of shape (num_samples, num_timesteps, num_features).
num (int, optional) – The number of time series to display. Defaults to 5.
- Raises:
AssertionError: If the input tensor does not have three dimensions.
- Example:
>>> visualize_ts(time_series_tensor, num=10)
- visualize_ts_lineplot(ts: Tensor | ndarray[Any, dtype[ScalarType]], ys: Tensor | ndarray[Any, dtype[ScalarType]] | None = None, num: int = 5, unite_features: bool = True, legend_fontsize: int = 12, tick_size: int = 10) None [source]¶
Visualizes time series data using line plots.
This function generates line plots to visualize the time series data. It randomly selects a specified number of samples from the input tensor ts and plots each sample as a line plot. If ys is provided, it can be either a 1D or 2D tensor representing the target variable(s), and the function will optionally overlay it on the line plot.
- Parameters:
ts (tsgm.types.Tensor) – Input time series data tensor.
ys (tsgm.types.OptTensor, optional) – Optional target variable(s) tensor, defaults to None.
num (int, optional) – Number of samples to visualize, defaults to 5.
unite_features (bool, optional) – Whether to plot all features together or separately, defaults to True.
legend_fontsize (int, optional) – Font size to use.
tick_size (int, optional) – Font size for y-axis ticks.
- visualize_tsne(X: Tensor | ndarray[Any, dtype[ScalarType]], y: Tensor | ndarray[Any, dtype[ScalarType]], X_gen: Tensor | ndarray[Any, dtype[ScalarType]], y_gen: Tensor | ndarray[Any, dtype[ScalarType]], path: str = '/tmp/tsne_embeddings.pdf', feature_averaging: bool = False, perplexity=30.0) None [source]¶
Visualizes t-SNE embeddings of real and synthetic data.
This function generates a scatter plot of t-SNE embeddings for real and synthetic data. Each data point is represented by a marker on the plot, and the colors of the markers correspond to the corresponding class labels of the data points.
- Parameters:
X (tsgm.types.Tensor) – The original real data tensor of shape (num_samples, num_features).
y (tsgm.types.Tensor) – The labels of the original real data tensor of shape (num_samples,).
X_gen (tsgm.types.Tensor) – The generated synthetic data tensor of shape (num_samples, num_features).
y_gen (tsgm.types.Tensor) – The labels of the generated synthetic data tensor of shape (num_samples,).
path (str, optional) – The path to save the visualization as a PDF file. Defaults to “/tmp/tsne_embeddings.pdf”.
feature_averaging (bool, optional) – Whether to compute the average features for each class. Defaults to False.
- visualize_tsne_unlabeled(X: Tensor | ndarray[Any, dtype[ScalarType]], X_gen: Tensor | ndarray[Any, dtype[ScalarType]], palette: dict = {'gen': 'blue', 'hist': 'red'}, alpha: float = 0.25, path: str = '/tmp/tsne_embeddings.pdf', fontsize: int = 20, markerscale: int = 3, markersize: int = 1, feature_averaging: bool = False, perplexity: float = 30.0) None [source]¶
Visualizes t-SNE embeddings of unlabeled data.
- Parameters:
X (tsgm.types.Tensor) – The original data tensor of shape (num_samples, num_features).
X_gen (tsgm.types.Tensor) – The generated data tensor of shape (num_samples, num_features).
palette (dict, optional) – A dictionary mapping class labels to colors. Defaults to DEFAULT_PALETTE_TSNE.
alpha (float, optional) – The transparency level of the plotted points. Defaults to 0.25.
path (str, optional) – The path to save the visualization as a PDF file. Defaults to “/tmp/tsne_embeddings.pdf”.
fontsize (int, optional) – The font size of the class labels in the legend. Defaults to 20.
markerscale (int, optional) – The scaling factor for the size of the markers in the legend. Defaults to 3.
markersize (int, optional) – The size of the markers in the scatter plot. Defaults to 1.
feature_averaging (bool, optional) – Whether to compute the average features for each class. Defaults to False.
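Typical calls (assuming the plotting helpers are exposed via tsgm.utils; the data here is random and purely illustrative):
import numpy as np
import tsgm

X_hist = np.random.normal(size=(50, 64, 2))
X_gen = np.random.normal(size=(50, 64, 2))

tsgm.utils.visualize_ts(X_hist, num=5)                      # module path assumed
tsgm.utils.visualize_ts_lineplot(X_hist, num=5)
tsgm.utils.visualize_tsne_unlabeled(X_hist, X_gen, perplexity=30.0, path="/tmp/tsne_embeddings.pdf")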
Monitors¶
- class GANMonitor(num_samples: int, latent_dim: int, labels: Tensor | ndarray[Any, dtype[ScalarType]], save: bool = True, save_path: str | None = None, mode: str = 'clf')[source]¶
GANMonitor is a Keras callback for monitoring and visualizing generated samples during training.
- Parameters:
num_samples (int) – The number of samples to generate and visualize.
latent_dim (int) – The dimensionality of the latent space. Defaults to 128.
output_dim (int) – The dimensionality of the output space. Defaults to 2.
save (bool) – Whether to save the generated samples. Defaults to True.
save_path (str) – The path to save the generated samples. Defaults to None.
- Raises:
ValueError – If the mode is not one of [‘clf’, ‘reg’]
- Note:
If save is True and save_path is not specified, the default save path is "/tmp/".
- Warning:
If save_path is specified but save is False, a warning is issued.
- class VAEMonitor(num_samples: int = 6, latent_dim: int = 128, output_dim: int = 2, save: bool = True, save_path: str | None = None)[source]¶
VAEMonitor is a Keras callback for monitoring and visualizing generated samples from a Variational Autoencoder (VAE) during training.
- Parameters:
num_samples (int) – The number of samples to generate and visualize. Defaults to 6.
latent_dim (int) – The dimensionality of the latent space. Defaults to 128.
output_dim (int) – The dimensionality of the output space. Defaults to 2.
save (bool) – Whether to save the generated samples. Defaults to True.
save_path (str) – The path to save the generated samples. Defaults to None.
- Raises:
ValueError – If output_dim is less than or equal to 0.
- Note:
If save is True and save_path is not specified, the default save path is "/tmp/".
- Warning:
If save_path is specified but save is False, a warning is issued.
Zoo¶
- class BaseClassificationArchitecture(seq_len: int, feat_dim: int, output_dim: int)[source]¶
Base class for classification architectures.
Initializes the base classification architecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
output_dim (int) – Dimensionality of the output.
- class BaseDenoisingArchitecture(seq_len: int, feat_dim: int, n_filters: int = 64, n_conv_layers: int = 3, **kwargs)[source]¶
Base class for denoising architectures in DDPM (Denoising Diffusion Probabilistic Models, tsgm.models.ddpm).
- Attributes:
arch_type: A string indicating the type of architecture, set to "ddpm:denoising".
_seq_len: The length of the input sequences.
_feat_dim: The dimensionality of the input features.
_n_filters: The number of filters used in the convolutional layers.
_n_conv_layers: The number of convolutional layers in the model.
_model: The Keras model instance built using the _build_model method.
Initializes the BaseDenoisingArchitecture with the specified parameters.
- Args:
seq_len (int): The length of the input sequences.
feat_dim (int): The dimensionality of the input features.
n_filters (int, optional): The number of filters for convolutional layers. Default is 64.
n_conv_layers (int, optional): The number of convolutional layers. Default is 3.
**kwargs: Additional keyword arguments to be passed to the parent class Architecture.
- get() Dict [source]¶
Returns a dictionary containing the model.
- Returns:
A dictionary containing the model.
- Return type:
dict
- property model: Model[source]¶
Provides access to the Keras model instance.
- Returns:
keras.models.Model: The Keras model instance built by _build_model.
- class BaseGANArchitecture[source]¶
Base class for defining architectures of Generative Adversarial Networks (GANs).
- property discriminator: Model[source]¶
Property for accessing the discriminator model.
- Returns:
The discriminator model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the discriminator model is not found.
- property generator: Model[source]¶
Property for accessing the generator model.
- Returns:
The generator model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the generator model is not implemented.
- get() Dict [source]¶
Retrieves both discriminator and generator models as a dictionary.
- Returns:
A dictionary containing discriminator and generator models.
- Return type:
dict
- Raises:
NotImplementedError – If either discriminator or generator models are not implemented.
- class BaseVAEArchitecture[source]¶
Base class for defining architectures of Variational Autoencoders (VAEs).
- property decoder: Model[source]¶
Property for accessing the decoder model.
- Returns:
The decoder model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the decoder model is not implemented.
- property encoder: Model[source]¶
Property for accessing the encoder model.
- Returns:
The encoder model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the encoder model is not implemented.
- get() Dict [source]¶
Retrieves both encoder and decoder models as a dictionary.
- Returns:
A dictionary containing encoder and decoder models.
- Return type:
dict
- Raises:
NotImplementedError – If either encoder or decoder models are not implemented.
- class BasicRecurrentArchitecture(hidden_dim: int, output_dim: int, n_layers: int, network_type: str, name: str = 'Sequential')[source]¶
Base class for recurrent neural network architectures.
Inherits from Architecture.
- Parameters:
hidden_dim – int, the number of units (e.g. 24)
output_dim – int, the number of output units (e.g. 1)
n_layers – int, the number of layers (e.g. 3)
network_type – str, one of ‘gru’, ‘lstm’, or ‘lstmLN’
name – str, model name Default: “Sequential”
- class BlockClfArchitecture(seq_len: int, feat_dim: int, output_dim: int, blocks: list)[source]¶
Architecture for classification using a sequence of blocks.
Inherits from BaseClassificationArchitecture.
Initializes the BlockClfArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
output_dim (int) – Dimensionality of the output.
blocks (list) – A list of blocks (layers) applied by the architecture.
- class ConvnArchitecture(seq_len: int, feat_dim: int, output_dim: int, n_conv_blocks: int = 1)[source]¶
Convolutional neural network architecture for classification. Inherits from BaseClassificationArchitecture.
Initializes the convolutional neural network architecture.
- class ConvnLSTMnArchitecture(seq_len: int, feat_dim: int, output_dim: int, n_conv_lstm_blocks: int = 1)[source]¶
Convolutional LSTM architecture for classification. Inherits from BaseClassificationArchitecture.
Initializes the convolutional LSTM architecture.
- class DDPMConvDenoiser(**kwargs)[source]¶
A convolutional denoising model for DDPM.
This class defines a convolutional neural network architecture used as a denoiser in DDPM. It predicts the noise added to the input samples during the diffusion process.
- Attributes:
arch_type: A string indicating the architecture type, set to “ddpm:denoiser”.
Initializes the DDPMConvDenoiser model with additional parameters.
- Args:
**kwargs: Additional keyword arguments to be passed to the parent class.
- class Sampling(*args, **kwargs)[source]¶
Custom Keras layer for sampling from a latent space.
This layer samples from a latent space using the reparameterization trick during training. It takes as input the mean and log variance of the latent distribution and generates samples by adding random noise scaled by the standard deviation to the mean.
- call(inputs: Tuple[Tensor | ndarray[Any, dtype[ScalarType]], Tensor | ndarray[Any, dtype[ScalarType]]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates samples from a latent space.
- Parameters:
inputs (tuple[tsgm.types.Tensor, tsgm.types.Tensor]) – Tuple containing mean and log variance tensors of the latent distribution.
- Returns:
Sampled latent vector.
- Return type:
tsgm.types.Tensor
- class TimeEmbedding(*args, **kwargs)[source]¶
- call(inputs: Tensor | ndarray[Any, dtype[ScalarType]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state in __init__(), or in the build() method that is called automatically before call() executes the first time.
- Args:
inputs: Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args: Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs: Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- Returns:
A tensor or list/tuple of tensors.
- class TransformerClfArchitecture(seq_len: int, feat_dim: int, num_heads: int = 2, ff_dim: int = 64, n_blocks: int = 1, dropout_rate=0.5, output_dim: int = 2)[source]¶
Base class for transformer architectures.
Inherits from BaseClassificationArchitecture.
Initializes the TransformerClfArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
num_heads (int) – Number of attention heads (default is 2).
ff_dim (int) – Feed forward dimension in the attention block (default is 64).
output_dim (int, optional) – Dimensionality of the output; the number of classes (default is 2).
dropout_rate (float, optional) – Dropout probability (default is 0.5).
n_blocks (int, optional) – Number of transformer blocks (default is 1).
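A construction sketch (the tsgm.models.architectures module path and the .model attribute name are assumptions):
import tsgm

arch = tsgm.models.architectures.TransformerClfArchitecture(
    seq_len=64, feat_dim=2, num_heads=2, ff_dim=64, n_blocks=1, output_dim=2)   # module path assumed
clf = arch.model                      # underlying keras.Model (attribute name assumed)
clf.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])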
- class VAE_CONV5Architecture(seq_len: int, feat_dim: int, latent_dim: int)[source]¶
This class defines the architecture for a Variational Autoencoder (VAE) with convolutional layers.
Initializes the VAE_CONV5Architecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
- class WaveGANArchitecture(seq_len: int, feat_dim: int = 64, latent_dim: int = 32, output_dim: int = 1, kernel_size: int = 32, phase_rad: int = 2, use_batchnorm: bool = False)[source]¶
WaveGAN architecture, from https://arxiv.org/abs/1802.04208
Inherits from BaseGANArchitecture.
Initializes the WaveGANArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
kernel_size (int, optional) – Sizes of convolutions
phase_rad (int, optional) – Phase shuffle radius for wavegan (default is 2)
use_batchnorm (bool, optional) – Whether to use batchnorm (default is False)
- class Zoo(*arg, **kwargs)[source]¶
A collection of architectures. It supports the Python dict API.
Initializes the Zoo.
- class cGAN_Conv4Architecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int)[source]¶
Architecture for Conditional Generative Adversarial Network (cGAN) with Convolutional Layers.
Initializes the cGAN_Conv4Architecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
- class cGAN_LSTMConv3Architecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int)[source]¶
Architecture for Conditional Generative Adversarial Network (cGAN) with LSTM and Convolutional Layers.
Initializes the cGAN_LSTMConv3Architecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
- class cGAN_LSTMnArchitecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int, n_blocks: int = 1, output_activation: str = 'tanh')[source]¶
Conditional Generative Adversarial Network (cGAN) with LSTM-based architecture.
Inherits from BaseGANArchitecture.
Initializes the cGAN_LSTMnArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
n_blocks (int, optional) – Number of LSTM blocks in the architecture (default is 1).
output_activation (str, optional) – Activation function for the output layer (default is “tanh”).
- class cVAE_CONV5Architecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int = 2)[source]¶
Architecture for a conditional Variational Autoencoder (cVAE) with convolutional layers.
Simulators¶
- class BaseSimulator[source]¶
Abstract base class for simulators. This class defines the interface for simulators.
Methods¶
- generate(num_samples: int, *args) -> tsgm.dataset.Dataset
Generate a dataset with the specified number of samples.
- dump(path: str, format: str = “csv”) -> None
Save the generated dataset to a file in the specified format.
- class LotkaVolterraSimulator(data: DatasetProperties, alpha: float = 1, beta: float = 1, gamma: float = 1, delta: float = 1, x0: float = 1, y0: float = 1)[source]¶
Simulates the Lotka-Volterra equations, which model the dynamics of biological systems in which two species interact, one as a predator and the other as prey.
For the details refer to https://en.wikipedia.org/wiki/Lotka%E2%80%93Volterra_equations
Initializes the Lotka-Volterra simulator with given parameters.
- Args:
data (tsgm.dataset.DatasetProperties): The dataset properties.
alpha (float): The maximum prey per capita growth rate. Default is 1.
beta (float): The effect of the presence of predators on the prey death rate. Default is 1.
gamma (float): The predator's per capita death rate. Default is 1.
delta (float): The effect of the presence of prey on the predator's growth rate. Default is 1.
x0 (float): The initial population density of prey. Default is 1.
y0 (float): The initial population density of predator. Default is 1.
- clone() LotkaVolterraSimulator [source]¶
Creates a deep copy of the current LotkaVolterraSimulator instance.
- Returns:
LotkaVolterraSimulator: A new instance of LotkaVolterraSimulator with copied data and parameters.
- generate(num_samples: int, tmax: float = 1)[source]¶
Generates the simulation data based on the Lotka-Volterra equations.
- Args:
num_samples (int): The number of sample points to generate.
tmax (float): The maximum time value for the simulation. Default is 1.
- Returns:
np.ndarray: An array containing the population densities of prey and predators over time.
- set_params(alpha, beta, gamma, delta, x0, y0, **kwargs)[source]¶
Sets the parameters for the simulator.
- Args:
alpha (float): The maximum prey per capita growth rate.
beta (float): The effect of the presence of predators on the prey death rate.
gamma (float): The predator’s per capita death rate.
delta (float): The effect of the presence of prey on the predator’s growth rate.
x0 (float): The initial population density of prey.
y0 (float): The initial population density of the predator.
**kwargs: Arbitrary keyword arguments for setting simulator parameters.
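A usage sketch follows; the DatasetProperties constructor fields (N, D, T) are assumptions.
    import tsgm

    # Dataset properties: N series, D variables (prey, predator), T time steps — fields assumed.
    props = tsgm.dataset.DatasetProperties(N=10, D=2, T=100)

    sim = tsgm.simulator.LotkaVolterraSimulator(
        props, alpha=2 / 3, beta=4 / 3, gamma=1.0, delta=1.0, x0=1.0, y0=2.0
    )
    trajectories = sim.generate(num_samples=100, tmax=50.0)  # prey/predator densities over time
    sim_copy = sim.clone()                                   # independent deep copy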
- class ModelBasedSimulator(data: DatasetProperties)[source]¶
A simulator that is based on a model. This class extends the Simulator class and provides additional methods for handling model parameters.
Methods¶
- params() -> T.Dict[str, T.Any]
Get a dictionary of the simulator’s parameters.
- set_params(params: T.Dict[str, T.Any]) -> None
Set the simulator’s parameters from a dictionary.
- generate(num_samples: int, *args) -> None
Generate a dataset with the specified number of samples.
Initialize the ModelBasedSimulator with dataset properties.
Parameters¶
- data : tsgm.dataset.DatasetProperties
Properties of the dataset to be used.
- abstract generate(num_samples: int, *args) None [source]¶
Abstract method to generate a dataset. Must be implemented by subclasses.
Parameters¶
- num_samples : int
Number of samples to generate.
- *args
Additional arguments to be passed to the method.
Raises¶
- NotImplementedError
This method is not implemented in this class and must be overridden by subclasses.
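The params()/set_params() pair lets the simulator’s parameters be inspected and overridden without rebuilding the object; the sketch below assumes they round-trip through a plain dictionary and reuses the Lotka-Volterra simulator from above.
    import tsgm

    props = tsgm.dataset.DatasetProperties(N=10, D=2, T=100)   # fields assumed
    sim = tsgm.simulator.LotkaVolterraSimulator(props)

    current = sim.params()        # dictionary of simulator parameters (assumed behaviour)
    current["alpha"] = 0.5        # tweak the prey growth rate
    sim.set_params(**current)     # push the updated values back into the simulator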
- class NNSimulator(data: DatasetProperties, driver: Any | None = None)[source]¶
Initialize the Simulator with dataset properties and an optional model.
Parameters¶
- data : tsgm.dataset.DatasetProperties
Properties of the dataset to be used.
- driver : Optional[tsgm.types.Model], optional
The model to be used for generating data, by default None.
- clone() NNSimulator [source]¶
Create a deep copy of the simulator.
Returns¶
- NNSimulator
A deep copy of the current simulator instance.
- class PredictiveMaintenanceSimulator(data: DatasetProperties)[source]¶
Predictive Maintenance Simulator class that extends the ModelBasedSimulator base class. The simulator is based on https://github.com/AaltoPML/human-in-the-loop-predictive-maintenance. From the publication: Nikitin, Alexander, and Samuel Kaski. “Human-in-the-loop large-scale predictive maintenance of workstations.” Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022.
- Attributes:
CAT_FEATURES (list): List of categorical feature indices.
encoders (dict): Dictionary of OneHotEncoders for categorical features.
- Methods:
__init__(data): Initializes the simulator with dataset properties and sets encoders.
S(lmbd, t): Calculates the survival curve.
R(rho, lmbd, t): Calculates the recovery curve parameter.
set_params(**kwargs): Sets the parameters for the simulator.
mixture_function(a, x): Calculates the mixture function.
sample_equipment(num_samples): Samples equipment data and generates the dataset.
generate(num_samples): Generates the predictive maintenance dataset.
clone() -> PredictiveMaintenanceSimulator: Creates and returns a deep copy of the current simulator.
Initializes the PredictiveMaintenanceSimulator with dataset properties and sets encoders for categorical features.
- Args:
data (tsgm.dataset.DatasetProperties): Dataset properties for the simulator.
- R(rho, lmbd, t)[source]¶
Calculates the recovery curve parameter.
- Args:
rho: Rho parameter for the recovery function.
lmbd: Lambda parameter for the exponential distribution.
t: Time variable.
- Returns:
float: Recovery curve parameter at time t.
- S(lmbd, t)[source]¶
Calculates the survival curve.
- Args:
lmbd: Lambda parameter for the exponential distribution.
t: Time variable.
- Returns:
float: Survival probability at time t.
- clone() PredictiveMaintenanceSimulator [source]¶
Creates a deep copy of the current PredictiveMaintenanceSimulator instance.
- Returns:
PredictiveMaintenanceSimulator: A new instance of PredictiveMaintenanceSimulator with copied data and parameters.
- generate(num_samples: int)[source]¶
Samples equipment data and generates the dataset.
- Args:
num_samples (int): Number of samples to generate.
- Returns:
tuple: A tuple containing the dataset and equipment information.
- mixture_function(a, x)[source]¶
Calculates the mixture function.
- Args:
a: Mixture parameter.
x: Input variable.
- Returns:
float: Mixture function value.
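A generation sketch; the DatasetProperties fields are assumptions, and the returned pair follows the description of generate above.
    import tsgm

    props = tsgm.dataset.DatasetProperties(N=100, D=5, T=90)   # fields assumed
    pm_sim = tsgm.simulator.PredictiveMaintenanceSimulator(props)

    dataset, equipment = pm_sim.generate(num_samples=100)      # data plus per-equipment information
    pm_copy = pm_sim.clone()                                   # deep copy with the same parameters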
- class Simulator(data: DatasetProperties, driver: Any | None = None)[source]¶
Concrete class for a basic simulator. This class implements the basic methods for fitting a model and cloning the simulator, but the generate and dump methods are not implemented here.
Attributes¶
- _data : tsgm.dataset.DatasetProperties
Properties of the dataset to be used by the simulator.
- _driver : Optional[tsgm.types.Model]
The model to be used for generating data.
Initialize the Simulator with dataset properties and an optional model.
Parameters¶
- data : tsgm.dataset.DatasetProperties
Properties of the dataset to be used.
- driver : Optional[tsgm.types.Model], optional
The model to be used for generating data, by default None.
- clone() Simulator [source]¶
Create a deep copy of the simulator.
Returns¶
- Simulator
A deep copy of the current simulator instance.
- dump(path: str, format: str = 'csv') None [source]¶
Method to save the generated dataset to a file. Not implemented in this class.
Parameters¶
- path : str
The file path where the dataset will be saved.
- format : str, optional
The format in which to save the dataset, by default “csv”.
Raises¶
- NotImplementedError
This method is not implemented in this class.
- fit(**kwargs) None [source]¶
Fit the model using the dataset properties.
Parameters¶
- **kwargs
Additional keyword arguments to pass to the model’s fit method.
- generate(num_samples: int, *args) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Method to generate a dataset. Not implemented in this class.
Parameters¶
- num_samples : int
Number of samples to generate.
- *args
Additional arguments to be passed to the method.
Returns¶
- TensorLike
The generated dataset.
Raises¶
- NotImplementedError
This method is not implemented in this class.
- class SineConstSimulator(data: DatasetProperties, max_scale: float = 10.0, max_const: float = 5.0)[source]¶
Sine and Constant Function Simulator class that extends the ModelBasedSimulator base class.
- Attributes:
_scale: TensorFlow probability distribution for scaling factor.
_const: TensorFlow probability distribution for constant.
_shift: TensorFlow probability distribution for shift.
- Methods:
__init__(data, max_scale=10.0, max_const=5.0): Initializes the simulator with dataset properties and optional parameters.
set_params(max_scale, max_const, *args, **kwargs): Sets the parameters for scale, constant, and shift distributions.
generate(num_samples, *args) -> tsgm.dataset.Dataset: Generates a dataset based on sine and constant functions.
clone() -> SineConstSimulator: Creates and returns a deep copy of the current simulator.
Initializes the SineConstSimulator with dataset properties and optional maximum scale and constant values.
- Args:
data (tsgm.dataset.DatasetProperties): Dataset properties for the simulator.
max_scale (float, optional): Maximum value for the scale parameter. Defaults to 10.0.
max_const (float, optional): Maximum value for the constant parameter. Defaults to 5.0.
- clone() SineConstSimulator [source]¶
Creates a deep copy of the current SineConstSimulator instance.
- Returns:
SineConstSimulator: A new instance of SineConstSimulator with copied data and parameters.
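A short usage sketch (DatasetProperties fields assumed):
    import tsgm

    props = tsgm.dataset.DatasetProperties(N=64, D=1, T=100)   # fields assumed
    sine_sim = tsgm.simulator.SineConstSimulator(props, max_scale=5.0, max_const=2.0)

    synthetic = sine_sim.generate(num_samples=64)              # dataset of sine/constant series
    sine_sim.set_params(max_scale=20.0, max_const=1.0)         # re-draw the underlying distributions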
Data Processing Utils¶
- class TSFeatureWiseScaler(feature_range: Tuple[float, float] = (0, 1))[source]¶
Scales time series data feature-wise.
Parameters:¶
- feature_range : tuple(float, float), optional
Tuple representing the minimum and maximum feature values (default is (0, 1)).
Attributes:¶
- _min_v : float
Minimum feature value.
- _max_v : float
Maximum feature value.
Initializes a new instance of the TSFeatureWiseScaler class.
- Parameters:
feature_range (tuple(float, float), optional) – Tuple representing the minimum and maximum feature values, defaults to (0, 1).
- fit(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) TSFeatureWiseScaler [source]¶
Fits the scaler to the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
The fitted scaler object.
- Return type:
TSFeatureWiseScaler
- fit_transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Fits the scaler to the data and transforms it.
- Parameters:
X (TensorLike) – Input data
- Returns:
Scaled input data X
- Return type:
TensorLike
- inverse_transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Inverse-transforms the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
Original data.
- Return type:
TensorLike
- transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Transforms the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
Scaled X.
- Return type:
TensorLike
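A typical round trip on data shaped (n_series, n_timesteps, n_features); each feature is scaled independently to the requested range. The import path is assumed.
    import numpy as np
    from tsgm.utils import TSFeatureWiseScaler  # import path assumed

    X = np.random.normal(loc=3.0, scale=2.0, size=(32, 100, 4))  # (N, T, D) time series

    scaler = TSFeatureWiseScaler(feature_range=(-1, 1))
    X_scaled = scaler.fit_transform(X)            # each of the 4 features mapped to [-1, 1]
    X_restored = scaler.inverse_transform(X_scaled)
    assert np.allclose(X, X_restored, atol=1e-6)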
- class TSGlobalScaler[source]¶
Scales time series data globally.
Attributes:¶
- min : float
Minimum value encountered in the data.
- max : float
Maximum value encountered in the data.
- fit(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) TSGlobalScaler [source]¶
Fits the scaler to the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
The fitted scaler object.
- Return type:
TSGlobalScaler
- fit_transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Fits the scaler to the data and transforms it.
- Parameters:
X (TensorLike) – Input data
- Returns:
Scaled input data X
- Return type:
TensorLike
- inverse_transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Inverse-transforms the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
Original data.
- Return type:
TensorLike
- transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Transforms the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
Scaled X.
- Return type:
TensorLike
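In contrast to the feature-wise scaler, TSGlobalScaler learns a single min/max over the whole array, which preserves relative magnitudes across features; a short sketch (import path assumed):
    import numpy as np
    from tsgm.utils import TSGlobalScaler  # import path assumed

    X = np.random.uniform(low=-5.0, high=5.0, size=(32, 100, 4))

    scaler = TSGlobalScaler()
    X_scaled = scaler.fit_transform(X)     # one global min/max applied to every value
    print(scaler.min, scaler.max)          # global statistics learned from X
    X_back = scaler.inverse_transform(X_scaled)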