TSGM
Datasets
- class UCRDataManager(path: str | None = None, ds: str = 'gunpoint')[source]
A manager for UCR collection of time series datasets.
If you find these datasets useful, please cite:
Dau, Hoang Anh, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Yanping Chen, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen and Gustavo Batista (2018). “The UCR Time Series Classification Archive.” https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
- Parameters:
path (str or None) – a relative path to the stored UCR dataset. If None, uses the default data directory.
ds (str) – Name of the dataset. The list of names is available at https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (case sensitive!).
- Raises:
ValueError – When there is no stored UCR archive, or the name of the dataset is incorrect.
- default_path = '/home/docs/checkouts/readthedocs.org/user_builds/tsgm/checkouts/latest/tsgm/utils/../../data'
- get() Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Returns a tuple containing training and testing data.
- Returns:
A tuple (X_train, y_train, X_test, y_test).
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike]
- get_classes_distribution() Dict[source]
Returns a dictionary with the fraction of occurrences for each class.
- key = 'someone'
- mirrors = ['https://www.cs.ucr.edu/~eamonn/time_series_data_2018/']
- resources = [('UCRArchive_2018.zip', 0)]
- y_all: Collection[Hashable] | None
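As a rough illustration of what get_classes_distribution() reports, the fraction of occurrences per class can be computed from a label vector as in the sketch below. This is a standalone example, not the library's implementation; the name classes_distribution_sketch is hypothetical.

```python
from collections import Counter


def classes_distribution_sketch(y):
    """Return {class label: fraction of samples carrying that label}."""
    counts = Counter(y)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


labels = [0, 0, 1, 1, 1, 2]
distribution = classes_distribution_sketch(labels)
```

The fractions sum to one, so the result can be compared directly across datasets of different sizes.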
- download_physionet2012() None[source]
Downloads the Physionet 2012 dataset files from the Physionet website and extracts them into the local folder ‘physionet2012’.
- gen_sine_const_switch_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0, frequency_switch: float = 0.1) Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Generates a dataset with alternating constant and sinusoidal sequences.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Value indicating whether the sequence is constant or sinusoidal. Defaults to 0.
frequency_switch (float, optional) – Probability of switching between constant and sinusoidal sequences. Defaults to 0.1.
- Returns:
Tuple containing input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- gen_sine_dataset(N: int, T: int, D: int, max_value: int = 10) ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Generates a dataset of sinusoidal waves with random parameters.
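- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
- Returns:
Generated dataset with shape (N, T, D).
- Return type:
numpy.ndarray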
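The following sketch shows one way a dataset of sinusoids with random parameters and shape (N, T, D) could be generated. It is an assumption-laden illustration rather than the code behind gen_sine_dataset: the function name, the seed argument, and the frequency range are all hypothetical.

```python
import numpy as np


def sine_dataset_sketch(N, T, D, max_value=10, seed=0):
    """Generate N sinusoidal sequences of length T with D features each."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)[None, :, None]                      # shape (1, T, 1)
    amplitude = rng.uniform(0, max_value, size=(N, 1, D))
    shift = rng.uniform(0, max_value, size=(N, 1, D))
    # Frequency in cycles per time step; this range is an assumption.
    frequency = rng.uniform(0.01, 0.1, size=(N, 1, D))
    return amplitude * np.sin(2 * np.pi * frequency * t) + shift


X = sine_dataset_sketch(N=5, T=20, D=3)
```

Each sample and feature gets its own amplitude, shift, and frequency, which broadcast against the shared time axis.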
- gen_sine_vs_const_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0) Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Generates a dataset with alternating sinusoidal and constant sequences.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Maximum value for the constant sequence. Defaults to 0.
- Returns:
Tuple containing input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- get_covid_19() Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], Tuple, List][source]
Loads Covid-19 dataset with additional graph information.
The dataset is based on data from The New York Times, based on reports from state and local health agencies:
The New York Times (2021). “Coronavirus (Covid-19) Data in the United States.” https://github.com/nytimes/covid-19-data
Adapted to the graph case in:
Alexander V. Nikitin, ST John, Arno Solin, Samuel Kaski. “Non-separable spatio-temporal graph kernels via SPDEs.” Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10640-10660, 2022.
- Returns:
A tuple (data, graph, states), where data has shape (n_nodes, n_timestamps, n_features) with features being deaths, cases, deaths normalized by population, and cases normalized by population; graph is a tuple (nodes, edges); and states is the list of state names.
- Return type:
tuple[TensorLike, tuple, list]
- get_eeg() Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Loads the EEG Eye State dataset.
This function downloads the EEG Eye State dataset from the UCI Machine Learning Repository and returns the input features (X) and target labels (y).
- Returns:
A tuple containing the input features (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- get_energy_data() ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Retrieves the energy consumption dataset.
This function downloads and loads the energy consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.
- Returns:
Energy consumption dataset.
- Return type:
numpy.ndarray
- get_gp_samples_data(num_samples: int, max_time: int, covar_func: Callable | None = None) ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Generates samples from a Gaussian process.
This function generates samples from a Gaussian process using the specified covariance function. It returns the generated samples as a NumPy array.
- Parameters:
- Returns:
Generated samples from the Gaussian process.
- Return type:
numpy.ndarray
- get_mauna_loa() Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Loads the Mauna Loa CO2 dataset.
This function loads the Mauna Loa CO2 dataset, which contains measurements of atmospheric CO2 concentrations at the Mauna Loa Observatory in Hawaii.
- Returns:
A tuple containing the input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- get_mnist_data() Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Retrieves the MNIST dataset.
This function loads the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits, and returns the training and testing data along with their corresponding labels.
- Returns:
A tuple containing the training data, training labels, testing data, and testing labels.
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike]
- get_physionet2012() Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Retrieves the Physionet 2012 dataset.
This function downloads and retrieves the Physionet 2012 dataset, which consists of physiological data and corresponding outcomes. It returns the training, testing, and validation datasets along with their labels.
- Returns:
A tuple containing the training, testing, and validation datasets along with their labels. (train_X, train_y, test_X, test_y, val_X, val_y)
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike, TensorLike, TensorLike]
- get_power_consumption() ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Retrieves the household power consumption dataset.
This function downloads and loads the household power consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.
- Returns:
Household power consumption dataset.
- Return type:
numpy.ndarray
- get_stock_data(stock_name: str) ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Downloads historical stock data for the specified stock ticker.
This function downloads historical stock data for the specified stock ticker using the Yahoo Finance API. It returns the stock data as a NumPy array with an additional axis representing the batch dimension.
- Parameters:
stock_name (str) – Ticker symbol of the stock.
- Returns:
Historical stock data.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the provided stock ticker is invalid or no data is available.
- get_synchronized_brainwave_dataset() Tuple[DataFrame, DataFrame][source]
Loads the EEG Synchronized Brainwave dataset.
This function downloads the EEG Synchronized Brainwave dataset from Dropbox and returns the input features (X) and target labels (y).
- Returns:
A tuple containing the input features (X) and target labels (y).
- Return type:
tuple[pd.DataFrame, pd.DataFrame]
- load_arff(path: str) DataFrame[source]
Loads data from an ARFF (Attribute-Relation File Format) file.
This function reads data from an ARFF file located at the specified path and returns it as a pandas DataFrame.
- Parameters:
path (str) – Path to the ARFF file.
- Returns:
DataFrame containing the loaded data.
- Return type:
pandas.DataFrame
- split_dataset_into_objects(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], step: int = 10) Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Splits the dataset into objects of fixed length.
This function splits the input dataset into objects of fixed length along the first dimension, 0-padding if necessary.
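A minimal sketch of the splitting idea follows: the array is cut along its first dimension into chunks of length step, and the last chunk is padded with zeros. Only X is handled here, because how the library treats the labels y is not stated above; the function name is hypothetical.

```python
import numpy as np


def split_into_objects_sketch(X, step=10):
    """Split X of shape (n, ...) into objects of shape (step, ...), zero-padding the tail."""
    X = np.asarray(X)
    n_objects = -(-X.shape[0] // step)                   # ceiling division
    pad = n_objects * step - X.shape[0]
    padding = np.zeros((pad,) + X.shape[1:], dtype=X.dtype)
    padded = np.concatenate([X, padding], axis=0)
    return padded.reshape((n_objects, step) + X.shape[1:])


X = np.arange(1, 26, dtype=float).reshape(25, 1)
objects = split_into_objects_sketch(X, step=10)
```

With 25 rows and step=10, the result has three objects, and the last five rows of the third object are zeros.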
Augmentations
- class BaseAugmenter(per_feature: bool)[source]
- generate(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, n_samples: int = 1) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
- class BaseCompose(augmentations: List[BaseAugmenter])[source]
- class DTWBarycentricAveraging[source]
DTW Barycenter Averaging (DBA) [1] method estimated through the Expectation-Maximization algorithm [2], as in https://github.com/tslearn-team/tslearn/
References
- generate(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, n_samples: int = 1, num_initial_samples: int | None = None, initial_timeseries: List[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]] | None = None, initial_labels: List[int] | None = None, **kwargs) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
- Parameters:
X (TensorLike) – The timeseries dataset.
y (TensorLike or None) – The classes, or None.
n_samples (int) – Number of samples to generate (per class, if y is given).
num_initial_samples (int or None) – The number of timeseries to draw (per class) from the dataset before computing DTW_BA. If None, use the entire set (per class).
initial_timeseries (array or None) – Initial timeseries to start from for the optimization process, with shape (original_size, d). In case y is given, the shape of initial_timeseries is assumed to be (n_classes, original_size, d).
initial_labels (array or None) – Labels for samples from initial_timeseries.
- Returns:
np.array of shape (n_samples, original_size, d) if y is None, or (n_classes * n_samples, original_size, d) otherwise, and np.array of labels (or None).
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
- class GaussianNoise(per_feature: bool = True)[source]
Apply noise to the input time series.
- Parameters:
variance (float or tuple(float, float)) – Variance range for noise. If variance is a single float, the range will be (0, variance). Default: (10.0, 50.0).
mean (float) – Mean of the noise. Default: 0.
per_feature (bool) – If set to True, noise will be sampled for each feature independently. Otherwise, the noise will be sampled once for all features. Default: True.
- generate(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, n_samples: int = 1, mean: float = 0, variance: float = 1.0) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Generate synthetic data with Gaussian noise.
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned.
n_samples (int) – Number of augmented samples to generate. Default is 1.
mean (float) – The mean of the noise. Default is 0.
variance (float) – The variance of the noise. Default is 1.0.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
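To make the role of mean, variance, and per_feature concrete, here is a small NumPy sketch of Gaussian-noise augmentation. It is not the GaussianNoise implementation: the way base samples are drawn (uniformly with replacement) and the seed argument are assumptions, and the function name is hypothetical.

```python
import numpy as np


def gaussian_noise_sketch(X, n_samples=1, mean=0.0, variance=1.0,
                          per_feature=True, seed=0):
    """Return n_samples noisy copies of series drawn from X (n_data, n_timesteps, n_features)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumption: base series are picked uniformly at random with replacement.
    base = X[rng.integers(0, X.shape[0], size=n_samples)]
    std = np.sqrt(variance)
    if per_feature:
        noise = rng.normal(mean, std, size=base.shape)             # independent per feature
    else:
        noise = rng.normal(mean, std, size=base.shape[:2] + (1,))  # shared across features
    return base + noise


X = np.zeros((4, 50, 3))
augmented = gaussian_noise_sketch(X, n_samples=7, variance=4.0)
shared = gaussian_noise_sketch(X, n_samples=7, variance=4.0, per_feature=False)
```

With per_feature=False a single noise value per time step is broadcast to all features, so the feature channels of an all-zero input end up identical.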
- class MagnitudeWarping[source]
Magnitude warping changes the magnitude of each sample by convolving the data window with a smooth curve varying around one; see https://dl.acm.org/doi/pdf/10.1145/3136755.3136817
- generate(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, n_samples: int = 1, sigma: float = 0.2, n_knots: int = 4) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Generates augmented samples via MagnitudeWarping for (X, y)
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned.
n_samples (int) – Number of augmented samples to generate. Default is 1.
sigma (float) – Standard deviation for the random warping. Default is 0.2.
n_knots (int) – Number of knots used for warping curve. Default is 4.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
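The sketch below illustrates the idea behind sigma and n_knots for a single series: random knot values are drawn around one, interpolated into a smooth curve over time, and multiplied into the series. Linear interpolation via np.interp is used here as a simple stand-in for a spline, and the knot placement is an assumption, so this should not be read as the MagnitudeWarping implementation.

```python
import numpy as np


def magnitude_warp_sketch(x, sigma=0.2, n_knots=4, seed=0):
    """Scale a series x of shape (n_timesteps, n_features) by a random curve around 1."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    T, D = x.shape
    # Assumption: n_knots interior knots plus the two endpoints, evenly spaced.
    knot_pos = np.linspace(0, T - 1, n_knots + 2)
    knot_val = rng.normal(1.0, sigma, size=(n_knots + 2, D))
    t = np.arange(T)
    curve = np.stack(
        [np.interp(t, knot_pos, knot_val[:, d]) for d in range(D)], axis=1
    )
    return x * curve


x = np.ones((30, 2))
warped = magnitude_warp_sketch(x, sigma=0.2, n_knots=4)
unchanged = magnitude_warp_sketch(x, sigma=0.0)
```

Setting sigma to zero makes every knot equal to one, so the series is returned unchanged.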
- class Shuffle[source]
Shuffles time series features. Shuffling is beneficial when each feature corresponds to interchangeable sensors.
- generate(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, n_samples: int = 1) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Generate synthetic data using Shuffle strategy. Features are randomly shuffled to generate novel samples.
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned.
n_samples (int) – Number of augmented samples to generate. Default is 1.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
- class SliceAndShuffle(per_feature: bool = False)[source]
Slice the time series in k pieces and create a new time series by shuffling.
- Parameters:
per_feature (bool) – If set to True, each feature is sliced independently. Otherwise, all features are sliced in the same way. Default: False.
- generate(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, n_samples: int = 1, n_segments: int = 2) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Generate synthetic data using Slice-And-Shuffle strategy. Slices are randomly selected.
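- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned.
n_samples (int) – Number of augmented samples to generate. Default is 1.
n_segments (int) – Number of segments to slice each time series into. Default is 2.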
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
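The slice-and-shuffle idea can be sketched for a single series as follows: random cut points split the time axis into n_segments pieces, which are then concatenated in a random order. This corresponds to slicing all features in the same way; the function name and seed argument are hypothetical, and the library may choose cut points differently.

```python
import numpy as np


def slice_and_shuffle_sketch(x, n_segments=2, seed=0):
    """Cut x of shape (n_timesteps, n_features) into n_segments pieces and shuffle them."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    T = x.shape[0]
    # n_segments - 1 distinct cut points strictly inside the series (needs n_segments <= T).
    cuts = np.sort(rng.choice(np.arange(1, T), size=n_segments - 1, replace=False))
    segments = np.split(x, cuts, axis=0)
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order], axis=0)


x = np.arange(20).reshape(10, 2)
shuffled = slice_and_shuffle_sketch(x, n_segments=3)
```

The output contains exactly the same time steps as the input, only in a different segment order.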
- class WindowWarping[source]
Window warping augmentation; see https://halshs.archives-ouvertes.fr/halshs-01357973/document
- generate(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, window_ratio: float = 0.2, scales: Tuple = (0.25, 1.0), n_samples: int = 1) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Generates augmented samples via WindowWarping for (X, y)
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned.
window_ratio (float) – The ratio of the window size relative to the total number of timesteps. Default is 0.2.
scales (tuple) – A tuple specifying the scale range for warping. Default is (0.25, 1.0).
n_samples (int) – Number of augmented samples to generate. Default is 1.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
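As an illustration of how window_ratio and a scale interact, the sketch below warps one series: a random window covering window_ratio of the time steps is stretched or compressed by a scale factor, and the result is resampled back to the original length. Linear interpolation, a single fixed scale, and the window placement are assumptions; the function name is hypothetical.

```python
import numpy as np


def window_warp_sketch(x, window_ratio=0.2, scale=0.5, seed=0):
    """Warp a random window of x (n_timesteps, n_features) and resample to the original length."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    T, D = x.shape
    w = max(2, int(np.ceil(window_ratio * T)))           # window length in time steps
    start = int(rng.integers(1, T - w))                  # requires T - w > 1
    end = start + w
    new_w = max(2, int(round(w * scale)))                # window length after warping
    out = np.empty_like(x)
    for d in range(D):
        window = np.interp(np.linspace(0, w - 1, new_w), np.arange(w), x[start:end, d])
        warped = np.concatenate([x[:start, d], window, x[end:, d]])
        out[:, d] = np.interp(
            np.linspace(0, len(warped) - 1, T), np.arange(len(warped)), warped
        )
    return out


t = np.linspace(0, 1, 50)
x = np.stack([np.sin(2 * np.pi * t), t], axis=1)
warped = window_warp_sketch(x, window_ratio=0.2, scale=0.5)
constant = window_warp_sketch(np.ones((50, 2)))
```

The first and last time steps are preserved, while the part of the series around the window is locally sped up or slowed down.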
Metrics
- class ConsistencyMetric(evaluators: List)[source]
Predictive consistency metric measures whether a set of evaluators yield consistent results on real and synthetic data.
- Parameters:
evaluators (list) – A list of evaluators (each item should implement method .evaluate(D)).
- class DemographicParityMetric[source]
Measuring demographic parity between two datasets.
This metric assesses the difference in the distributions of a target variable among different groups in two datasets. By default, it uses the Kolmogorov-Smirnov statistic to quantify the maximum vertical deviation between the cumulative distribution functions of the target variable for the historical and synthetic data within each group.
- Example:
>>> metric = DemographicParityMetric()
>>> dataset_hist = tsgm.dataset.Dataset(...)
>>> dataset_synth = tsgm.dataset.Dataset(...)
>>> groups_hist = [0, 1, 0, 1, 1, 0]
>>> groups_synth = [1, 1, 0, 0, 0, 1]
>>> result = metric(dataset_hist, groups_hist, dataset_synth, groups_synth)
>>> print(result)
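The Kolmogorov-Smirnov statistic mentioned above is the largest vertical gap between two empirical cumulative distribution functions. The sketch below computes it with NumPy for two one-dimensional samples; it illustrates the statistic only and is not the metric's implementation (the function name is hypothetical).

```python
import numpy as np


def ks_statistic_sketch(a, b):
    """Maximum absolute difference between the empirical CDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])                        # evaluate both CDFs at every observed point
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


same = ks_statistic_sketch([1, 2, 3], [1, 2, 3])
disjoint = ks_statistic_sketch([0, 1], [10, 11])
```

Identical samples give a statistic of zero, and samples with non-overlapping ranges give the maximum value of one.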
- class DiscriminativeMetric[source]
The DiscriminativeMetric measures the discriminative performance of a model in distinguishing between synthetic and real datasets.
This metric evaluates a discriminative model by training it on a combination of synthetic and real datasets and assessing its performance on a test set.
- Parameters:
d_hist (tsgm.dataset.DatasetOrTensor) – Real dataset.
d_syn (tsgm.dataset.DatasetOrTensor) – Synthetic dataset.
model (T.Callable) – Discriminative model to be evaluated.
test_size (T.Union[float, int]) – Proportion of the dataset to include in the test split or the absolute number of test samples.
n_epochs (int) – Number of training epochs for the model.
metric (T.Optional[T.Callable]) – Optional evaluation metric to use (default: accuracy).
random_seed (T.Optional[int]) – Optional random seed for reproducibility.
- Returns:
Discriminative performance metric.
- Return type:
float
Example:
>>> from my_module import DiscriminativeMetric, MyDiscriminativeModel
>>> import tsgm.dataset
>>> import numpy as np
>>> import sklearn
>>>
>>> # Create real and synthetic datasets
>>> real_dataset = tsgm.dataset.Dataset(...)  # Replace ... with appropriate arguments
>>> synthetic_dataset = tsgm.dataset.Dataset(...)  # Replace ... with appropriate arguments
>>>
>>> # Create a discriminative model
>>> model = MyDiscriminativeModel()  # Replace with the actual discriminative model class
>>>
>>> # Create and use the DiscriminativeMetric
>>> metric = DiscriminativeMetric()
>>> result = metric(real_dataset, synthetic_dataset, model, test_size=0.2, n_epochs=10)
>>> print(result)
- class DistanceMetric(statistics: list, discrepancy: Callable)[source]
Metric that measures similarity between synthetic and real time series.
- Parameters:
- discrepancy(stats1: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], stats2: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) float[source]
- Parameters:
stats1 (tsgm.types.Tensor) – A vector of summary statistics.
stats2 (tsgm.types.Tensor) – A vector of summary statistics.
- Returns:
The distance between two vectors calculated by self._discrepancy.
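Conceptually, a distance metric of this kind maps each dataset to a vector of summary statistics and then applies the discrepancy function to the two vectors. The sketch below shows that flow with plain NumPy callables; the helper name, the chosen statistics, and the Euclidean discrepancy are illustrative assumptions, not the library's defaults.

```python
import numpy as np


def distance_metric_sketch(real, synth, statistics, discrepancy):
    """Compare two datasets through a vector of summary statistics."""
    stats_real = np.array([stat(real) for stat in statistics])
    stats_synth = np.array([stat(synth) for stat in statistics])
    return discrepancy(stats_real, stats_synth)


def euclidean_discrepancy(s1, s2):
    return float(np.linalg.norm(s1 - s2))


statistics = [np.mean, np.std, np.max]
real = np.zeros((8, 16, 2))
synth = real + 1.0
distance = distance_metric_sketch(real, synth, statistics, euclidean_discrepancy)
identical = distance_metric_sketch(real, real, statistics, euclidean_discrepancy)
```

Shifting the data by one changes the mean and the maximum by one and leaves the standard deviation untouched, so the distance here is the square root of two.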
- class DownstreamPerformanceMetric(evaluator: BaseDownstreamEvaluator)[source]
The downstream performance metric evaluates the performance of a model on a downstream task. It returns performance gains achieved with the addition of synthetic data.
- Parameters:
evaluator (BaseDownstreamEvaluator) – An evaluator, should implement method .evaluate(D).
- class EntropyMetric[source]
Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies.
- Example:
>>> metric = EntropyMetric()
>>> dataset = tsgm.dataset.Dataset(...)
>>> result = metric(dataset)
>>> print(result)
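One common definition of spectral entropy is the Shannon entropy of the normalized power spectrum of a series. The sketch below uses that definition and sums it over all series and features; whether EntropyMetric normalizes or aggregates in exactly this way is an assumption, and both function names are hypothetical.

```python
import numpy as np


def spectral_entropy_sketch(x):
    """Shannon entropy of the normalized power spectrum of a 1D series (non-zero input assumed)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    p = power / power.sum()
    p = p[p > 0]                                         # skip empty bins to avoid log(0)
    return float(-np.sum(p * np.log(p)))


def dataset_entropy_sketch(X):
    """Sum of spectral entropies over all series and features of X (n, T, D)."""
    X = np.asarray(X, dtype=float)
    return sum(
        spectral_entropy_sketch(X[i, :, d])
        for i in range(X.shape[0])
        for d in range(X.shape[2])
    )


t = np.arange(128)
sine = np.sin(2 * np.pi * 5 * t / 128)
noise = np.random.default_rng(0).normal(size=128)
entropy_sine = spectral_entropy_sketch(sine)
entropy_noise = spectral_entropy_sketch(noise)
total = dataset_entropy_sketch(np.stack([sine, noise])[:, :, None])
```

A pure sinusoid concentrates its power in one frequency bin and has entropy near zero, whereas white noise spreads power across bins and scores much higher.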
- class MMDMetric(kernel: ~typing.Callable = <function exp_quad_kernel>)[source]
This metric calculates MMD between real and synthetic samples.
- Example:
>>> metric = MMDMetric(kernel)
>>> dataset, synth_dataset = tsgm.dataset.Dataset(...), tsgm.dataset.Dataset(...)
>>> result = metric(dataset, synth_dataset)
>>> print(result)
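For intuition, the squared maximum mean discrepancy (MMD) between two samples can be estimated from kernel Gram matrices. The sketch below uses a biased estimator with an exponentiated quadratic kernel on flattened series; the kernel parameterization and the estimator variant are assumptions and may differ from what MMDMetric and exp_quad_kernel do.

```python
import numpy as np


def exp_quad_gram(A, B, length_scale=1.0):
    """Gram matrix k(a, b) = exp(-||a - b||^2 / (2 * length_scale^2)) for row vectors."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))


def mmd2_sketch(X, Y, length_scale=1.0):
    """Biased estimate of squared MMD between datasets X and Y of shape (n, T, D)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)   # flatten each series
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    k_xx = exp_quad_gram(X, X, length_scale).mean()
    k_yy = exp_quad_gram(Y, Y, length_scale).mean()
    k_xy = exp_quad_gram(X, Y, length_scale).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


rng = np.random.default_rng(0)
real = rng.normal(size=(20, 8, 2))
mmd_same = mmd2_sketch(real, real, length_scale=4.0)
mmd_shifted = mmd2_sketch(real, real + 3.0, length_scale=4.0)
```

Comparing a sample with itself gives zero, while a shifted copy yields a clearly positive value.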
- class PairwiseDistanceMetric[source]
Measures pairwise distances in a set of time series.
- pairwise_euclidean_distances(ts: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Computes the pairwise Euclidean distances for a set of time series.
- Parameters:
ts (numpy.ndarray) – A 2D array where each row represents a time series.
- Returns:
A 2D array representing the pairwise Euclidean distance matrix.
- Return type:
TensorLike
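A pairwise Euclidean distance matrix for a 2D array of series can be computed with NumPy broadcasting, as in this minimal sketch (the function name is hypothetical and the library's implementation may differ):

```python
import numpy as np


def pairwise_euclidean_sketch(ts):
    """Return the (n, n) matrix of Euclidean distances between the rows of ts."""
    ts = np.asarray(ts, dtype=float)
    diff = ts[:, None, :] - ts[None, :, :]               # shape (n, n, length)
    return np.sqrt((diff ** 2).sum(axis=-1))


distances = pairwise_euclidean_sketch([[0.0, 0.0], [3.0, 4.0]])
```

The result is symmetric with zeros on the diagonal; for the two rows above, the off-diagonal entry is 5.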
- class PredictiveParityMetric[source]
Measuring predictive parity between two datasets.
This metric assesses the discrepancy in the predictive performance of a model among different groups in two datasets. By default, it uses precision to quantify the predictive performance of the model within each group.
- Example:
>>> metric = PredictiveParityMetric()
>>> y_true_hist = [0, 1, 0, 1, 1, 0]
>>> y_pred_hist = [0, 1, 0, 0, 1, 1]
>>> groups_hist = [0, 1, 0, 1, 1, 0]
>>> y_true_synth = [1, 0, 1, 0, 0, 1]
>>> y_pred_synth = [1, 0, 1, 1, 0, 0]
>>> groups_synth = [1, 1, 0, 0, 0, 1]
>>> result = metric(y_true_hist, y_pred_hist, groups_hist, y_true_synth, y_pred_synth, groups_synth)
>>> print(result)
GANs
- class ConditionalGAN(*args: Any, **kwargs: Any)[source]
Conditional GAN implementation for labeled and temporally labeled time series.
- Parameters:
discriminator (keras.Model) – A discriminator model which takes a time series as input and checks whether the sample is real or fake.
generator (keras.Model) – Takes as input a random noise vector of length latent_dim and returns a simulated time series.
latent_dim (int) – The size of the noise vector.
temporal (bool) – Indicates whether the time series is temporally labeled or not.
use_wgan (bool) – Use Wasserstein GAN with gradient penalty. Default is False.
- call(inputs)[source]
Forward pass for the ConditionalGAN model. This method is required for Keras 3 compatibility with PyTorch backend.
- compile(d_optimizer: keras.optimizers.Optimizer, g_optimizer: keras.optimizers.Optimizer, loss_fn: Callable) None[source]
Compiles the generator and discriminator models.
- Parameters:
d_optimizer (keras.optimizers.Optimizer) – An optimizer for the GAN’s discriminator.
g_optimizer (keras.optimizers.Optimizer) – An optimizer for the GAN’s generator.
loss_fn (keras.losses.Loss) – Loss function.
- generate(labels: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Generates new data from the model.
- Parameters:
labels (tsgm.types.Tensor) – The labels for which to generate samples.
- Returns:
Generated samples.
- Return type:
tsgm.types.Tensor
- property metrics: List
- Returns:
A list of metrics trackers (e.g., generator’s loss and discriminator’s loss).
- Return type:
T.List
- class GAN(*args: Any, **kwargs: Any)[source]
GAN implementation for unlabeled time series.
- Parameters:
discriminator (keras.Model) – A discriminator model which takes a time series as input and checks whether the sample is real or fake.
generator (keras.Model) – Takes as input a random noise vector of length latent_dim and returns a simulated time series.
latent_dim (int) – The size of the noise vector.
use_wgan (bool) – Use Wasserstein GAN with gradient penalty.
- call(inputs)[source]
Forward pass for the GAN model. This method is required for Keras 3 compatibility with PyTorch backend.
- compile(d_optimizer: keras.optimizers.Optimizer, g_optimizer: keras.optimizers.Optimizer, loss_fn: keras.losses.Loss) None[source]
Compiles the generator and discriminator models.
- Parameters:
d_optimizer (keras.optimizers.Optimizer) – An optimizer for the GAN’s discriminator.
g_optimizer (keras.optimizers.Optimizer) – An optimizer for the GAN’s generator.
loss_fn (keras.losses.Loss) – Loss function.
- generate(num: int) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Generates new data from the model.
- Parameters:
num (int) – the number of samples to be generated.
- Returns:
Generated samples
- Return type:
tsgm.types.Tensor
- property metrics: List
- Returns:
A list of metrics trackers (e.g., generator’s loss and discriminator’s loss).
- train_step(data: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) Dict[str, float][source]
Performs a training step using a batch of data, stored in data.
- train_step_tf(tf, data: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) Dict[str, float][source]
VAEs
- class BetaVAE(*args: Any, **kwargs: Any)[source]
beta-VAE implementation for unlabeled time series.
- Parameters:
encoder (keras.Model) – An encoder model which takes a time series as input.
decoder (keras.Model) – Takes as input a random noise vector and returns a simulated time-series.
beta (float) – The weight of the KL divergence term. Default is 1.0.
- call(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Encodes and decodes time series dataset X.
- Parameters:
X (tsgm.types.Tensor) – The input time series tensor.
- Returns:
Generated samples
- Return type:
tsgm.types.Tensor
- generate(n: int) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Generates new data from the model.
- Parameters:
n (int) – the number of samples to be generated.
- Returns:
A tensor with generated samples.
- Return type:
tsgm.types.Tensor
- property metrics: List
- Returns:
A list of metrics trackers (total loss, reconstruction loss, and KL loss).
- train_step(data: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) Dict[source]
Performs a training step using a batch of data, stored in data.
- Parameters:
data (tsgm.types.Tensor) – A batch of data in a format batch_size x seq_len x feat_dim
- Returns:
A dict with losses
- Return type:
T.Dict
- train_step_jax(jax, data: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) Dict[source]
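The three tracked losses (total, reconstruction, and KL) relate through the beta weight: total = reconstruction + beta * KL. The NumPy sketch below spells this out for a diagonal Gaussian posterior and a standard normal prior. The squared-error reconstruction term and the names z_mean and z_log_var are assumptions, since the exact loss used by BetaVAE is not stated above.

```python
import numpy as np


def beta_vae_loss_sketch(x, x_rec, z_mean, z_log_var, beta=1.0):
    """Return (total, reconstruction, kl) for a batch of shape (batch, seq_len, feat_dim)."""
    # Assumption: reconstruction error is the summed squared error per sample.
    reconstruction = np.mean(np.sum((x - x_rec) ** 2, axis=(1, 2)))
    # KL( N(z_mean, exp(z_log_var)) || N(0, I) ), averaged over the batch.
    kl = np.mean(
        -0.5 * np.sum(1.0 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=1)
    )
    return reconstruction + beta * kl, reconstruction, kl


x = np.ones((4, 10, 2))
zeros = np.zeros((4, 3))
total_zero, rec_zero, kl_zero = beta_vae_loss_sketch(x, x, zeros, zeros)
total_one, rec_one, kl_one = beta_vae_loss_sketch(x, x, np.ones((4, 3)), zeros, beta=2.0)
```

With a perfect reconstruction and a posterior equal to the prior, all three terms vanish; moving the posterior mean away from zero raises only the KL term, which beta then scales.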
- class cBetaVAE(*args: Any, **kwargs: Any)[source]
- call(data: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Encodes and decodes time series dataset.
- Parameters:
data (tsgm.types.Tensor) – The input data, either a tensor or a tuple of (X, labels).
- Returns:
Generated samples.
- Return type:
tsgm.types.Tensor
- generate(labels: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]
Generates new data from the model.
- Parameters:
labels (tsgm.types.Tensor) – The labels for which to generate conditional samples.
- Returns:
A tuple of synthetically generated data and labels.
- Return type:
T.Tuple[tsgm.types.Tensor, tsgm.types.Tensor]
- train_step(data: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) Dict[str, float][source]
Performs a training step using a batch of data, stored in data.
- train_step_jax(jax, data: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) Dict[str, float][source]
ABC
- class RejectionSampler(simulator: ModelBasedSimulator, data: Dataset, statistics: List, epsilon: float, discrepancy: Callable, priors: Dict | None = None, **kwargs)[source]
Rejection sampling algorithm for approximate Bayesian computation.
- Parameters:
simulator (tsgm.simulator.ModelBasedSimulator) – A model-based simulator.
data (tsgm.dataset.Dataset) – Historical dataset storage.
statistics (list) – A list of summary statistics.
epsilon (float) – Tolerance of synthetically generated data to a set of summary statistics.
discrepancy (Callable) – Discrepancy measure function.
priors (dict) – Set of priors for each of the simulator parameters; defaults to DEFAULT_PRIOR.
- prior_samples(priors: Dict, params: List) Dict[source]
Generate prior samples for the specified parameters.
- Parameters:
priors (T.Dict) – A dictionary containing probability distributions for each parameter. Keys are parameter names, and values are instances of probability distribution classes. If a parameter is not present in the dictionary, a default prior distribution is used.
params (T.List) – A list of parameter names for which prior samples are to be generated.
- Returns:
A dictionary where keys are parameter names and values are samples drawn from their respective prior distributions.
- Return type:
T.Dict
Example:
priors = {'mean': NormalDistribution(0, 1), 'std_dev': UniformDistribution(0, 2)}
params = ['mean', 'std_dev']
samples = prior_samples(priors, params)
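The general idea of rejection sampling for approximate Bayesian computation can be sketched in plain Python: draw parameters from the prior, simulate data, and keep the draw only if the discrepancy between simulated and observed summary statistics is at most epsilon. This is a minimal illustration under assumed names (`rejection_abc`, `simulate`, `summary`, `discrepancy` are all hypothetical) and is not tsgm's RejectionSampler.

```python
import random
import statistics


def rejection_abc(simulate, observed_stats, summary, discrepancy, prior, epsilon,
                  n_trials=2000, seed=0):
    """Keep prior draws whose simulated summaries lie within epsilon of the observed ones."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_trials):
        theta = prior(rng)                    # candidate parameter drawn from the prior
        simulated = simulate(theta, rng)      # dataset simulated under that parameter
        if discrepancy(summary(simulated), observed_stats) <= epsilon:
            accepted.append(theta)            # accept: simulation resembles the data
    return accepted


# Toy problem: infer the mean of a Gaussian with known unit variance.
def simulate(mean, rng, n=50):
    return [rng.gauss(mean, 1.0) for _ in range(n)]


def summary(xs):
    return [statistics.fmean(xs)]             # a single summary statistic: the sample mean


def discrepancy(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


observed = summary(simulate(2.0, random.Random(42)))
posterior = rejection_abc(simulate, observed, summary, discrepancy,
                          prior=lambda rng: rng.uniform(-5.0, 5.0), epsilon=0.2)
```

A smaller epsilon gives accepted samples closer to the true posterior but lowers the acceptance rate.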
STS
- STS
alias of
STSTensorFlow
- class STSTensorFlow(model: tensorflow_probability.sts.StructuralTimeSeries | None = None)[source]
Class for training and generating from a structural time series model.
Initializes a new instance of the STS class.
- Parameters:
model (tfp.sts.StructuralTimeSeries or None) – Structural time series model to use. If None, a default model is used.
- elbo_loss() float[source]
Returns the evidence lower bound (ELBO) loss from training.
- Returns:
The value of the ELBO loss.
- Return type:
float
- generate(num_samples: int) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Generates samples from the trained model.
- Parameters:
num_samples (int) – Number of samples to generate.
- Returns:
Generated samples.
- Return type:
tsgm.types.Tensor
Visualization
- visualize_dataset(dataset: Dataset | jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], obj_id: int = 0, palette: dict = {'gen': 'blue', 'hist': 'red'}, path: str = '/tmp/generated_data.pdf') None[source]
Visualizes a time series dataset with target values.
- Parameters:
dataset (tsgm.dataset.DatasetOrTensor) – A time series dataset.
- visualize_original_and_reconst_ts(original: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], reconst: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], num: int = 5, vmin: int = 0, vmax: int = 1) None[source]
Visualizes original and reconstructed time series data.
This function generates side-by-side visualizations of the original and reconstructed time series data. It randomly selects a specified number of samples from the input tensors original and reconst and displays them as images using imshow.
- Parameters:
original (tsgm.types.Tensor) – Original time series data tensor.
reconst (tsgm.types.Tensor) – Reconstructed time series data tensor.
num (int, optional) – Number of samples to visualize, defaults to 5.
vmin (int, optional) – Minimum value for colormap normalization, defaults to 0.
vmax (int, optional) – Maximum value for colormap normalization, defaults to 1.
- visualize_training_loss(loss_vector: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], labels: tuple = (), path: str = '/tmp/training_loss.pdf') None[source]
Plots training losses as a function of the epoch.
- visualize_ts(ts: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], num: int = 5) None[source]
Visualizes time series tensor.
This function generates a plot to visualize time series data. It displays a specified number of time series from the input tensor.
- Parameters:
ts (tsgm.types.Tensor) – The time series data tensor of shape (num_samples, num_timesteps, num_features).
num (int, optional) – The number of time series to display. Defaults to 5.
- Raises:
AssertionError – If the input tensor does not have three dimensions.
- Example:
>>> visualize_ts(time_series_tensor, num=10)
- visualize_ts_lineplot(ts: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], ys: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, num: int = 5, unite_features: bool = True, legend_fontsize: int = 12, tick_size: int = 10) None[source]
Visualizes time series data using line plots.
This function generates line plots to visualize the time series data. It randomly selects a specified number of samples from the input tensor ts and plots each sample as a line plot. If ys is provided, it can be either a 1D or 2D tensor representing the target variable(s), and the function will optionally overlay it on the line plot.
- Parameters:
ts (tsgm.types.Tensor) – Input time series data tensor.
ys (tsgm.types.OptTensor, optional) – Optional target variable(s) tensor, defaults to None.
num (int, optional) – Number of samples to visualize, defaults to 5.
unite_features (bool, optional) – Whether to plot all features together or separately, defaults to True.
legend_fontsize (int, optional) – Font size of the legend, defaults to 12.
tick_size (int, optional) – Font size for y-axis ticks, defaults to 10.
- visualize_tsne(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], X_gen: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], y_gen: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], path: str = '/tmp/tsne_embeddings.pdf', feature_averaging: bool = False, perplexity=30.0) None[source]
Visualizes t-SNE embeddings of real and synthetic data.
This function generates a scatter plot of t-SNE embeddings for real and synthetic data. Each data point is represented by a marker on the plot, and the colors of the markers correspond to the corresponding class labels of the data points.
- Parameters:
X (tsgm.types.Tensor) – The original real data tensor of shape (num_samples, num_features).
y (tsgm.types.Tensor) – The labels of the original real data tensor of shape (num_samples,).
X_gen (tsgm.types.Tensor) – The generated synthetic data tensor of shape (num_samples, num_features).
y_gen (tsgm.types.Tensor) – The labels of the generated synthetic data tensor of shape (num_samples,).
path (str, optional) – The path to save the visualization as a PDF file. Defaults to “/tmp/tsne_embeddings.pdf”.
feature_averaging (bool, optional) – Whether to compute the average features for each class. Defaults to False.
perplexity (float, optional) – The perplexity parameter for t-SNE. Defaults to 30.0.
- visualize_tsne_unlabeled(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], X_gen: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], palette: dict = {'gen': 'blue', 'hist': 'red'}, alpha: float = 0.25, path: str = '/tmp/tsne_embeddings.pdf', fontsize: int = 20, markerscale: int = 3, markersize: int = 1, feature_averaging: bool = False, perplexity: float = 30.0) None[source]
Visualizes t-SNE embeddings of unlabeled data.
- Parameters:
X (tsgm.types.Tensor) – The original data tensor of shape (num_samples, num_features).
X_gen (tsgm.types.Tensor) – The generated data tensor of shape (num_samples, num_features).
palette (dict, optional) – A dictionary mapping class labels to colors. Defaults to DEFAULT_PALETTE_TSNE.
alpha (float, optional) – The transparency level of the plotted points. Defaults to 0.25.
path (str, optional) – The path to save the visualization as a PDF file. Defaults to “/tmp/tsne_embeddings.pdf”.
fontsize (int, optional) – The font size of the class labels in the legend. Defaults to 20.
markerscale (int, optional) – The scaling factor for the size of the markers in the legend. Defaults to 3.
markersize (int, optional) – The size of the markers in the scatter plot. Defaults to 1.
feature_averaging (bool, optional) – Whether to compute the average features for each class. Defaults to False.
perplexity (float, optional) – The perplexity parameter for t-SNE. Defaults to 30.0.
Monitors
- class GANMonitor(*args: Any, **kwargs: Any)[source]
GANMonitor is a Keras callback for monitoring and visualizing generated samples during training.
- Parameters:
num_samples (int) – The number of samples to generate and visualize.
latent_dim (int) – The dimensionality of the latent space. Defaults to 128.
labels (tsgm.types.Tensor) – The labels for conditional generation.
save (bool) – Whether to save the generated samples. Defaults to True.
save_path (str) – The path to save the generated samples. Defaults to None.
mode (str) – The generation mode, one of ‘clf’ or ‘reg’. Defaults to ‘clf’.
- Raises:
ValueError – If the mode is not one of [‘clf’, ‘reg’]
- Note:
If save is True and save_path is not specified, the default save path is “/tmp/”.
- Warning:
If save_path is specified but save is False, a warning is issued.
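The argument rules described for the monitor (mode validation, the default save path, and the warning) can be expressed as a small helper. This is a sketch of the documented behaviour only; `resolve_monitor_options` is a hypothetical function and is not part of tsgm.

```python
import warnings


def resolve_monitor_options(save=True, save_path=None, mode="clf"):
    """Hypothetical helper mirroring the documented argument rules; not part of tsgm."""
    if mode not in ("clf", "reg"):
        raise ValueError(f"mode must be one of ['clf', 'reg'], got {mode!r}")
    if save and save_path is None:
        save_path = "/tmp/"  # documented default when saving without an explicit path
    if save_path is not None and not save:
        # A path was given but saving is disabled, so the path will not be used.
        warnings.warn("save_path is specified, but save is False")
    return save, save_path, mode
```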
- class VAEMonitor(*args: Any, **kwargs: Any)[source]
VAEMonitor is a Keras callback for monitoring and visualizing generated samples from a Variational Autoencoder (VAE) during training.
- Parameters:
num_samples (int) – The number of samples to generate and visualize. Defaults to 6.
latent_dim (int) – The dimensionality of the latent space. Defaults to 128.
output_dim (int) – The dimensionality of the output space. Defaults to 2.
save (bool) – Whether to save the generated samples. Defaults to True.
save_path (str) – The path to save the generated samples. Defaults to None.
- Raises:
ValueError – If output_dim is less than or equal to 0.
- Note:
If save is True and save_path is not specified, the default save path is “/tmp/”.
- Warning:
If save_path is specified but save is False, a warning is issued.
Zoo
- class BaseClassificationArchitecture(seq_len: int, feat_dim: int, output_dim: int)[source]
Base class for classification architectures.
Initializes the base classification architecture.
- Parameters:
- arch_type = 'downstream:classification'
- get() Dict[source]
Returns a dictionary containing the model.
- Returns:
A dictionary containing the model.
- Return type:
T.Dict
- property model: keras.models.Model
Property to access the underlying Keras model.
- Returns:
The Keras model.
- Return type:
keras.models.Model
- class BaseDenoisingArchitecture(seq_len: int, feat_dim: int, n_filters: int = 64, n_conv_layers: int = 3, **kwargs)[source]
Base class for denoising architectures in DDPM (Denoising Diffusion Probabilistic Models, tsgm.models.ddpm).
Initializes the BaseDenoisingArchitecture with the specified parameters.
- Parameters:
- arch_type = 'ddpm:denoising'
- get() Dict[source]
Returns a dictionary containing the model.
- Returns:
A dictionary containing the model.
- Return type:
T.Dict
- property model: keras.models.Model
Provides access to the Keras model instance.
- Returns:
The Keras model instance built by _build_model.
- Return type:
keras.models.Model
- class BaseGANArchitecture[source]
Base class for defining architectures of Generative Adversarial Networks (GANs).
- property discriminator: keras.models.Model
Property for accessing the discriminator model.
- Returns:
The discriminator model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the discriminator model is not found.
- property generator: keras.models.Model
Property for accessing the generator model.
- Returns:
The generator model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the generator model is not implemented.
- get() Dict[source]
Retrieves both discriminator and generator models as a dictionary.
- Returns:
A dictionary containing discriminator and generator models.
- Return type:
T.Dict
- Raises:
NotImplementedError – If either discriminator or generator models are not implemented.
- class BaseVAEArchitecture[source]
Base class for defining architectures of Variational Autoencoders (VAEs).
- property decoder: keras.models.Model
Property for accessing the decoder model.
- Returns:
The decoder model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the decoder model is not implemented.
- property encoder: keras.models.Model
Property for accessing the encoder model.
- Returns:
The encoder model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the encoder model is not implemented.
- get() Dict[source]
Retrieves both encoder and decoder models as a dictionary.
- Returns:
A dictionary containing encoder and decoder models.
- Return type:
T.Dict
- Raises:
NotImplementedError – If either encoder or decoder models are not implemented.
- class BasicRecurrentArchitecture(hidden_dim: int, output_dim: int, n_layers: int, network_type: str, name: str = 'Sequential')[source]
Base class for recurrent neural network architectures.
Inherits from Architecture.
- Parameters:
hidden_dim – int, the number of units (e.g. 24)
output_dim – int, the number of output units (e.g. 1)
n_layers – int, the number of layers (e.g. 3)
network_type – str, one of ‘gru’, ‘lstm’, or ‘lstmLN’
name – str, model name (default: “Sequential”)
- arch_type = 'rnn_architecture'
- class BlockClfArchitecture(seq_len: int, feat_dim: int, output_dim: int, blocks: list)[source]
Architecture for classification using a sequence of blocks.
Inherits from BaseClassificationArchitecture.
Initializes the BlockClfArchitecture.
- Parameters:
- arch_type = 'downstream:classification'
- class ConvnArchitecture(seq_len: int, feat_dim: int, output_dim: int, n_conv_blocks: int = 1)[source]
Convolutional neural network architecture for classification. Inherits from BaseClassificationArchitecture.
Initializes the convolutional neural network architecture.
- class ConvnLSTMnArchitecture(seq_len: int, feat_dim: int, output_dim: int, n_conv_lstm_blocks: int = 1)[source]
Initializes the base classification architecture.
- class DDPMConvDenoiser(**kwargs)[source]
A convolutional denoising model for DDPM.
This class defines a convolutional neural network architecture used as a denoiser in DDPM. It predicts the noise added to the input samples during the diffusion process.
Initializes the DDPMConvDenoiser model with additional parameters.
- Parameters:
kwargs – Additional keyword arguments to be passed to the parent class.
- arch_type = 'ddpm:denoiser'
- class Sampling(*args: Any, **kwargs: Any)[source]
Custom Keras layer for sampling from a latent space.
This layer samples from a latent space using the reparameterization trick during training. It takes as input the mean and log variance of the latent distribution and generates samples by adding random noise scaled by the standard deviation to the mean.
- call(inputs: Tuple[jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]], jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]]) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Generates samples from a latent space.
- Parameters:
inputs (tuple[tsgm.types.Tensor, tsgm.types.Tensor]) – Tuple containing mean and log variance tensors of the latent distribution.
- Returns:
Sampled latent vector.
- Return type:
tsgm.types.Tensor
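The reparameterization trick described above can be written in a few lines of NumPy. This is a framework-agnostic sketch, not the Keras layer itself; `sample_latent` and its `rng` argument are hypothetical names.

```python
import numpy as np


def sample_latent(z_mean, z_log_var, rng=None):
    """Draw z = mean + std * eps with eps ~ N(0, I) (reparameterization trick)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(size=z_mean.shape)    # noise independent of the parameters
    return z_mean + np.exp(0.5 * z_log_var) * eps   # std = exp(log_var / 2)
```

Because the randomness enters only through `eps`, the sample is a differentiable function of the mean and log variance, which is what allows gradients to flow through the sampling step during training.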
- class TransformerClfArchitecture(seq_len: int, feat_dim: int, num_heads: int = 2, ff_dim: int = 64, n_blocks: int = 1, dropout_rate=0.5, output_dim: int = 2)[source]
Base class for transformer architectures.
Inherits from BaseClassificationArchitecture.
Initializes the TransformerClfArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
num_heads (int) – Number of attention heads (default is 2).
ff_dim (int) – Feed forward dimension in the attention block (default is 64).
n_blocks (int) – Number of transformer blocks (default is 1).
dropout_rate (float) – Dropout probability (default is 0.5).
output_dim (int) – Number of classes (default is 2).
- arch_type = 'downstream:classification'
- class VAE_CONV5Architecture(seq_len: int, feat_dim: int, latent_dim: int)[source]
This class defines the architecture for a Variational Autoencoder (VAE) with Convolutional Layers.
Initializes the VAE_CONV5Architecture.
- Parameters:
- arch_type = 'vae:unconditional'
- class WaveGANArchitecture(seq_len: int, feat_dim: int = 64, latent_dim: int = 32, output_dim: int = 1, kernel_size: int = 32, phase_rad: int = 2, use_batchnorm: bool = False)[source]
WaveGAN architecture, from https://arxiv.org/abs/1802.04208
Inherits from BaseGANArchitecture.
Initializes the WaveGANArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
kernel_size (int, optional) – Sizes of convolutions
phase_rad (int, optional) – Phase shuffle radius for wavegan (default is 2)
use_batchnorm (bool, optional) – Whether to use batchnorm (default is False)
- arch_type = 'gan:raw'
- class Zoo(*arg, **kwargs)[source]
A collection of architectures. It supports the Python dict API.
Initializes the Zoo.
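A dict-like collection of architectures can be sketched with `collections.UserDict`. The class below is a hypothetical stand-in used only to show what dict-style access looks like; its names (`ArchitectureRegistry`, `ToyClassifier`, `summary`) are assumptions and do not describe how tsgm implements Zoo.

```python
from collections import UserDict


class ArchitectureRegistry(UserDict):
    """Hypothetical dict-like registry mapping names to architecture classes."""

    def summary(self):
        # Map each registered name to its `arch_type` attribute when present.
        return {name: getattr(cls, "arch_type", "unknown")
                for name, cls in self.data.items()}


class ToyClassifier:
    arch_type = "downstream:classification"


registry = ArchitectureRegistry({"toy_clf": ToyClassifier})
registry["plain"] = dict             # item assignment works as for a dict
model_cls = registry["toy_clf"]      # lookup by key
```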
- class cGAN_Conv4Architecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int)[source]
Architecture for Conditional Generative Adversarial Network (cGAN) with Convolutional Layers.
Initializes the cGAN_Conv4Architecture.
- Parameters:
- arch_type = 'gan:conditional'
- class cGAN_LSTMConv3Architecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int)[source]
Architecture for Conditional Generative Adversarial Network (cGAN) with LSTM and Convolutional Layers.
Initializes the cGAN_LSTMConv3Architecture.
- Parameters:
- arch_type = 'gan:conditional'
- class cGAN_LSTMnArchitecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int, n_blocks: int = 1, output_activation: str = 'tanh')[source]
Conditional Generative Adversarial Network (cGAN) with LSTM-based architecture.
Inherits from BaseGANArchitecture.
Initializes the cGAN_LSTMnArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
n_blocks (int, optional) – Number of LSTM blocks in the architecture (default is 1).
output_activation (str, optional) – Activation function for the output layer (default is “tanh”).
- arch_type = 'gan:conditional'
- class cVAE_CONV5Architecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int = 2)[source]
- arch_type = 'vae:conditional'
Simulators
- class BaseSimulator[source]
Abstract base class for simulators. This class defines the interface for simulators.
- class LotkaVolterraSimulator(data: DatasetProperties, alpha: float = 1, beta: float = 1, gamma: float = 1, delta: float = 1, x0: float = 1, y0: float = 1)[source]
Simulates the Lotka-Volterra equations, which model the dynamics of biological systems in which two species interact, one as a predator and the other as prey.
For the details refer to https://en.wikipedia.org/wiki/Lotka%E2%80%93Volterra_equations
Initializes the Lotka-Volterra simulator with given parameters.
- Parameters:
data (tsgm.dataset.DatasetProperties) – The dataset properties.
alpha (float) – The maximum prey per capita growth rate. Default is 1.
beta (float) – The effect of the presence of predators on the prey death rate. Default is 1.
gamma (float) – The predator’s per capita death rate. Default is 1.
delta (float) – The effect of the presence of prey on the predator’s growth rate. Default is 1.
x0 (float) – The initial population density of prey. Default is 1.
y0 (float) – The initial population density of predator. Default is 1.
- clone() LotkaVolterraSimulator[source]
Creates a deep copy of the current LotkaVolterraSimulator instance.
- Returns:
A new instance of LotkaVolterraSimulator with copied data and parameters.
- Return type:
LotkaVolterraSimulator
- generate(num_samples: int, tmax: float = 1)[source]
Generates the simulation data based on the Lotka-Volterra equations.
- set_params(alpha, beta, gamma, delta, x0, y0, **kwargs)[source]
Sets the parameters for the simulator.
- Parameters:
alpha (float) – The maximum prey per capita growth rate.
beta (float) – The effect of the presence of predators on the prey death rate.
gamma (float) – The predator’s per capita death rate.
delta (float) – The effect of the presence of prey on the predator’s growth rate.
x0 (float) – The initial population density of prey.
y0 (float) – The initial population density of predator.
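The roles of the six parameters can be seen by integrating the Lotka-Volterra equations directly. The sketch below uses a fixed-step fourth-order Runge-Kutta scheme in plain Python; it is an assumption-laden illustration (the function name, `n_steps`, and the choice of integrator are hypothetical) and not the solver used by LotkaVolterraSimulator.

```python
def lotka_volterra(alpha=1.0, beta=1.0, gamma=1.0, delta=1.0, x0=1.0, y0=1.0,
                   tmax=1.0, n_steps=100):
    """Integrate dx/dt = alpha*x - beta*x*y and dy/dt = delta*x*y - gamma*y with RK4."""
    def f(x, y):
        # x is the prey density, y is the predator density.
        return alpha * x - beta * x * y, delta * x * y - gamma * y

    h = tmax / n_steps
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(n_steps):
        k1x, k1y = f(x, y)
        k2x, k2y = f(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
        k3x, k3y = f(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
        k4x, k4y = f(x + h * k3x, y + h * k3y)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        trajectory.append((x, y))
    return trajectory
```

Under these equations the point x = gamma/delta, y = alpha/beta is an equilibrium, so with all parameters and both initial densities equal to 1 the trajectory stays constant; changing x0 or y0 produces the familiar predator-prey oscillations.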
- class ModelBasedSimulator(data: DatasetProperties)[source]
A simulator that is based on a model. This class extends the Simulator class and provides additional methods for handling model parameters.
- Parameters:
data (tsgm.dataset.DatasetProperties) – Properties of the dataset to be used.
- abstract generate(num_samples: int, *args) None[source]
Abstract method to generate a dataset. Must be implemented by subclasses.
- Parameters:
num_samples (int) – Number of samples to generate.
- Raises:
NotImplementedError – This method is not implemented in this class and must be overridden by subclasses.
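The pattern of an abstract generate method that subclasses must override can be sketched with the standard `abc` module. All class and method names below are hypothetical stand-ins chosen for illustration; they do not reproduce tsgm's ModelBasedSimulator.

```python
import copy
from abc import ABC, abstractmethod


class ToyModelBasedSimulator(ABC):
    """Hypothetical stand-in showing the abstract-generate pattern; not tsgm's class."""

    def __init__(self, **params):
        self._params = dict(params)

    def params(self):
        return dict(self._params)        # return a copy so callers cannot mutate state

    def set_params(self, **params):
        self._params.update(params)

    def clone(self):
        return copy.deepcopy(self)       # independent copy with the same parameters

    @abstractmethod
    def generate(self, num_samples, *args):
        raise NotImplementedError


class ConstantSimulator(ToyModelBasedSimulator):
    def generate(self, num_samples, *args):
        # Each sample is a length-3 series filled with the `value` parameter.
        return [[self._params["value"]] * 3 for _ in range(num_samples)]
```

Instantiating the abstract class directly raises `TypeError`, while a subclass that implements `generate` can be created, cloned, and re-parameterized independently of the original.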
- class NNSimulator(data: DatasetProperties, driver: Any | None = None)[source]
- Parameters:
data (tsgm.dataset.DatasetProperties) – Properties of the dataset to be used.
driver (Optional[tsgm.types.Model]) – The model to be used for generating data, by default None.
- clone() NNSimulator[source]
Create a deep copy of the simulator.
- Returns:
A deep copy of the current simulator instance.
- Return type:
NNSimulator
- class PredictiveMaintenanceSimulator(data: DatasetProperties)[source]
Predictive Maintenance Simulator class that extends the ModelBasedSimulator base class. The simulator is based on https://github.com/AaltoPML/human-in-the-loop-predictive-maintenance, from the publication: Nikitin, Alexander, and Samuel Kaski. “Human-in-the-loop large-scale predictive maintenance of workstations.” Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022.
Initializes the PredictiveMaintenanceSimulator with dataset properties and sets encoders for categorical features.
- Parameters:
data (tsgm.dataset.DatasetProperties) – Dataset properties for the simulator.
- CAT_FEATURES = [0, 1, 2, 3, 4, 5, 6, 7]
- clone() PredictiveMaintenanceSimulator[source]
Creates a deep copy of the current PredictiveMaintenanceSimulator instance.
- Returns:
A new instance of PredictiveMaintenanceSimulator with copied data and parameters.
- Return type:
PredictiveMaintenanceSimulator
- class Simulator(data: DatasetProperties, driver: Any | None = None)[source]
Concrete class for a basic simulator. This class implements the basic method for fitting a model, but does not implement the generation and dump methods.
- Parameters:
data (tsgm.dataset.DatasetProperties) – Properties of the dataset to be used.
driver (Optional[tsgm.types.Model]) – The model to be used for generating data, by default None.
- clone() Simulator[source]
Create a deep copy of the simulator.
- Returns:
A deep copy of the current simulator instance.
- Return type:
Simulator
- dump(path: str, format: str = 'csv') None[source]
Method to save the generated dataset to a file. Not implemented in this class.
- Parameters:
- Raises:
NotImplementedError – This method is not implemented in this class.
- fit(**kwargs) None[source]
Fit the model using the dataset properties.
- Parameters:
kwargs – Additional keyword arguments to pass to the model’s fit method.
- generate(num_samples: int, *args) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Method to generate a dataset. Not implemented in this class.
- Parameters:
num_samples (int) – Number of samples to generate.
- Returns:
The generated dataset.
- Return type:
TensorLike
- Raises:
NotImplementedError – This method is not implemented in this class.
- class SineConstSimulator(data: DatasetProperties, max_scale: float = 10.0, max_const: float = 5.0)[source]
Sine and Constant Function Simulator class that extends the ModelBasedSimulator base class.
- Parameters:
- clone() SineConstSimulator[source]
Creates a deep copy of the current SineConstSimulator instance.
- Returns:
A new instance of SineConstSimulator with copied data and parameters.
- Return type:
SineConstSimulator
Data Processing Utils
- class TSFeatureWiseScaler(feature_range: Tuple[float, float] = (0, 1))[source]
Scales time series data feature-wise.
- Parameters:
feature_range (tuple(float, float)) – Tuple representing the minimum and maximum feature values (default is (0, 1)).
Initializes a new instance of the TSFeatureWiseScaler class.
- Parameters:
feature_range (tuple(float, float)) – Tuple representing the minimum and maximum feature values, defaults to (0, 1).
- fit(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) TSFeatureWiseScaler[source]
Fits the scaler to the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
The fitted scaler object.
- Return type:
TSFeatureWiseScaler
- fit_transform(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Fits the scaler to the data and transforms it.
- Parameters:
X (TensorLike) – Input data
- Returns:
Scaled input data X
- Return type:
TensorLike
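Feature-wise scaling of a time series tensor can be sketched in NumPy as a min-max transform computed separately for each feature over all samples and time steps. This assumes min-max scaling into feature_range, which matches the parameter description but is not confirmed here as tsgm's exact computation; the function name `feature_wise_minmax` is hypothetical.

```python
import numpy as np


def feature_wise_minmax(X, feature_range=(0.0, 1.0)):
    """Scale each feature of X, shape (n_samples, seq_len, feat_dim), into feature_range."""
    lo, hi = feature_range
    x_min = X.min(axis=(0, 1), keepdims=True)   # per-feature minimum over samples and time
    x_max = X.max(axis=(0, 1), keepdims=True)   # per-feature maximum over samples and time
    # Use a span of 1 for constant features to avoid dividing by zero.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return lo + (X - x_min) / span * (hi - lo)
```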
- class TSGlobalScaler[source]
Scales time series data globally.
- fit(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) TSGlobalScaler[source]
Fits the scaler to the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
The fitted scaler object.
- Return type:
TSGlobalScaler
- fit_transform(X: jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]]) jax.numpy.ndarray | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]
Fits the scaler to the data and transforms it.
- Parameters:
X (TensorLike) – Input data
- Returns:
Scaled input data X
- Return type:
TensorLike
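Global scaling can be sketched as a min-max transform that uses one minimum and one maximum for the entire tensor. This is an assumption about what "globally" means here (a single shared range mapped to [0, 1]); the function name `global_minmax` is hypothetical and the code is not taken from tsgm.

```python
import numpy as np


def global_minmax(X):
    """Scale all values of X with one shared minimum and maximum (assumed behaviour)."""
    x_min, x_max = X.min(), X.max()
    span = x_max - x_min if x_max > x_min else 1.0   # guard against a constant input
    return (X - x_min) / span


# Two features on very different scales, shape (n_samples=1, seq_len=3, feat_dim=2).
X = np.array([[[0.0, 0.0], [1.0, 50.0], [2.0, 100.0]]])
X_scaled = global_minmax(X)
```

In this example the first feature occupies only the range [0, 0.02] after scaling, so relative magnitudes across features are preserved, in contrast to feature-wise scaling, where every feature would span the full range.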