TSGM¶
Datasets¶
- class UCRDataManager(path: str = '/home/docs/checkouts/readthedocs.org/user_builds/tsgm/envs/latest/lib/python3.8/site-packages/tsgm-0.0.7-py3.8.egg/tsgm/utils/../../data', ds: str = 'gunpoint')[source]¶
A manager for the UCR collection of time series datasets. If you find these datasets useful, please cite:
@misc{UCRArchive2018,
  title  = {The UCR Time Series Classification Archive},
  author = {Dau, Hoang Anh and Keogh, Eamonn and Kamgar, Kaveh and Yeh, Chin-Chia Michael and Zhu, Yan and Gharghabi, Shaghayegh and Ratanamahatana, Chotirat Ann and Yanping and Hu, Bing and Begum, Nurjahan and Bagnall, Anthony and Mueen, Abdullah and Batista, Gustavo and Hexagon-ML},
  year   = {2018},
  month  = {October},
  note   = {\url{https://www.cs.ucr.edu/~eamonn/time_series_data_2018/}}
}
- Parameters:
path (str) – a relative path to the stored UCR dataset.
ds (str) – Name of the dataset. The list of names is available at https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (case sensitive!).
- Raises:
ValueError – When there is no stored UCR archive, or the name of the dataset is incorrect.
- default_path = '/home/docs/checkouts/readthedocs.org/user_builds/tsgm/envs/latest/lib/python3.8/site-packages/tsgm-0.0.7-py3.8.egg/tsgm/utils/../../data'¶
- get() Tuple[TensorLike, TensorLike, TensorLike, TensorLike] [source]¶
Returns a tuple containing training and testing data.
- Returns:
A tuple (X_train, y_train, X_test, y_test).
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike]
- get_classes_distribution() Dict [source]¶
Returns a dictionary with the fraction of occurrences for each class.
- key = 'someone'¶
- mirrors = ['https://www.cs.ucr.edu/~eamonn/time_series_data_2018/']¶
- resources = [('UCRArchive_2018.zip', 0)]¶
- y_all: Collection[Hashable] | None¶
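Example (an illustrative sketch; the tsgm.utils.UCRDataManager import path and the printed values are assumptions, not guaranteed by this page):
import tsgm

ucr = tsgm.utils.UCRDataManager(ds="gunpoint")   # module path assumed; raises ValueError if the archive or name is wrong
X_train, y_train, X_test, y_test = ucr.get()
print(ucr.get_classes_distribution())            # e.g. {1.0: 0.5, 2.0: 0.5} (illustrative output)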
- download_physionet2012() None [source]¶
Downloads the Physionet 2012 dataset files from the Physionet website and extracts them into the local folder ‘physionet2012’.
- gen_sine_const_switch_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0, frequency_switch: float = 0.1) Tuple[TensorLike, TensorLike] [source]¶
Generates a dataset with alternating constant and sinusoidal sequences.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Value indicating whether the sequence is constant or sinusoidal. Defaults to 0.
frequency_switch (float, optional) – Probability of switching between constant and sinusoidal sequences. Defaults to 0.1.
- Returns:
Tuple containing input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- gen_sine_dataset(N: int, T: int, D: int, max_value: int = 10) np.ndarray [source]¶
Generates a dataset of sinusoidal waves with random parameters.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
- Returns:
Generated dataset with shape (N, T, D).
- Return type:
np.ndarray
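A minimal usage sketch for the toy generators above (the tsgm.utils module path is an assumption):
import tsgm

X_sine = tsgm.utils.gen_sine_dataset(N=128, T=64, D=2, max_value=10)      # shape (128, 64, 2); module path assumed
X_sw, y_sw = tsgm.utils.gen_sine_const_switch_dataset(N=128, T=64, D=2)   # labeled switching sequences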
- gen_sine_vs_const_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0) Tuple[TensorLike, TensorLike] [source]¶
Generates a dataset with alternating sinusoidal and constant sequences.
- Parameters:
N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Maximum value for the constant sequence. Defaults to 0.
- Returns:
Tuple containing input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- get_covid_19() Tuple[TensorLike, Tuple, List] [source]¶
Loads the Covid-19 dataset with additional graph information. The dataset is based on data from The New York Times, compiled from reports by state and local health agencies [1], and was adapted to the graph setting in [2].
[1] The New York Times. (2021). Coronavirus (Covid-19) Data in the United States. Retrieved [Insert Date Here], from https://github.com/nytimes/covid-19-data.
[2] Alexander V. Nikitin, St John, Arno Solin, Samuel Kaski. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10640-10660, 2022.
- Returns:
A tuple whose first element is time series data (n_nodes x n_timestamps x n_features); each timestamp consists of the number of deaths, cases, deaths normalized by the population, and cases normalized by the population. The second element is the graph tuple (nodes, edges). The third element is the order of states.
- Return type:
tuple
- get_eeg() Tuple[TensorLike, TensorLike] [source]¶
Loads the EEG Eye State dataset.
This function downloads the EEG Eye State dataset from the UCI Machine Learning Repository and returns the input features (X) and target labels (y).
- Returns:
A tuple containing the input features (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- get_energy_data() np.ndarray [source]¶
Retrieves the energy consumption dataset.
This function downloads and loads the energy consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.
- Returns:
Energy consumption dataset.
- Return type:
np.ndarray
- get_gp_samples_data(num_samples: int, max_time: int, covar_func: Callable = _exponential_quadratic) np.ndarray [source]¶
Generates samples from a Gaussian process.
This function generates samples from a Gaussian process using the specified covariance function. It returns the generated samples as a NumPy array.
- Parameters:
num_samples (int) – Number of samples to generate.
max_time (int) – Maximum time value for the samples.
covar_func (Callable, optional) – Covariance function to use. Defaults to _exponential_quadratic.
- Returns:
Generated samples from the Gaussian process.
- Return type:
np.ndarray
- get_mauna_loa() Tuple[TensorLike, TensorLike] [source]¶
Loads the Mauna Loa CO2 dataset.
This function loads the Mauna Loa CO2 dataset, which contains measurements of atmospheric CO2 concentrations at the Mauna Loa Observatory in Hawaii.
- Returns:
A tuple containing the input data (X) and target labels (y).
- Return type:
tuple[TensorLike, TensorLike]
- get_mnist_data() Tuple[TensorLike, TensorLike, TensorLike, TensorLike] [source]¶
Retrieves the MNIST dataset.
This function loads the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits, and returns the training and testing data along with their corresponding labels.
- Returns:
A tuple containing the training data, training labels, testing data, and testing labels.
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike]
- get_physionet2012() Tuple[TensorLike, TensorLike, TensorLike, TensorLike, TensorLike, TensorLike] [source]¶
Retrieves the Physionet 2012 dataset.
This function downloads and retrieves the Physionet 2012 dataset, which consists of physiological data and corresponding outcomes. It returns the training, testing, and validation datasets along with their labels.
- Returns:
A tuple containing the training, testing, and validation datasets along with their labels. (train_X, train_y, test_X, test_y, val_X, val_y)
- Return type:
tuple[TensorLike, TensorLike, TensorLike, TensorLike, TensorLike, TensorLike]
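Illustrative usage (assuming the helper is exposed via tsgm.utils; the raw files are downloaded if they are not already present):
import tsgm

# module path assumed
train_X, train_y, test_X, test_y, val_X, val_y = tsgm.utils.get_physionet2012()
print(train_X.shape, val_X.shape)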
- get_power_consumption() np.ndarray [source]¶
Retrieves the household power consumption dataset.
This function downloads and loads the household power consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.
- Returns:
Household power consumption dataset.
- Return type:
np.ndarray
- get_stock_data(stock_name: str) np.ndarray [source]¶
Downloads historical stock data for the specified stock ticker.
This function downloads historical stock data for the specified stock ticker using the Yahoo Finance API. It returns the stock data as a NumPy array with an additional axis representing the batch dimension.
- Parameters:
stock_name (str) – Ticker symbol of the stock.
- Returns:
Historical stock data.
- Return type:
np.ndarray
- Raises:
ValueError – If the provided stock ticker is invalid or no data is available.
- get_synchronized_brainwave_dataset() Tuple[DataFrame, DataFrame] [source]¶
Loads the EEG Synchronized Brainwave dataset.
This function downloads the EEG Synchronized Brainwave dataset from dropbox and returns the input features (X) and target labels (y).
- Returns:
A tuple containing the input features (X) and target labels (y).
- Return type:
tuple[pd.DataFrame, pd.DataFrame]
- load_arff(path: str) DataFrame [source]¶
Loads data from an ARFF (Attribute-Relation File Format) file.
This function reads data from an ARFF file located at the specified path and returns it as a pandas DataFrame.
- Parameters:
path (str) – Path to the ARFF file.
- Returns:
DataFrame containing the loaded data.
- Return type:
pandas.DataFrame
- split_dataset_into_objects(X: TensorLike, y: TensorLike, step: int = 10) Tuple[TensorLike, TensorLike] [source]¶
Splits the dataset into objects of fixed length.
This function splits the input dataset into objects of fixed length along the first dimension, 0-padding if necessary.
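A small sketch of the splitting helper (the tsgm.utils module path and the input shapes are assumptions; a single labeled series is cut into windows of length 10, zero-padded at the end):
import numpy as np
import tsgm

X = np.random.normal(size=(95, 3))            # 95 timesteps, 3 features (illustrative shape)
y = np.random.randint(0, 2, size=95)
X_obj, y_obj = tsgm.utils.split_dataset_into_objects(X, y, step=10)   # module path assumed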
Augmentations¶
- class BaseAugmenter(per_feature: bool)[source]¶
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
- class BaseCompose(augmentations: List[BaseAugmenter])[source]¶
- class DTWBarycentricAveraging[source]¶
DTW Barycenter Averaging (DBA) [1] method, estimated through an Expectation-Maximization algorithm [2], as implemented in https://github.com/tslearn-team/tslearn/.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1, num_initial_samples: int | None = None, initial_timeseries: List[TensorLike] | None = None, initial_labels: List[int] | None = None, **kwargs) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Parameters¶
X : TensorLike, the time series dataset.
y : TensorLike or None, the classes.
n_samples : int, number of samples to generate (per class, if y is given).
num_initial_samples : int or None (default: None)
The number of time series to draw (per class) from the dataset before computing DTW_BA. If None, use the entire set (per class).
initial_timeseries : array or None (default: None)
Initial time series to start the optimization process from, with shape (original_size, d). In case y is given, the shape of initial_timeseries is assumed to be (n_classes, original_size, d).
initial_labels : array or None (default: None)
Labels for the samples from initial_timeseries.
Returns¶
np.array of shape (n_samples, original_size, d) if y is None, or (n_classes * n_samples, original_size, d) otherwise, together with an np.array of labels (or None).
- class GaussianNoise(per_feature: bool = True)[source]¶
Apply Gaussian noise to the input time series.
- Args:
variance ((float, float) or float): Variance range for the noise. If variance is a single float, the range will be (0, variance). Default: (10.0, 50.0).
mean (float): Mean of the noise. Default: 0.
per_feature (bool): If set to True, noise will be sampled for each feature independently. Otherwise, the noise will be sampled once for all features. Default: True.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1, mean: float = 0, variance: float = 1.0) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generate synthetic data with Gaussian noise.
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned
n_samples (int) – Number of augmented samples to generate. Default is 1.
mean (float) – The mean of the noise. Default is 0.
variance (float) – The variance of the noise. Default is 1.0.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
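Usage sketch (the module path tsgm.models.augmentations is an assumption):
import numpy as np
import tsgm

X = np.random.normal(size=(16, 64, 2))        # (n_data, n_timesteps, n_features)
y = np.random.randint(0, 2, size=16)

aug = tsgm.models.augmentations.GaussianNoise()                    # module path assumed
X_aug = aug.generate(X=X, n_samples=32, mean=0.0, variance=0.5)    # 32 noisy samples
X_aug_l, y_aug_l = aug.generate(X=X, y=y, n_samples=32)            # labels are returned when y is given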
- class MagnitudeWarping[source]¶
Magnitude warping changes the magnitude of each sample by convolving the data window with a smooth curve varying around one. See https://dl.acm.org/doi/pdf/10.1145/3136755.3136817.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1, sigma: float = 0.2, n_knots: int = 4) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generates augmented samples via MagnitudeWarping for (X, y)
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned
n_samples (int) – Number of augmented samples to generate. Default is 1.
sigma (float) – Standard deviation for the random warping. Default is 0.2.
n_knots (int) – Number of knots used for warping curve. Default is 4.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
- class Shuffle[source]¶
Shuffles time series features. Shuffling is beneficial when each feature corresponds to interchangeable sensors.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generate synthetic data using Shuffle strategy. Features are randomly shuffled to generate novel samples.
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned
n_samples (int) – Number of augmented samples to generate. Default is 1.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
- class SliceAndShuffle(per_feature: bool = False)[source]¶
Slices the time series into k pieces and creates a new time series by shuffling the slices.
- Args:
per_feature (bool): If set to True, each time series is sliced independently. Otherwise, all features are sliced in the same way. Default: False.
- generate(X: TensorLike, y: TensorLike | None = None, n_samples: int = 1, n_segments: int = 2) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generate synthetic data using the slice-and-shuffle strategy. Slices are randomly selected.
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned.
n_samples (int) – Number of augmented samples to generate. Default is 1.
n_segments (int) – Number of slices each time series is cut into. Default is 2.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
- class WindowWarping[source]¶
https://halshs.archives-ouvertes.fr/halshs-01357973/document
- generate(X: TensorLike, y: TensorLike | None = None, window_ratio: float = 0.2, scales: Tuple = (0.25, 1.0), n_samples: int = 1) TensorLike | Tuple[TensorLike, TensorLike] [source]¶
Generates augmented samples via WindowWarping for (X, y).
- Parameters:
X (TensorLike) – Input data tensor of shape (n_data, n_timesteps, n_features).
y (Optional[TensorLike]) – Optional labels tensor. If provided, labels will also be returned
window_ratio (float) – The ratio of the window size relative to the total number of timesteps. Default is 0.2.
scales (tuple) – A tuple specifying the scale range for warping. Default is (0.25, 1.0).
n_samples (int) – Number of augmented samples to generate. Default is 1.
- Returns:
Augmented data tensor of shape (n_samples, n_timesteps, n_features) and optionally augmented labels if ‘y’ is provided.
- Return type:
Union[TensorLike, Tuple[TensorLike, TensorLike]]
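A combined sketch of the warping and slicing augmenters (the module path tsgm.models.augmentations is an assumption):
import numpy as np
import tsgm

X = np.random.normal(size=(16, 64, 2))

ww = tsgm.models.augmentations.WindowWarping()                     # module path assumed
X_ww = ww.generate(X=X, n_samples=8, window_ratio=0.2, scales=(0.25, 1.0))

sas = tsgm.models.augmentations.SliceAndShuffle()
X_ss = sas.generate(X=X, n_samples=8, n_segments=4)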
Metrics¶
- class ConsistencyMetric(evaluators: List)[source]¶
Predictive consistency metric measures whether a set of evaluators yields consistent results on real and synthetic data.
- Parameters:
evaluators (list) – A list of evaluators (each item should implement the method .evaluate(D)).
- class DemographicParityMetric[source]¶
Measuring demographic parity between two datasets.
This metric assesses the difference in the distributions of a target variable among different groups in two datasets. By default, it uses the Kolmogorov-Smirnov statistic to quantify the maximum vertical deviation between the cumulative distribution functions of the target variable for the historical and synthetic data within each group.
- Args:
d_hist (tsgm.dataset.DatasetOrTensor): The historical input dataset or tensor.
groups_hist (TensorLike): The group assignments for the historical data.
d_synth (tsgm.dataset.DatasetOrTensor): The synthetic input dataset or tensor.
groups_synth (TensorLike): The group assignments for the synthetic data.
metric (callable, optional): The metric used to compare the target variable distributions within each group. Default is the Kolmogorov-Smirnov statistic.
- Returns:
dict: A dictionary mapping each group to the computed demographic parity metric.
- Example:
>>> metric = DemographicParityMetric()
>>> dataset_hist = tsgm.dataset.Dataset(...)
>>> dataset_synth = tsgm.dataset.Dataset(...)
>>> groups_hist = [0, 1, 0, 1, 1, 0]
>>> groups_synth = [1, 1, 0, 0, 0, 1]
>>> result = metric(dataset_hist, groups_hist, dataset_synth, groups_synth)
>>> print(result)
- class DiscriminativeMetric[source]¶
The DiscriminativeMetric measures the discriminative performance of a model in distinguishing between synthetic and real datasets.
This metric evaluates a discriminative model by training it on a combination of synthetic and real datasets and assessing its performance on a test set.
- Parameters:
d_hist (tsgm.dataset.DatasetOrTensor) – Real dataset.
d_syn (tsgm.dataset.DatasetOrTensor) – Synthetic dataset.
model (T.Callable) – Discriminative model to be evaluated.
test_size (T.Union[float, int]) – Proportion of the dataset to include in the test split or the absolute number of test samples.
n_epochs (int) – Number of training epochs for the model.
metric (T.Optional[T.Callable]) – Optional evaluation metric to use (default: accuracy).
random_seed (T.Optional[int]) – Optional random seed for reproducibility.
- Returns:
Discriminative performance metric.
- Return type:
- Example:
>>> from my_module import DiscriminativeMetric, MyDiscriminativeModel
>>> import tsgm.dataset
>>> import numpy as np
>>> import sklearn
>>>
>>> # Create real and synthetic datasets
>>> real_dataset = tsgm.dataset.Dataset(...)  # Replace ... with appropriate arguments
>>> synthetic_dataset = tsgm.dataset.Dataset(...)  # Replace ... with appropriate arguments
>>>
>>> # Create a discriminative model
>>> model = MyDiscriminativeModel()  # Replace with the actual discriminative model class
>>>
>>> # Create and use the DiscriminativeMetric
>>> metric = DiscriminativeMetric()
>>> result = metric(real_dataset, synthetic_dataset, model, test_size=0.2, n_epochs=10)
>>> print(result)
- class DistanceMetric(statistics: list, discrepancy: Callable)[source]¶
Metric that measures similarity between synthetic and real time series.
- Parameters:
statistics (list) – A list of summary statistics (callables).
discrepancy (Callable) – Discrepancy measure function applied to pairs of summary-statistics vectors.
- discrepancy(stats1: Tensor | np.ndarray, stats2: Tensor | np.ndarray) float [source]¶
- Parameters:
stats1 (tsgm.types.Tensor.) – A vector of summary statistics.
stats2 (tsgm.types.Tensor.) – A vector of summary statistics.
- Returns:
the distance between two vectors calculated by self._discrepancy.
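A minimal sketch with hand-rolled statistics and discrepancy; the call signature (real data first, synthetic data second) and the tsgm.metrics module path are assumptions:
import numpy as np
import tsgm

X_real = np.random.normal(size=(100, 64, 2))
X_synth = np.random.normal(loc=0.1, size=(100, 64, 2))

statistics = [lambda X: np.mean(X), lambda X: np.std(X)]     # summary statistics of a dataset
discrepancy = lambda s1, s2: float(np.linalg.norm(np.asarray(s1) - np.asarray(s2)))

metric = tsgm.metrics.DistanceMetric(statistics=statistics, discrepancy=discrepancy)   # module path assumed
print(metric(X_real, X_synth))                               # call signature assumed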
- class DownstreamPerformanceMetric(evaluator: BaseDownstreamEvaluator)[source]¶
The downstream performance metric evaluates the performance of a model on a downstream task. It returns performance gains achieved with the addition of synthetic data.
- Parameters:
evaluator (BaseDownstreamEvaluator) – An evaluator; it should implement the method .evaluate(D).
- class EntropyMetric[source]¶
Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies.
- Args:
d (tsgm.dataset.DatasetOrTensor): The input dataset or tensor.
- Returns:
float: The computed spectral entropy.
- Example:
>>> metric = EntropyMetric()
>>> dataset = tsgm.dataset.Dataset(...)
>>> result = metric(dataset)
>>> print(result)
- class MMDMetric(kernel: Callable = exp_quad_kernel)[source]¶
This metric calculates the maximum mean discrepancy (MMD) between real and synthetic samples.
- Args:
d (tsgm.dataset.DatasetOrTensor): The input dataset or tensor.
- Returns:
float: The computed MMD value.
- Example:
>>> metric = MMDMetric(kernel)
>>> dataset, synth_dataset = tsgm.dataset.Dataset(...), tsgm.dataset.Dataset(...)
>>> result = metric(dataset, synth_dataset)
>>> print(result)
- class PairwiseDistanceMetric[source]¶
Measures pairwise distances in a set of time series.
- pairwise_euclidean_distances(ts: TensorLike) TensorLike [source]¶
Computes the pairwise Euclidean distances for a set of time series.
- Parameters:
ts (numpy.ndarray) – A 2D array where each row represents a time series.
- Returns:
A 2D array representing the pairwise Euclidean distance matrix.
- Return type:
numpy.ndarray
- class PredictiveParityMetric[source]¶
Measuring predictive parity between two datasets.
This metric assesses the discrepancy in the predictive performance of a model among different groups in two datasets. By default, it uses precision to quantify the predictive performance of the model within each group.
- Args:
y_true_hist (TensorLike): The true target values for the historical data.
y_pred_hist (TensorLike): The predicted target values for the historical data.
groups_hist (TensorLike): The group assignments for the historical data.
y_true_synth (TensorLike): The true target values for the synthetic data.
y_pred_synth (TensorLike): The predicted target values for the synthetic data.
groups_synth (TensorLike): The group assignments for the synthetic data.
metric (callable, optional): The metric used to compare the predictive performance within each group. Default is the precision score.
- Returns:
dict: A dictionary mapping each group to the computed predictive parity metric.
- Example:
>>> metric = PredictiveParityMetric()
>>> y_true_hist = [0, 1, 0, 1, 1, 0]
>>> y_pred_hist = [0, 1, 0, 0, 1, 1]
>>> groups_hist = [0, 1, 0, 1, 1, 0]
>>> y_true_synth = [1, 0, 1, 0, 0, 1]
>>> y_pred_synth = [1, 0, 1, 1, 0, 0]
>>> groups_synth = [1, 1, 0, 0, 0, 1]
>>> result = metric(y_true_hist, y_pred_hist, groups_hist, y_true_synth, y_pred_synth, groups_synth)
>>> print(result)
- class PrivacyMembershipInferenceMetric(attacker: Any, metric: Callable | None = None)[source]¶
The metric measures the possibility of membership inference attacks.
- Parameters:
attacker (Callable) – An attacker, a one-class classifier (OCC) that implements the methods .fit and .predict.
metric – Measures the quality of the attacker (precision by default).
GANs¶
- class ConditionalGAN(*args, **kwargs)[source]¶
Conditional GAN implementation for labeled and temporally labeled time series.
- Parameters:
discriminator (keras.Model) – A discriminator model which takes a time series as input and checks whether the sample is real or fake.
generator (keras.Model) – Takes as input a random noise vector of latent_dim length and returns a simulated time series.
latent_dim (int) – The size of the noise vector.
temporal (bool) – Indicates whether the time series is temporally labeled or not.
- compile(d_optimizer: OptimizerV2, g_optimizer: OptimizerV2, loss_fn: Callable) None [source]¶
Compiles the generator and discriminator models.
- Parameters:
d_optimizer (keras.Model) – An optimizer for the GAN’s discriminator.
g_optimizer – An optimizer for the GAN’s generator.
loss_fn (keras.losses.Loss) – Loss function.
- generate(labels: Tensor | ndarray[Any, dtype[ScalarType]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates new data from the model.
- Parameters:
labels (tsgm.types.Tensor) – Conditioning labels for the samples to be generated (one sample per label).
- Returns:
generated samples
- Return type:
tsgm.types.Tensor
- property metrics: List[source]¶
- Returns:
A list of metrics trackers (e.g., generator’s loss and discriminator’s loss).
- Return type:
T.List
- class GAN(*args, **kwargs)[source]¶
GAN implementation for unlabeled time series.
- Parameters:
discriminator (keras.Model) – A discriminator model which takes a time series as input and checks whether the sample is real or fake.
generator (keras.Model) – Takes as input a random noise vector of latent_dim length and returns a simulated time series.
latent_dim (int) – The size of the noise vector.
use_wgan (bool) – Whether to use Wasserstein GAN with gradient penalty.
- compile(d_optimizer: OptimizerV2, g_optimizer: OptimizerV2, loss_fn: Loss) None [source]¶
Compiles the generator and discriminator models.
- Parameters:
d_optimizer (keras.Model) – An optimizer for the GAN’s discriminator.
g_optimizer – An optimizer for the GAN’s generator.
loss_fn (keras.losses.Loss) – Loss function.
- generate(num: int) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates new data from the model.
- Parameters:
num (int) – the number of samples to be generated.
- Returns:
Generated samples
- Return type:
tsgm.types.Tensor
- property metrics: List[source]¶
- Returns:
A list of metrics trackers (e.g., generator’s loss and discriminator’s loss).
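A self-contained training sketch for the unlabeled GAN. The module path tsgm.models.cgan.GAN, the keras-style fit() call, and the toy architectures are assumptions; real experiments would typically use architectures from the Zoo below.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
import tsgm

latent_dim, seq_len, feat_dim = 16, 64, 2

# toy generator: noise vector -> (seq_len, feat_dim) series
z = keras.Input((latent_dim,))
g = layers.Dense(seq_len * feat_dim, activation="tanh")(z)
generator = keras.Model(z, layers.Reshape((seq_len, feat_dim))(g))

# toy discriminator: series -> real/fake logit
inp = keras.Input((seq_len, feat_dim))
d = layers.Dense(1)(layers.Flatten()(inp))
discriminator = keras.Model(inp, d)

gan = tsgm.models.cgan.GAN(discriminator=discriminator, generator=generator, latent_dim=latent_dim)   # module path assumed
gan.compile(
    d_optimizer=keras.optimizers.Adam(1e-4),
    g_optimizer=keras.optimizers.Adam(1e-4),
    loss_fn=keras.losses.BinaryCrossentropy(from_logits=True),
)
X = np.random.normal(size=(128, seq_len, feat_dim)).astype("float32")
gan.fit(X, epochs=1, batch_size=32)       # training call is assumed to follow the keras API
samples = gan.generate(10)                # 10 synthetic series of shape (64, 2)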
VAEs¶
- class BetaVAE(*args, **kwargs)[source]¶
beta-VAE implementation for unlabeled time series.
- Parameters:
encoder (keras.Model) – An encoder model which maps an input time series into the latent space.
decoder (keras.Model) – Takes as input a random noise vector of latent_dim length and returns a simulated time series.
latent_dim (int) – The size of the noise vector.
- call(X: Tensor | ndarray[Any, dtype[ScalarType]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Encodes and decodes time series dataset X.
- Parameters:
X (tsgm.types.Tensor) – Input time series dataset.
- Returns:
Reconstructed samples
- Return type:
tsgm.types.Tensor
- generate(n: int) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates new data from the model.
- Parameters:
n (int) – the number of samples to be generated.
- Returns:
A tensor with generated samples.
- Return type:
tsgm.types.Tensor
- class cBetaVAE(*args, **kwargs)[source]¶
- call(data: Tensor | ndarray[Any, dtype[ScalarType]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Encodes and decodes time series dataset X.
- Parameters:
data (tsgm.types.Tensor) – Input time series dataset.
- Returns:
Reconstructed samples
- Return type:
tsgm.types.Tensor
- generate(labels: Tensor | ndarray[Any, dtype[ScalarType]]) Tuple[Tensor | ndarray[Any, dtype[ScalarType]], Tensor | ndarray[Any, dtype[ScalarType]]] [source]¶
Generates new data from the model.
- Parameters:
labels (tsgm.types.Tensor) – Conditioning labels for the samples to be generated (one sample per label).
- Returns:
a tuple of synthetically generated data and labels.
- Return type:
T.Tuple[tsgm.types.Tensor, tsgm.types.Tensor]
- property metrics: List[source]¶
Returns the list of loss trackers: [loss, reconstruction_loss, kl_loss].
ABC¶
- class RejectionSampler(simulator: ModelBasedSimulator, data: Dataset, statistics: List, epsilon: float, discrepancy: Callable, priors: Dict | None = None, **kwargs)[source]¶
Rejection sampling algorithm for approximate Bayesian computation.
- Parameters:
simulator (class tsgm.simulator.ModelBasedSimulator) – A model-based simulator.
data (class tsgm.dataset.Dataset) – Historical dataset storage.
statistics (list) – A list of summary statistics.
epsilon (float) – Tolerance of synthetically generated data to the set of summary statistics.
discrepancy (Callable) – Discrepancy measure function.
priors – Set of priors for each of the simulator parameters; defaults to DEFAULT_PRIOR.
- prior_samples(priors: Dict, params: List) Dict [source]¶
Generate prior samples for the specified parameters.
- Parameters:
priors (T.Dict) – A dictionary containing probability distributions for each parameter. Keys are parameter names, and values are instances of probability distribution classes. If a parameter is not present in the dictionary, a default prior distribution is used.
params (T.List) – A list of parameter names for which prior samples are to be generated.
- Returns:
A dictionary where keys are parameter names and values are samples drawn from their respective prior distributions.
- Return type:
T.Dict
Example:
priors = {'mean': NormalDistribution(0, 1), 'std_dev': UniformDistribution(0, 2)}
params = ['mean', 'std_dev']
samples = prior_samples(priors, params)
STS¶
- class STS(model: StructuralTimeSeries | None = None)[source]¶
Class for training and generating from a structural time series model.
Initializes a new instance of the STS class.
- Parameters:
model (tfp.sts.StructuralTimeSeriesModel or None) – Structural time series model to use. If None, default model is used.
- elbo_loss() float [source]¶
Returns the evidence lower bound (ELBO) loss from training.
- Returns:
The value of the ELBO loss.
- Return type:
float
- generate(num_samples: int) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates samples from the trained model.
- Parameters:
num_samples (int) – Number of samples to generate.
- Returns:
Generated samples.
- Return type:
tsgm.types.Tensor
- train(ds: Dataset, num_variational_steps: int = 200, steps_forw: int = 10) None [source]¶
Trains the structural time series model.
- Parameters:
ds (tsgm.dataset.Dataset) – Dataset containing time series data.
num_variational_steps (int) – Number of variational optimization steps, defaults to 200.
steps_forw (int) – Number of steps to forecast, defaults to 10.
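Illustrative end-to-end sketch (the tsgm.models.sts.STS path and the tsgm.dataset.Dataset construction are assumptions):
import numpy as np
import tsgm

X = np.cumsum(np.random.normal(size=(1, 100, 1)), axis=1)   # one random-walk series (illustrative)
ds = tsgm.dataset.Dataset(X, y=None)                         # Dataset construction assumed

sts = tsgm.models.sts.STS()                                  # default structural model; module path assumed
sts.train(ds, num_variational_steps=200, steps_forw=10)
print(sts.elbo_loss())
synthetic = sts.generate(num_samples=5)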
Visualization¶
- visualize_dataset(dataset: Dataset | Tensor | ndarray[Any, dtype[ScalarType]], obj_id: int = 0, palette: dict = {'gen': 'blue', 'hist': 'red'}, path: str = '/tmp/generated_data.pdf') None [source]¶
Visualizes a time series dataset with target values.
- Parameters:
dataset (tsgm.dataset.DatasetOrTensor.) – A time series dataset.
- visualize_original_and_reconst_ts(original: Tensor | ndarray[Any, dtype[ScalarType]], reconst: Tensor | ndarray[Any, dtype[ScalarType]], num: int = 5, vmin: int = 0, vmax: int = 1) None [source]¶
Visualizes original and reconstructed time series data.
This function generates side-by-side visualizations of the original and reconstructed time series data. It randomly selects a specified number of samples from the input tensors original and reconst and displays them as images using imshow.
- Parameters:
original (tsgm.types.Tensor) – Original time series data tensor.
reconst (tsgm.types.Tensor) – Reconstructed time series data tensor.
num (int, optional) – Number of samples to visualize, defaults to 5.
vmin (int, optional) – Minimum value for colormap normalization, defaults to 0.
vmax (int, optional) – Maximum value for colormap normalization, defaults to 1.
- visualize_training_loss(loss_vector: Tensor | ndarray[Any, dtype[ScalarType]], labels: tuple = (), path: str = '/tmp/training_loss.pdf') None [source]¶
Plots training losses as a function of the epochs.
- Parameters:
loss_vector – np.array of shape (number of metrics, number of epochs).
labels – list of strings.
path – str, where to save the plot.
- visualize_ts(ts: Tensor | ndarray[Any, dtype[ScalarType]], num: int = 5) None [source]¶
Visualizes time series tensor.
This function generates a plot to visualize time series data. It displays a specified number of time series from the input tensor.
- Parameters:
ts (tsgm.types.Tensor) – The time series data tensor of shape (num_samples, num_timesteps, num_features).
num (int, optional) – The number of time series to display. Defaults to 5.
- Raises:
AssertionError: If the input tensor does not have three dimensions.
- Example:
>>> visualize_ts(time_series_tensor, num=10)
- visualize_ts_lineplot(ts: Tensor | ndarray[Any, dtype[ScalarType]], ys: Tensor | ndarray[Any, dtype[ScalarType]] | None = None, num: int = 5, unite_features: bool = True, legend_fontsize: int = 12, tick_size: int = 10) None [source]¶
Visualizes time series data using line plots.
This function generates line plots to visualize the time series data. It randomly selects a specified number of samples from the input tensor ts and plots each sample as a line plot. If ys is provided, it can be either a 1D or 2D tensor representing the target variable(s), and the function will optionally overlay it on the line plot.
- Parameters:
ts (tsgm.types.Tensor) – Input time series data tensor.
ys (tsgm.types.OptTensor, optional) – Optional target variable(s) tensor, defaults to None.
num (int, optional) – Number of samples to visualize, defaults to 5.
unite_features (bool, optional) – Whether to plot all features together or separately, defaults to True.
legend_fontsize (int, optional) – Font size to use.
tick_size (int, optional) – Font size for y-axis ticks.
- visualize_tsne(X: Tensor | ndarray[Any, dtype[ScalarType]], y: Tensor | ndarray[Any, dtype[ScalarType]], X_gen: Tensor | ndarray[Any, dtype[ScalarType]], y_gen: Tensor | ndarray[Any, dtype[ScalarType]], path: str = '/tmp/tsne_embeddings.pdf', feature_averaging: bool = False, perplexity=30.0) None [source]¶
Visualizes t-SNE embeddings of real and synthetic data.
This function generates a scatter plot of t-SNE embeddings for real and synthetic data. Each data point is represented by a marker on the plot, and the colors of the markers correspond to the corresponding class labels of the data points.
- Parameters:
X (tsgm.types.Tensor) – The original real data tensor of shape (num_samples, num_features).
y (tsgm.types.Tensor) – The labels of the original real data tensor of shape (num_samples,).
X_gen (tsgm.types.Tensor) – The generated synthetic data tensor of shape (num_samples, num_features).
y_gen (tsgm.types.Tensor) – The labels of the generated synthetic data tensor of shape (num_samples,).
path (str, optional) – The path to save the visualization as a PDF file. Defaults to “/tmp/tsne_embeddings.pdf”.
feature_averaging (bool, optional) – Whether to compute the average features for each class. Defaults to False.
- visualize_tsne_unlabeled(X: Tensor | ndarray[Any, dtype[ScalarType]], X_gen: Tensor | ndarray[Any, dtype[ScalarType]], palette: dict = {'gen': 'blue', 'hist': 'red'}, alpha: float = 0.25, path: str = '/tmp/tsne_embeddings.pdf', fontsize: int = 20, markerscale: int = 3, markersize: int = 1, feature_averaging: bool = False, perplexity: float = 30.0) None [source]¶
Visualizes t-SNE embeddings of unlabeled data.
- Parameters:
X (tsgm.types.Tensor) – The original data tensor of shape (num_samples, num_features).
X_gen (tsgm.types.Tensor) – The generated data tensor of shape (num_samples, num_features).
palette (dict, optional) – A dictionary mapping class labels to colors. Defaults to DEFAULT_PALETTE_TSNE.
alpha (float, optional) – The transparency level of the plotted points. Defaults to 0.25.
path (str, optional) – The path to save the visualization as a PDF file. Defaults to “/tmp/tsne_embeddings.pdf”.
fontsize (int, optional) – The font size of the class labels in the legend. Defaults to 20.
markerscale (int, optional) – The scaling factor for the size of the markers in the legend. Defaults to 3.
markersize (int, optional) – The size of the markers in the scatter plot. Defaults to 1.
feature_averaging (bool, optional) – Whether to compute the average features for each class. Defaults to False.
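Typical calls (assuming the plotting helpers are exposed via tsgm.utils; the data here is random and purely illustrative):
import numpy as np
import tsgm

X_hist = np.random.normal(size=(50, 64, 2))
X_gen = np.random.normal(size=(50, 64, 2))

tsgm.utils.visualize_ts(X_hist, num=5)                      # module path assumed
tsgm.utils.visualize_ts_lineplot(X_hist, num=5)
tsgm.utils.visualize_tsne_unlabeled(X_hist, X_gen, perplexity=30.0, path="/tmp/tsne_embeddings.pdf")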
Monitors¶
- class GANMonitor(num_samples: int, latent_dim: int, labels: Tensor | ndarray[Any, dtype[ScalarType]], save: bool = True, save_path: str | None = None, mode: str = 'clf')[source]¶
GANMonitor is a Keras callback for monitoring and visualizing generated samples during training.
- Parameters:
num_samples (int) – The number of samples to generate and visualize.
latent_dim (int) – The dimensionality of the latent space. Defaults to 128.
output_dim (int) – The dimensionality of the output space. Defaults to 2.
save (bool) – Whether to save the generated samples. Defaults to True.
save_path (str) – The path to save the generated samples. Defaults to None.
- Raises:
ValueError – If the mode is not one of [‘clf’, ‘reg’]
- Note:
If save is True and save_path is not specified, the default save path is "/tmp/".
- Warning:
If save_path is specified but save is False, a warning is issued.
- class VAEMonitor(num_samples: int = 6, latent_dim: int = 128, output_dim: int = 2, save: bool = True, save_path: str | None = None)[source]¶
VAEMonitor is a Keras callback for monitoring and visualizing generated samples from a Variational Autoencoder (VAE) during training.
- Parameters:
num_samples (int) – The number of samples to generate and visualize. Defaults to 6.
latent_dim (int) – The dimensionality of the latent space. Defaults to 128.
output_dim (int) – The dimensionality of the output space. Defaults to 2.
save (bool) – Whether to save the generated samples. Defaults to True.
save_path (str) – The path to save the generated samples. Defaults to None.
- Raises:
ValueError – If output_dim is less than or equal to 0.
- Note:
If save is True and save_path is not specified, the default save path is "/tmp/".
- Warning:
If save_path is specified but save is False, a warning is issued.
Zoo¶
- class BaseClassificationArchitecture(seq_len: int, feat_dim: int, output_dim: int)[source]¶
Base class for classification architectures.
Initializes the base classification architecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
output_dim (int) – Dimensionality of the output.
- class BaseDenoisingArchitecture(seq_len: int, feat_dim: int, n_filters: int = 64, n_conv_layers: int = 3, **kwargs)[source]¶
Base class for denoising architectures in DDPM (Denoising Diffusion Probabilistic Models, tsgm.models.ddpm).
- Attributes:
arch_type: A string indicating the type of architecture, set to "ddpm:denoising".
_seq_len: The length of the input sequences.
_feat_dim: The dimensionality of the input features.
_n_filters: The number of filters used in the convolutional layers.
_n_conv_layers: The number of convolutional layers in the model.
_model: The Keras model instance built using the _build_model method.
Initializes the BaseDenoisingArchitecture with the specified parameters.
- Args:
seq_len (int): The length of the input sequences.
feat_dim (int): The dimensionality of the input features.
n_filters (int, optional): The number of filters for convolutional layers. Default is 64.
n_conv_layers (int, optional): The number of convolutional layers. Default is 3.
**kwargs: Additional keyword arguments to be passed to the parent class Architecture.
- get() Dict [source]¶
Returns a dictionary containing the model.
- Returns:
A dictionary containing the model.
- Return type:
dict
- property model: Model[source]¶
Provides access to the Keras model instance.
- Returns:
keras.models.Model: The Keras model instance built by _build_model.
- class BaseGANArchitecture[source]¶
Base class for defining architectures of Generative Adversarial Networks (GANs).
- property discriminator: Model[source]¶
Property for accessing the discriminator model.
- Returns:
The discriminator model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the discriminator model is not found.
- property generator: Model[source]¶
Property for accessing the generator model.
- Returns:
The generator model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the generator model is not implemented.
- get() Dict [source]¶
Retrieves both discriminator and generator models as a dictionary.
- Returns:
A dictionary containing discriminator and generator models.
- Return type:
dict
- Raises:
NotImplementedError – If either discriminator or generator models are not implemented.
- class BaseVAEArchitecture[source]¶
Base class for defining architectures of Variational Autoencoders (VAEs).
- property decoder: Model[source]¶
Property for accessing the decoder model.
- Returns:
The decoder model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the decoder model is not implemented.
- property encoder: Model[source]¶
Property for accessing the encoder model.
- Returns:
The encoder model.
- Return type:
keras.models.Model
- Raises:
NotImplementedError – If the encoder model is not implemented.
- get() Dict [source]¶
Retrieves both encoder and decoder models as a dictionary.
- Returns:
A dictionary containing encoder and decoder models.
- Return type:
dict
- Raises:
NotImplementedError – If either encoder or decoder models are not implemented.
- class BasicRecurrentArchitecture(hidden_dim: int, output_dim: int, n_layers: int, network_type: str, name: str = 'Sequential')[source]¶
Base class for recurrent neural network architectures.
Inherits from Architecture.
- Parameters:
hidden_dim – int, the number of units (e.g. 24)
output_dim – int, the number of output units (e.g. 1)
n_layers – int, the number of layers (e.g. 3)
network_type – str, one of ‘gru’, ‘lstm’, or ‘lstmLN’
name – str, model name Default: “Sequential”
- class BlockClfArchitecture(seq_len: int, feat_dim: int, output_dim: int, blocks: list)[source]¶
Architecture for classification using a sequence of blocks.
Inherits from BaseClassificationArchitecture.
Initializes the BlockClfArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
output_dim (int) – Dimensionality of the output.
blocks (list) – A list of blocks (layers) applied by the architecture.
- class ConvnArchitecture(seq_len: int, feat_dim: int, output_dim: int, n_conv_blocks: int = 1)[source]¶
Convolutional neural network architecture for classification. Inherits from BaseClassificationArchitecture.
Initializes the convolutional neural network architecture.
- class ConvnLSTMnArchitecture(seq_len: int, feat_dim: int, output_dim: int, n_conv_lstm_blocks: int = 1)[source]¶
Convolutional LSTM architecture for classification. Inherits from BaseClassificationArchitecture.
Initializes the convolutional LSTM architecture.
- class DDPMConvDenoiser(**kwargs)[source]¶
A convolutional denoising model for DDPM.
This class defines a convolutional neural network architecture used as a denoiser in DDPM. It predicts the noise added to the input samples during the diffusion process.
- Attributes:
arch_type: A string indicating the architecture type, set to “ddpm:denoiser”.
Initializes the DDPMConvDenoiser model with additional parameters.
- Args:
**kwargs: Additional keyword arguments to be passed to the parent class.
- class Sampling(*args, **kwargs)[source]¶
Custom Keras layer for sampling from a latent space.
This layer samples from a latent space using the reparameterization trick during training. It takes as input the mean and log variance of the latent distribution and generates samples by adding random noise scaled by the standard deviation to the mean.
- call(inputs: Tuple[Tensor | ndarray[Any, dtype[ScalarType]], Tensor | ndarray[Any, dtype[ScalarType]]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
Generates samples from a latent space.
- Parameters:
inputs (tuple[tsgm.types.Tensor, tsgm.types.Tensor]) – Tuple containing mean and log variance tensors of the latent distribution.
- Returns:
Sampled latent vector.
- Return type:
tsgm.types.Tensor
- class TimeEmbedding(*args, **kwargs)[source]¶
- call(inputs: Tensor | ndarray[Any, dtype[ScalarType]]) Tensor | ndarray[Any, dtype[ScalarType]] [source]¶
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state in __init__(), or in the build() method that is called automatically before call() executes the first time.
- Args:
inputs: Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args: Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs: Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- Returns:
A tensor or list/tuple of tensors.
- class TransformerClfArchitecture(seq_len: int, feat_dim: int, num_heads: int = 2, ff_dim: int = 64, n_blocks: int = 1, dropout_rate=0.5, output_dim: int = 2)[source]¶
Base class for transformer architectures.
Inherits from BaseClassificationArchitecture.
Initializes the TransformerClfArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
num_heads (int) – Number of attention heads (default is 2).
ff_dim (int) – Feed forward dimension in the attention block (default is 64).
output_dim (int, optional) – Dimensionality of the output; the number of classes (default is 2).
dropout_rate (float, optional) – Dropout probability (default is 0.5).
n_blocks (int, optional) – Number of transformer blocks (default is 1).
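A construction sketch (the tsgm.models.architectures module path and the .model attribute name are assumptions):
import tsgm

arch = tsgm.models.architectures.TransformerClfArchitecture(
    seq_len=64, feat_dim=2, num_heads=2, ff_dim=64, n_blocks=1, output_dim=2)   # module path assumed
clf = arch.model                      # underlying keras.Model (attribute name assumed)
clf.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])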
- class VAE_CONV5Architecture(seq_len: int, feat_dim: int, latent_dim: int)[source]¶
This class defines the architecture for a Variational Autoencoder (VAE) with convolutional layers.
Initializes the VAE_CONV5Architecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
- class WaveGANArchitecture(seq_len: int, feat_dim: int = 64, latent_dim: int = 32, output_dim: int = 1, kernel_size: int = 32, phase_rad: int = 2, use_batchnorm: bool = False)[source]¶
WaveGAN architecture, from https://arxiv.org/abs/1802.04208
Inherits from BaseGANArchitecture.
Initializes the WaveGANArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
kernel_size (int, optional) – Sizes of convolutions
phase_rad (int, optional) – Phase shuffle radius for wavegan (default is 2)
use_batchnorm (bool, optional) – Whether to use batchnorm (default is False)
- class Zoo(*arg, **kwargs)[source]¶
A collection of architectures. It supports the Python dict API.
Initializes the Zoo.
- class cGAN_Conv4Architecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int)[source]¶
Architecture for Conditional Generative Adversarial Network (cGAN) with Convolutional Layers.
Initializes the cGAN_Conv4Architecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
- class cGAN_LSTMConv3Architecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int)[source]¶
Architecture for Conditional Generative Adversarial Network (cGAN) with LSTM and Convolutional Layers.
Initializes the cGAN_LSTMConv3Architecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
- class cGAN_LSTMnArchitecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int, n_blocks: int = 1, output_activation: str = 'tanh')[source]¶
Conditional Generative Adversarial Network (cGAN) with LSTM-based architecture.
Inherits from BaseGANArchitecture.
Initializes the cGAN_LSTMnArchitecture.
- Parameters:
seq_len (int) – Length of input sequences.
feat_dim (int) – Dimensionality of input features.
latent_dim (int) – Dimensionality of the latent space.
output_dim (int) – Dimensionality of the output.
n_blocks (int, optional) – Number of LSTM blocks in the architecture (default is 1).
output_activation (str, optional) – Activation function for the output layer (default is “tanh”).
- class cVAE_CONV5Architecture(seq_len: int, feat_dim: int, latent_dim: int, output_dim: int = 2)[source]¶
Architecture for a conditional Variational Autoencoder (cVAE) with convolutional layers.
Simulators¶
- class BaseSimulator[source]¶
Abstract base class for simulators. This class defines the interface for simulators.
Methods¶
- generate(num_samples: int, *args) -> tsgm.dataset.Dataset
Generate a dataset with the specified number of samples.
- dump(path: str, format: str = “csv”) -> None
Save the generated dataset to a file in the specified format.
- class LotkaVolterraSimulator(data: DatasetProperties, alpha: float = 1, beta: float = 1, gamma: float = 1, delta: float = 1, x0: float = 1, y0: float = 1)[source]¶
Simulates the Lotka-Volterra equations, which model the dynamics of biological systems in which two species interact, one as a predator and the other as prey.
For the details refer to https://en.wikipedia.org/wiki/Lotka%E2%80%93Volterra_equations
Initializes the Lotka-Volterra simulator with given parameters.
- Args:
data (tsgm.dataset.DatasetProperties): The dataset properties.
alpha (float): The maximum prey per capita growth rate. Default is 1.
beta (float): The effect of the presence of predators on the prey death rate. Default is 1.
gamma (float): The predator's per capita death rate. Default is 1.
delta (float): The effect of the presence of prey on the predator's growth rate. Default is 1.
x0 (float): The initial population density of prey. Default is 1.
y0 (float): The initial population density of predator. Default is 1.
- clone() LotkaVolterraSimulator [source]¶
Creates a deep copy of the current LotkaVolterraSimulator instance.
- Returns:
LotkaVolterraSimulator: A new instance of LotkaVolterraSimulator with copied data and parameters.
- generate(num_samples: int, tmax: float = 1)[source]¶
Generates the simulation data based on the Lotka-Volterra equations.
- Args:
num_samples (int): The number of sample points to generate.
tmax (float): The maximum time value for the simulation. Default is 1.
- Returns:
np.ndarray: An array containing the population densities of prey and predators over time.
- set_params(alpha, beta, gamma, delta, x0, y0, **kwargs)[source]¶
Sets the parameters for the simulator.
- Args:
alpha (float): The maximum prey per capita growth rate.
beta (float): The effect of the presence of predators on the prey death rate.
gamma (float): The predator’s per capita death rate.
delta (float): The effect of the presence of prey on the predator’s growth rate.
x0 (float): The initial population density of prey.
y0 (float): The initial population density of the predator.
**kwargs: Arbitrary keyword arguments for setting simulator parameters.
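A usage sketch follows; the DatasetProperties constructor fields (N, D, T) are assumptions.
    import tsgm

    # Dataset properties: N series, D variables (prey, predator), T time steps — fields assumed.
    props = tsgm.dataset.DatasetProperties(N=10, D=2, T=100)

    sim = tsgm.simulator.LotkaVolterraSimulator(
        props, alpha=2 / 3, beta=4 / 3, gamma=1.0, delta=1.0, x0=1.0, y0=2.0
    )
    trajectories = sim.generate(num_samples=100, tmax=50.0)  # prey/predator densities over time
    sim_copy = sim.clone()                                   # independent deep copy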
- class ModelBasedSimulator(data: DatasetProperties)[source]¶
A simulator that is based on a model. This class extends the Simulator class and provides additional methods for handling model parameters.
Methods¶
- params() -> T.Dict[str, T.Any]
Get a dictionary of the simulator’s parameters.
- set_params(params: T.Dict[str, T.Any]) -> None
Set the simulator’s parameters from a dictionary.
- generate(num_samples: int, *args) -> None
Generate a dataset with the specified number of samples.
Initialize the ModelBasedSimulator with dataset properties.
Parameters¶
- data : tsgm.dataset.DatasetProperties
Properties of the dataset to be used.
- abstract generate(num_samples: int, *args) None [source]¶
Abstract method to generate a dataset. Must be implemented by subclasses.
Parameters¶
- num_samples : int
Number of samples to generate.
- *args
Additional arguments to be passed to the method.
Raises¶
- NotImplementedError
This method is not implemented in this class and must be overridden by subclasses.
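The params()/set_params() pair lets the simulator’s parameters be inspected and overridden without rebuilding the object; the sketch below assumes they round-trip through a plain dictionary and reuses the Lotka-Volterra simulator from above.
    import tsgm

    props = tsgm.dataset.DatasetProperties(N=10, D=2, T=100)   # fields assumed
    sim = tsgm.simulator.LotkaVolterraSimulator(props)

    current = sim.params()        # dictionary of simulator parameters (assumed behaviour)
    current["alpha"] = 0.5        # tweak the prey growth rate
    sim.set_params(**current)     # push the updated values back into the simulator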
- class NNSimulator(data: DatasetProperties, driver: Any | None = None)[source]¶
Initialize the Simulator with dataset properties and an optional model.
Parameters¶
- data : tsgm.dataset.DatasetProperties
Properties of the dataset to be used.
- driver : Optional[tsgm.types.Model], optional
The model to be used for generating data, by default None.
- clone() NNSimulator [source]¶
Create a deep copy of the simulator.
Returns¶
- NNSimulator
A deep copy of the current simulator instance.
- class PredictiveMaintenanceSimulator(data: DatasetProperties)[source]¶
Predictive Maintenance Simulator class that extends the ModelBasedSimulator base class. The simulator is based on https://github.com/AaltoPML/human-in-the-loop-predictive-maintenance. From the publication: Nikitin, Alexander, and Samuel Kaski. “Human-in-the-loop large-scale predictive maintenance of workstations.” Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022.
- Attributes:
CAT_FEATURES (list): List of categorical feature indices.
encoders (dict): Dictionary of OneHotEncoders for categorical features.
- Methods:
__init__(data): Initializes the simulator with dataset properties and sets encoders.
S(lmbd, t): Calculates the survival curve.
R(rho, lmbd, t): Calculates the recovery curve parameter.
set_params(**kwargs): Sets the parameters for the simulator.
mixture_function(a, x): Calculates the mixture function.
sample_equipment(num_samples): Samples equipment data and generates the dataset.
generate(num_samples): Generates the predictive maintenance dataset.
clone() -> PredictiveMaintenanceSimulator: Creates and returns a deep copy of the current simulator.
Initializes the PredictiveMaintenanceSimulator with dataset properties and sets encoders for categorical features.
- Args:
data (tsgm.dataset.DatasetProperties): Dataset properties for the simulator.
- R(rho, lmbd, t)[source]¶
Calculates the recovery curve parameter.
- Args:
rho: Rho parameter for the recovery function.
lmbd: Lambda parameter for the exponential distribution.
t: Time variable.
- Returns:
float: Recovery curve parameter at time t.
- S(lmbd, t)[source]¶
Calculates the survival curve.
- Args:
lmbd: Lambda parameter for the exponential distribution.
t: Time variable.
- Returns:
float: Survival probability at time t.
- clone() PredictiveMaintenanceSimulator [source]¶
Creates a deep copy of the current PredictiveMaintenanceSimulator instance.
- Returns:
PredictiveMaintenanceSimulator: A new instance of PredictiveMaintenanceSimulator with copied data and parameters.
- generate(num_samples: int)[source]¶
Samples equipment data and generates the dataset.
- Args:
num_samples (int): Number of samples to generate.
- Returns:
tuple: A tuple containing the dataset and equipment information.
- mixture_function(a, x)[source]¶
Calculates the mixture function.
- Args:
a: Mixture parameter.
x: Input variable.
- Returns:
float: Mixture function value.
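A generation sketch; the DatasetProperties fields are assumptions, and the returned pair follows the description of generate above.
    import tsgm

    props = tsgm.dataset.DatasetProperties(N=100, D=5, T=90)   # fields assumed
    pm_sim = tsgm.simulator.PredictiveMaintenanceSimulator(props)

    dataset, equipment = pm_sim.generate(num_samples=100)      # data plus per-equipment information
    pm_copy = pm_sim.clone()                                   # deep copy with the same parameters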
- class Simulator(data: DatasetProperties, driver: Any | None = None)[source]¶
Concrete class for a basic simulator. This class implements the basic methods for fitting a model and cloning the simulator, but the generate and dump methods are not implemented here.
Attributes¶
- _data : tsgm.dataset.DatasetProperties
Properties of the dataset to be used by the simulator.
- _driver : Optional[tsgm.types.Model]
The model to be used for generating data.
Initialize the Simulator with dataset properties and an optional model.
Parameters¶
- data : tsgm.dataset.DatasetProperties
Properties of the dataset to be used.
- driver : Optional[tsgm.types.Model], optional
The model to be used for generating data, by default None.
- clone() Simulator [source]¶
Create a deep copy of the simulator.
Returns¶
- Simulator
A deep copy of the current simulator instance.
- dump(path: str, format: str = 'csv') None [source]¶
Method to save the generated dataset to a file. Not implemented in this class.
Parameters¶
- path : str
The file path where the dataset will be saved.
- format : str, optional
The format in which to save the dataset, by default “csv”.
Raises¶
- NotImplementedError
This method is not implemented in this class.
- fit(**kwargs) None [source]¶
Fit the model using the dataset properties.
Parameters¶
- **kwargs
Additional keyword arguments to pass to the model’s fit method.
- generate(num_samples: int, *args) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Method to generate a dataset. Not implemented in this class.
Parameters¶
- num_samples : int
Number of samples to generate.
- *args
Additional arguments to be passed to the method.
Returns¶
- TensorLike
The generated dataset.
Raises¶
- NotImplementedError
This method is not implemented in this class.
- class SineConstSimulator(data: DatasetProperties, max_scale: float = 10.0, max_const: float = 5.0)[source]¶
Sine and Constant Function Simulator class that extends the ModelBasedSimulator base class.
- Attributes:
_scale: TensorFlow probability distribution for scaling factor.
_const: TensorFlow probability distribution for constant.
_shift: TensorFlow probability distribution for shift.
- Methods:
__init__(data, max_scale=10.0, max_const=5.0): Initializes the simulator with dataset properties and optional parameters.
set_params(max_scale, max_const, *args, **kwargs): Sets the parameters for scale, constant, and shift distributions.
generate(num_samples, *args) -> tsgm.dataset.Dataset: Generates a dataset based on sine and constant functions.
clone() -> SineConstSimulator: Creates and returns a deep copy of the current simulator.
Initializes the SineConstSimulator with dataset properties and optional maximum scale and constant values.
- Args:
data (tsgm.dataset.DatasetProperties): Dataset properties for the simulator.
max_scale (float, optional): Maximum value for the scale parameter. Defaults to 10.0.
max_const (float, optional): Maximum value for the constant parameter. Defaults to 5.0.
- clone() SineConstSimulator [source]¶
Creates a deep copy of the current SineConstSimulator instance.
- Returns:
SineConstSimulator: A new instance of SineConstSimulator with copied data and parameters.
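A short usage sketch (DatasetProperties fields assumed):
    import tsgm

    props = tsgm.dataset.DatasetProperties(N=64, D=1, T=100)   # fields assumed
    sine_sim = tsgm.simulator.SineConstSimulator(props, max_scale=5.0, max_const=2.0)

    synthetic = sine_sim.generate(num_samples=64)              # dataset of sine/constant series
    sine_sim.set_params(max_scale=20.0, max_const=1.0)         # re-draw the underlying distributions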
Data Processing Utils¶
- class TSFeatureWiseScaler(feature_range: Tuple[float, float] = (0, 1))[source]¶
Scales time series data feature-wise.
Parameters:¶
- feature_range : tuple(float, float), optional
Tuple representing the minimum and maximum feature values (default is (0, 1)).
Attributes:¶
- _min_v : float
Minimum feature value.
- _max_v : float
Maximum feature value.
Initializes a new instance of the TSFeatureWiseScaler class.
- Parameters:
feature_range (tuple(float, float), optional) – Tuple representing the minimum and maximum feature values, defaults to (0, 1).
- fit(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) TSFeatureWiseScaler [source]¶
Fits the scaler to the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
The fitted scaler object.
- Return type:
TSFeatureWiseScaler
- fit_transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Fits the scaler to the data and transforms it.
- Parameters:
X (TensorLike) – Input data
- Returns:
Scaled input data X
- Return type:
TensorLike
- inverse_transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Inverse-transforms the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
Original data.
- Return type:
TensorLike
- transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Transforms the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
Scaled X.
- Return type:
TensorLike
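A typical round trip on data shaped (n_series, n_timesteps, n_features); each feature is scaled independently to the requested range. The import path is assumed.
    import numpy as np
    from tsgm.utils import TSFeatureWiseScaler  # import path assumed

    X = np.random.normal(loc=3.0, scale=2.0, size=(32, 100, 4))  # (N, T, D) time series

    scaler = TSFeatureWiseScaler(feature_range=(-1, 1))
    X_scaled = scaler.fit_transform(X)            # each of the 4 features mapped to [-1, 1]
    X_restored = scaler.inverse_transform(X_scaled)
    assert np.allclose(X, X_restored, atol=1e-6)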
- class TSGlobalScaler[source]¶
Scales time series data globally.
Attributes:¶
- min : float
Minimum value encountered in the data.
- max : float
Maximum value encountered in the data.
- fit(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) TSGlobalScaler [source]¶
Fits the scaler to the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
The fitted scaler object.
- Return type:
TSGlobalScaler
- fit_transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Fits the scaler to the data and transforms it.
- Parameters:
X (TensorLike) – Input data
- Returns:
Scaled input data X
- Return type:
TensorLike
- inverse_transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Inverse-transforms the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
Original data.
- Return type:
TensorLike
- transform(X: Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic) Tensor | TensorProtocol | int | float | bool | str | bytes | complex | tuple | list | ndarray | generic [source]¶
Transforms the data.
- Parameters:
X (TensorLike) – Input data.
- Returns:
Scaled X.
- Return type:
TensorLike
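In contrast to the feature-wise scaler, TSGlobalScaler learns a single min/max over the whole array, which preserves relative magnitudes across features; a short sketch (import path assumed):
    import numpy as np
    from tsgm.utils import TSGlobalScaler  # import path assumed

    X = np.random.uniform(low=-5.0, high=5.0, size=(32, 100, 4))

    scaler = TSGlobalScaler()
    X_scaled = scaler.fit_transform(X)     # one global min/max applied to every value
    print(scaler.min, scaler.max)          # global statistics learned from X
    X_back = scaler.inverse_transform(X_scaled)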