`tsgm.utils`¶

Submodules¶

tsgm.utils.covid19_data_utils

Package Contents¶

class TSGlobalScaler[source]¶

Scales time series data globally.

Attributes:¶

min (float): Minimum value encountered in the data.
max (float): Maximum value encountered in the data.

fit(X: tensorflow.python.types.core.TensorLike) → TSGlobalScaler[source]¶

Fits the scaler to the data.

Parameters:: X (TensorLike) – Input data.
Returns:: The fitted scaler object.
Return type:: TSGlobalScaler

transform(X: tensorflow.python.types.core.TensorLike) → tensorflow.python.types.core.TensorLike[source]¶

Transforms the data.

Parameters:: X (TensorLike) – Input data.
Returns:: Scaled X.
Return type:: TensorLike

inverse_transform(X: tensorflow.python.types.core.TensorLike) → tensorflow.python.types.core.TensorLike[source]¶

Inverse-transforms the data.

Parameters:: X (TensorLike) – Input data.
Returns:: Original data.
Return type:: TensorLike

fit_transform(X: tensorflow.python.types.core.TensorLike) → tensorflow.python.types.core.TensorLike[source]¶

Fits the scaler to the data and transforms it.

Parameters:: X (TensorLike) – Input data
Returns:: Scaled input data X
Return type:: TensorLike
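To illustrate what "global" scaling means, the sketch below rescales an entire array with a single minimum and maximum taken over every element. It is a NumPy-only stand-in, not tsgm's implementation: the class name GlobalMinMaxSketch and the target range [0, 1] are assumptions made for the example.

```python
import numpy as np


class GlobalMinMaxSketch:
    """Hypothetical stand-in: one min/max pair shared by all features."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min = float(X.min())  # minimum over every element
        self.max = float(X.max())  # maximum over every element
        return self

    def transform(self, X):
        # Assumes max > min; a constant input would divide by zero.
        return (np.asarray(X, dtype=float) - self.min) / (self.max - self.min)

    def inverse_transform(self, X):
        return np.asarray(X, dtype=float) * (self.max - self.min) + self.min

    def fit_transform(self, X):
        return self.fit(X).transform(X)


X = np.arange(24, dtype=float).reshape(2, 4, 3)  # (samples, timesteps, features)
scaler = GlobalMinMaxSketch()
X_scaled = scaler.fit_transform(X)
X_back = scaler.inverse_transform(X_scaled)
```

Because one min/max pair is shared, the relative magnitudes of different features are preserved after scaling.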

class TSFeatureWiseScaler(feature_range: Tuple[float, float] = (0, 1))[source]¶

Scales time series data feature-wise.

Parameters:¶

feature_range (tuple(float, float), optional): Tuple representing the minimum and maximum feature values (default is (0, 1)).

Attributes:¶

_min_v (float): Minimum feature value.
_max_v (float): Maximum feature value.

Initializes a new instance of the TSFeatureWiseScaler class.


fit(X: tensorflow.python.types.core.TensorLike) → TSFeatureWiseScaler[source]¶

Fits the scaler to the data.

Parameters:: X (TensorLike) – Input data.
Returns:: The fitted scaler object.
Return type:: TSFeatureWiseScaler

transform(X: tensorflow.python.types.core.TensorLike) → tensorflow.python.types.core.TensorLike[source]¶

Transforms the data.

Parameters:: X (TensorLike) – Input data.
Returns:: Scaled X.
Return type:: TensorLike

inverse_transform(X: tensorflow.python.types.core.TensorLike) → tensorflow.python.types.core.TensorLike[source]¶

Inverse-transforms the data.

Parameters:: X (TensorLike) – Input data.
Returns:: Original data.
Return type:: TensorLike

fit_transform(X: tensorflow.python.types.core.TensorLike) → tensorflow.python.types.core.TensorLike[source]¶

Fits the scaler to the data and transforms it.

Parameters:: X (TensorLike) – Input data
Returns:: Scaled input data X
Return type:: TensorLike
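For contrast with global scaling, the following sketch scales each feature independently into feature_range. It assumes input of shape (num_samples, num_timesteps, num_features) and is a NumPy illustration only; the function name featurewise_minmax is hypothetical, and tsgm's own class may differ in details.

```python
import numpy as np


def featurewise_minmax(X, feature_range=(0.0, 1.0)):
    """Scale each feature (last axis) of X independently into feature_range."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    # Per-feature extrema over samples and timesteps; shape (num_features,).
    x_min = X.min(axis=(0, 1))
    x_max = X.max(axis=(0, 1))
    # Assumes x_max > x_min for every feature.
    return (X - x_min) / (x_max - x_min) * (hi - lo) + lo


# Two features on very different scales, stacked into shape (4, 5, 2).
X = np.stack([np.linspace(0, 1, 20).reshape(4, 5),
              np.linspace(-50, 50, 20).reshape(4, 5)], axis=-1)
X_scaled = featurewise_minmax(X, feature_range=(-1.0, 1.0))
```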

visualize_dataset(dataset: tsgm.dataset.DatasetOrTensor, obj_id: int = 0, palette: dict = DEFAULT_PALETTE_TSNE, path: str = '/tmp/generated_data.pdf') → None[source]¶

Visualizes a time series dataset with target values.

Parameters:: dataset (tsgm.dataset.DatasetOrTensor) – A time series dataset.

visualize_tsne_unlabeled(X: tsgm.types.Tensor, X_gen: tsgm.types.Tensor, palette: dict = DEFAULT_PALETTE_TSNE, alpha: float = 0.25, path: str = '/tmp/tsne_embeddings.pdf', fontsize: int = 20, markerscale: int = 3, markersize: int = 1, feature_averaging: bool = False, perplexity: float = 30.0) → None[source]¶

Visualizes t-SNE embeddings of unlabeled data.

Parameters:

X (tsgm.types.Tensor) – The original data tensor of shape (num_samples, num_features).
X_gen (tsgm.types.Tensor) – The generated data tensor of shape (num_samples, num_features).
palette (dict, optional) – A dictionary mapping class labels to colors. Defaults to DEFAULT_PALETTE_TSNE.
alpha (float, optional) – The transparency level of the plotted points. Defaults to 0.25.
path (str, optional) – The path to save the visualization as a PDF file. Defaults to “/tmp/tsne_embeddings.pdf”.
fontsize (int, optional) – The font size of the class labels in the legend. Defaults to 20.
markerscale (int, optional) – The scaling factor for the size of the markers in the legend. Defaults to 3.
markersize (int, optional) – The size of the markers in the scatter plot. Defaults to 1.
feature_averaging (bool, optional) – Whether to compute the average features for each class. Defaults to False.
perplexity (float, optional) – The perplexity passed to t-SNE. Defaults to 30.0.

visualize_tsne(X: tsgm.types.Tensor, y: tsgm.types.Tensor, X_gen: tsgm.types.Tensor, y_gen: tsgm.types.Tensor, path: str = '/tmp/tsne_embeddings.pdf', feature_averaging: bool = False, perplexity=30.0) → None[source]¶

Visualizes t-SNE embeddings of real and synthetic data.

This function generates a scatter plot of t-SNE embeddings for real and synthetic data. Each data point is represented by a marker on the plot, and the marker colors correspond to the class labels of the data points.

Parameters:

X (tsgm.types.Tensor) – The original real data tensor of shape (num_samples, num_features).
y (tsgm.types.Tensor) – The labels of the original real data tensor of shape (num_samples,).
X_gen (tsgm.types.Tensor) – The generated synthetic data tensor of shape (num_samples, num_features).
y_gen (tsgm.types.Tensor) – The labels of the generated synthetic data tensor of shape (num_samples,).
path (str, optional) – The path to save the visualization as a PDF file. Defaults to “/tmp/tsne_embeddings.pdf”.
feature_averaging (bool, optional) – Whether to compute the average features for each class. Defaults to False.
perplexity (float, optional) – The perplexity passed to t-SNE. Defaults to 30.0.

visualize_ts(ts: tsgm.types.Tensor, num: int = 5) → None[source]¶

Visualizes time series tensor.

This function generates a plot to visualize time series data. It displays a specified number of time series from the input tensor.

Parameters:

ts (tsgm.types.Tensor) – The time series data tensor of shape (num_samples, num_timesteps, num_features).
num (int, optional) – The number of time series to display. Defaults to 5.

Raises:

AssertionError: If the input tensor does not have three dimensions.

Example:

>>> visualize_ts(time_series_tensor, num=10)

visualize_ts_lineplot(ts: tsgm.types.Tensor, ys: tsgm.types.OptTensor = None, num: int = 5, unite_features: bool = True) → None[source]¶

Visualizes time series data using line plots.

This function generates line plots to visualize the time series data. It randomly selects a specified number of samples from the input tensor ts and plots each sample as a line plot. If ys is provided, it can be either a 1D or 2D tensor representing the target variable(s), and the function will optionally overlay it on the line plot.

Parameters:

ts (tsgm.types.Tensor) – Input time series data tensor.
ys (tsgm.types.OptTensor, optional) – Optional target variable(s) tensor, defaults to None.
num (int, optional) – Number of samples to visualize, defaults to 5.
unite_features (bool, optional) – Whether to plot all features together or separately, defaults to True.

visualize_original_and_reconst_ts(original: tsgm.types.Tensor, reconst: tsgm.types.Tensor, num: int = 5, vmin: int = 0, vmax: int = 1) → None[source]¶

Visualizes original and reconstructed time series data.

This function generates side-by-side visualizations of the original and reconstructed time series data. It randomly selects a specified number of samples from the input tensors original and reconst and displays them as images using imshow.

Parameters:

original (tsgm.types.Tensor) – Original time series data tensor.
reconst (tsgm.types.Tensor) – Reconstructed time series data tensor.
num (int, optional) – Number of samples to visualize, defaults to 5.
vmin (int, optional) – Minimum value for colormap normalization, defaults to 0.
vmax (int, optional) – Maximum value for colormap normalization, defaults to 1.

visualize_training_loss(loss_vector: tsgm.types.Tensor, labels: tuple = (), path: str = '/tmp/training_loss.pdf') → None[source]¶

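Plots training losses as a function of the epoch.

Parameters:

loss_vector (tsgm.types.Tensor) – Array of losses with shape (num_metrics, num_epochs).
labels (tuple, optional) – A sequence of strings. Defaults to ().
path (str, optional) – Where to save the plot. Defaults to “/tmp/training_loss.pdf”.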

gen_sine_dataset(N: int, T: int, D: int, max_value: int = 10) → numpy.typing.NDArray[source]¶

Generates a dataset of sinusoidal waves with random parameters.

Parameters:

N (int) – Number of samples in the dataset.
T (int) – Length of each time series in the dataset.
D (int) – Number of dimensions (sinusoids) in each time series.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.

Returns:

Generated dataset with shape (N, T, D).

Return type:

numpy.ndarray
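A minimal sketch of how such a dataset could be produced is shown below. It is not the tsgm implementation: the function name sine_dataset_sketch, the seed argument, the frequency range, and the way amplitude and shift are sampled are all assumptions; only the output shape (N, T, D) and the role of max_value follow the description above.

```python
import numpy as np


def sine_dataset_sketch(N, T, D, max_value=10, seed=0):
    """Return an (N, T, D) array of sinusoids with random parameters."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)[None, :, None]                      # time grid, shape (1, T, 1)
    amplitude = rng.uniform(0, max_value, size=(N, 1, D))
    shift = rng.uniform(0, max_value, size=(N, 1, D))
    frequency = rng.uniform(0.01, 0.5, size=(N, 1, D))   # assumed range
    # Broadcasting gives one sinusoid per (sample, feature) pair.
    return amplitude * np.sin(frequency * t) + shift


X = sine_dataset_sketch(N=8, T=50, D=3)
```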

gen_sine_const_switch_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0, frequency_switch: float = 0.1) → gen_sine_const_switch_dataset.T[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike][source]¶

Generates a dataset with alternating constant and sinusoidal sequences.

Parameters:

N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Value indicating whether the sequence is constant or sinusoidal. Defaults to 0.
frequency_switch (float, optional) – Probability of switching between constant and sinusoidal sequences. Defaults to 0.1.

Returns:

Tuple containing input data (X) and target labels (y).

Return type:

tuple[numpy.ndarray, numpy.ndarray]

gen_sine_vs_const_dataset(N: int, T: int, D: int, max_value: int = 10, const: int = 0) → gen_sine_vs_const_dataset.T[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike][source]¶

Generates a dataset with alternating sinusoidal and constant sequences.

Parameters:

N (int) – Number of samples in the dataset.
T (int) – Length of each sequence in the dataset.
D (int) – Number of dimensions in each sequence.
max_value (int, optional) – Maximum value for amplitude and shift of the sinusoids. Defaults to 10.
const (int, optional) – Maximum value for the constant sequence. Defaults to 0.

Returns:

Tuple containing input data (X) and target labels (y).

Return type:

tuple[numpy.ndarray, numpy.ndarray]

class UCRDataManager(path: str = default_path, ds: str = 'gunpoint')[source]¶

A manager for UCR collection of time series datasets.

Parameters:

path (str) – A relative path to the stored UCR dataset.
ds (str) – Name of the dataset. Should be in (beef | coffee | ecg200 | freezer | gunpoint | insect | mixed_shapes | starlight).

Raises:

ValueError – When there is no stored UCR archive, or the name of the dataset is incorrect.

get() → Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike][source]¶

Returns a tuple containing training and testing data.

Returns:: A tuple (X_train, y_train, X_test, y_test).
Return type:: tuple[TensorLike, TensorLike, TensorLike, TensorLike]

get_classes_distribution() → Dict[source]¶

Returns a dictionary with the fraction of occurrences for each class.

Returns:: A dictionary containing the fraction of occurrences for each class.
Return type:: dict[Any, float]
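The "fraction of occurrences" can be pictured with the short standalone sketch below. The helper name classes_distribution is hypothetical and operates on a plain list of labels rather than on a UCRDataManager instance.

```python
from collections import Counter


def classes_distribution(labels):
    """Map each class label to its fraction of occurrences."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


dist = classes_distribution([1, 1, 2, 1])
```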

summary() → None[source]¶: Prints a summary of the dataset.

get_mauna_loa() → Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike][source]¶

Loads the Mauna Loa CO2 dataset.

This function loads the Mauna Loa CO2 dataset, which contains measurements of atmospheric CO2 concentrations at the Mauna Loa Observatory in Hawaii.

Returns:: A tuple containing the input data (X) and target labels (y).
Return type:: tuple[TensorLike, TensorLike]

split_dataset_into_objects(X: tensorflow.python.types.core.TensorLike, y: tensorflow.python.types.core.TensorLike, step: int = 10) → Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike][source]¶

Splits the dataset into objects of fixed length.

This function splits the input dataset into objects of fixed length along the first dimension, 0-padding if necessary.

Parameters:

X (TensorLike) – Input data.
y (TensorLike) – Target labels.
step (int, optional) – Length of each object. Defaults to 10.

Returns:

A tuple containing input data objects and corresponding target label objects.

Return type:

tuple[TensorLike, TensorLike]
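The splitting and zero-padding described above can be sketched in NumPy as follows. This is an illustration under assumptions, not tsgm's code: it pads at the end of the first dimension and returns arrays with a new leading "object" axis, and the name split_into_objects is hypothetical.

```python
import numpy as np


def split_into_objects(X, y, step=10):
    """Cut X and y into consecutive chunks of `step` rows, zero-padding the tail."""
    X, y = np.asarray(X), np.asarray(y)
    n_objects = -(-len(X) // step)          # ceiling division
    pad = n_objects * step - len(X)         # rows of zeros needed at the end
    X_pad = np.pad(X, [(0, pad)] + [(0, 0)] * (X.ndim - 1))
    y_pad = np.pad(y, [(0, pad)] + [(0, 0)] * (y.ndim - 1))
    return (X_pad.reshape(n_objects, step, *X.shape[1:]),
            y_pad.reshape(n_objects, step, *y.shape[1:]))


X = np.ones((25, 3))
y = np.ones(25)
X_obj, y_obj = split_into_objects(X, y, step=10)
```

Here 25 rows become three objects of length 10, with the last five rows of the third object filled with zeros.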

load_arff(path: str) → pandas.DataFrame[source]¶

Loads data from an ARFF (Attribute-Relation File Format) file.

This function reads data from an ARFF file located at the specified path and returns it as a pandas DataFrame.

Parameters:: path (str) – Path to the ARFF file.
Returns:: DataFrame containing the loaded data.
Return type:: pandas.DataFrame

get_eeg() → Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike][source]¶

Loads the EEG Eye State dataset.

This function downloads the EEG Eye State dataset from the UCI Machine Learning Repository and returns the input features (X) and target labels (y).

Returns:: A tuple containing the input features (X) and target labels (y).
Return type:: tuple[TensorLike, TensorLike]

get_power_consumption() → numpy.typing.NDArray[source]¶

Retrieves the household power consumption dataset.

This function downloads and loads the household power consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.

Returns:: Household power consumption dataset.
Return type:: numpy.ndarray

get_stock_data(stock_name: str) → numpy.typing.NDArray[source]¶

Downloads historical stock data for the specified stock ticker.

This function downloads historical stock data for the specified stock ticker using the Yahoo Finance API. It returns the stock data as a NumPy array with an additional axis representing the batch dimension.

Parameters:: stock_name (str) – Ticker symbol of the stock.
Returns:: Historical stock data.
Return type:: numpy.ndarray
Raises:: ValueError – If the provided stock ticker is invalid or no data is available.

get_energy_data() → numpy.typing.NDArray[source]¶

Retrieves the energy consumption dataset.

This function downloads and loads the energy consumption dataset from the UCI Machine Learning Repository. It returns the dataset as a NumPy array.

Returns:: Energy consumption dataset.
Return type:: numpy.ndarray

get_mnist_data() → Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike][source]¶

Retrieves the MNIST dataset.

This function loads the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits, and returns the training and testing data along with their corresponding labels.

Returns:: A tuple containing the training data, training labels, testing data, and testing labels.
Return type:: tuple[TensorLike, TensorLike, TensorLike, TensorLike]

_exponential_quadratic(x: numpy.typing.NDArray, y: numpy.typing.NDArray) → float[source]¶

This function calculates the exponential quadratic kernel matrix between two sets of points, given by matrices x and y.

Parameters:

x (numpy.ndarray) – First set of points.
y (numpy.ndarray) – Second set of points.

Returns:

Exponential quadratic kernel matrix.

Return type:

numpy.ndarray
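For reference, one common form of the exponential quadratic (squared-exponential, RBF) kernel is sketched below. The length_scale parameter and the exact normalization are assumptions; the private tsgm helper may use a different parameterization.

```python
import numpy as np


def exponential_quadratic(x, y, length_scale=1.0):
    """Kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 * length_scale^2))."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    # Pairwise squared Euclidean distances, shape (len(x), len(y)).
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


K = exponential_quadratic(np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0]))
```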

get_gp_samples_data(num_samples: int, max_time: int, covar_func: Callable = _exponential_quadratic) → numpy.typing.NDArray[source]¶

Generates samples from a Gaussian process.

This function generates samples from a Gaussian process using the specified covariance function. It returns the generated samples as a NumPy array.

Parameters:

num_samples (int) – Number of samples to generate.
max_time (int) – Maximum time value for the samples.
covar_func (Callable, optional) – Covariance function to use. Defaults to _exponential_quadratic.

Returns:

Generated samples from the Gaussian process.

Return type:

numpy.ndarray
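Sampling from a Gaussian process amounts to drawing from a multivariate normal whose covariance is the kernel matrix on a time grid. The sketch below assumes a zero mean, an integer grid t = 0, ..., max_time - 1, and an output of shape (num_samples, max_time); the names gp_samples_sketch and rbf are hypothetical and the tsgm function may differ on each of these points.

```python
import numpy as np


def rbf(x, y):
    """Squared-exponential kernel on 1-D inputs (unit length scale assumed)."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2)


def gp_samples_sketch(num_samples, max_time, covar_func=rbf, seed=0):
    """Draw `num_samples` zero-mean GP paths observed at t = 0, ..., max_time - 1."""
    rng = np.random.default_rng(seed)
    t = np.arange(max_time, dtype=float)
    # A small diagonal jitter keeps the covariance numerically positive definite.
    cov = covar_func(t, t) + 1e-8 * np.eye(max_time)
    return rng.multivariate_normal(np.zeros(max_time), cov, size=num_samples)


samples = gp_samples_sketch(num_samples=4, max_time=20)
```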

get_physionet2012() → Tuple[tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike, tensorflow.python.types.core.TensorLike][source]¶

Retrieves the Physionet 2012 dataset.

This function downloads and retrieves the Physionet 2012 dataset, which consists of physiological data and corresponding outcomes. It returns the training, testing, and validation datasets along with their labels.

Returns:: A tuple containing the training, testing, and validation datasets along with their labels.
Return type:: tuple[TensorLike, TensorLike, TensorLike, TensorLike, TensorLike, TensorLike]

download_physionet2012() → None[source]¶: Downloads the Physionet 2012 dataset files from the Physionet website and extracts them into the local folder ‘physionet2012’.

_get_physionet_X_dataframe(dataset_path: str) → pandas.DataFrame[source]¶

Reads txt files from folder ‘dataset_path’ and returns a dataframe (X) with the Physionet dataset.

Args:: dataset_path (str): Path to the dataset folder.
Returns:: pd.DataFrame: The features (X) dataframe.

_get_physionet_y_dataframe(file_path: str) → pandas.DataFrame[source]¶

Reads the txt file at ‘file_path’ and returns a dataframe (y) with the Physionet data.

Args:: file_path (str): Path to the file.
Returns:: pd.DataFrame: The target (y) dataframe.

get_covid_19() → Tuple[tensorflow.python.types.core.TensorLike, Tuple, List][source]¶

Loads the Covid-19 dataset with additional graph information.

The dataset is built on data from The New York Times, which is based on reports from state and local health agencies [1], and was adapted to the graph case in [2].

[1] The New York Times. (2021). Coronavirus (Covid-19) Data in the United States. Retrieved [Insert Date Here], from https://github.com/nytimes/covid-19-data.
[2] Alexander V. Nikitin, St John, Arno Solin, Samuel Kaski. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10640-10660, 2022.

Returns:¶

tuple: First element is time series data (n_nodes x n_timestamps x n_features). Each timestamp consists of the number of deaths, cases, deaths normalized by the population, and cases normalized by the population. The second element is the graph tuple (nodes, edges). The third element is the order of states.

reconstruction_loss_by_axis(original: tensorflow.Tensor, reconstructed: tensorflow.Tensor, axis: int = 0) → tensorflow.Tensor[source]¶

Calculate the reconstruction loss based on a specified axis.

This function computes the reconstruction loss between the original data and the reconstructed data along a specified axis. The loss can be computed in two ways depending on the chosen axis:

When axis is 0, it computes the loss as the sum of squared differences between the original and reconstructed data for all elements.
When axis is 1 or 2, it computes the mean squared error (MSE) between the mean values along the chosen axis for the original and reconstructed data.

Parameters:¶

original (tf.Tensor): The original data tensor.
reconstructed (tf.Tensor): The reconstructed data tensor, typically produced by an autoencoder.
axis (int, optional, default=0): The axis along which to compute the reconstruction loss:
- 0: All elements (sum of squared differences).
- 1: Along features (MSE).
- 2: Along time steps (MSE).

Returns:¶

tf.Tensor: The computed reconstruction loss as a TensorFlow tensor.

Example:¶

>>> import tensorflow as tf
>>> from tsgm.utils import reconstruction_loss_by_axis
>>> original = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
>>> reconstructed = tf.constant([[1.1, 2.2, 2.9], [3.9, 4.8, 6.1]])
>>> loss = reconstruction_loss_by_axis(original, reconstructed, axis=0)
>>> print(loss.numpy())

Notes:¶

This function is commonly used in the context of autoencoders and other reconstruction-based models to assess the quality of the reconstruction.
The choice of axis determines how the loss is calculated, and it should align with the data’s structure.
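The two modes described above can be mirrored in plain NumPy, as in the sketch below. It follows the textual description only (sum of squared differences for axis 0, MSE between axis-wise means otherwise) and is not the TensorFlow implementation; the name reconstruction_loss_sketch is hypothetical.

```python
import numpy as np


def reconstruction_loss_sketch(original, reconstructed, axis=0):
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if axis == 0:
        # Sum of squared differences over every element.
        return float(np.sum((original - reconstructed) ** 2))
    # Compare the means taken along `axis`, then average the squared gaps.
    gap = original.mean(axis=axis) - reconstructed.mean(axis=axis)
    return float(np.mean(gap ** 2))


original = np.zeros((2, 4, 3))          # (samples, timesteps, features)
reconstructed = np.ones((2, 4, 3))
loss_all = reconstruction_loss_sketch(original, reconstructed, axis=0)
loss_axis1 = reconstruction_loss_sketch(original, reconstructed, axis=1)
```

With these inputs every element differs by exactly 1, so the axis=0 loss equals the number of elements while the axis=1 loss is the squared gap between the means.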

fix_seeds(seed_value: int = 42) → None[source]¶

Fix random number generator seeds for reproducibility.

Parameters:¶

seed_value (int, optional, default=42): The seed value to use for fixing the random number generator seeds. This value is used to initialize the random number generators.

Returns:¶

None: This function does not return a value; it modifies the random number generators in-place to fix their seeds.
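As a rough picture of what fixing seeds involves, the sketch below seeds Python's random module and NumPy's legacy global generator. Which generators tsgm actually seeds is not stated here, so the choice of generators and the name fix_seeds_sketch are assumptions.

```python
import random

import numpy as np


def fix_seeds_sketch(seed_value=42):
    """Seed the common Python-level random number generators."""
    random.seed(seed_value)
    np.random.seed(seed_value)
    # A TensorFlow-based library would typically also call
    # tf.random.set_seed(seed_value); omitted to keep this sketch lightweight.


# Re-seeding with the same value reproduces the same draws.
fix_seeds_sketch(7)
first = (random.random(), np.random.rand())
fix_seeds_sketch(7)
second = (random.random(), np.random.rand())
```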

kernel_median_heuristic(X1: tsgm.types.Tensor, X2: tsgm.types.Tensor) → float[source]¶: Median heuristic (Gretton, 2012) for RBF kernel width.
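One common variant of the median heuristic sets the kernel width to the median pairwise Euclidean distance between the two samples, as sketched below. Other variants (pooled samples, squared distances, excluding zero distances) exist, and which one tsgm uses is not specified here; the name median_heuristic_sketch is hypothetical.

```python
import numpy as np


def median_heuristic_sketch(X1, X2):
    """Median of pairwise Euclidean distances between rows of X1 and X2."""
    X1 = np.asarray(X1, dtype=float).reshape(len(X1), -1)
    X2 = np.asarray(X2, dtype=float).reshape(len(X2), -1)
    dists = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)
    return float(np.median(dists))


# Pairwise distances are 0, 4, 2 and 2, so the median is 2.
width = median_heuristic_sketch([[0.0], [2.0]], [[0.0], [4.0]])
```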

mmd_diff_var(Kyy: tsgm.types.Tensor, Kzz: tsgm.types.Tensor, Kxy: tsgm.types.Tensor, Kxz: tsgm.types.Tensor) → float[source]¶: Computes the variance of the difference statistic MMD_{XY} - MMD_{XZ}. See http://arxiv.org/pdf/1511.04581.pdf, Appendix A, for more details.

mmd_3_test(X: tsgm.types.Tensor, Y: tsgm.types.Tensor, Z: tsgm.types.Tensor, kernel: Callable) → Tuple[float, float, float, float][source]¶: Relative MMD test. Returns a test statistic for whether Y is closer to X than Z is. See http://arxiv.org/pdf/1511.04581.pdf
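The idea behind a relative MMD comparison can be sketched with a simple biased estimate of squared MMD: if MMD(X, Y) is smaller than MMD(X, Z), then Y is closer to X than Z is. The code below is only that intuition in NumPy. It omits the variance estimate and p-value that the test in the referenced paper provides, and the names rbf_kernel and mmd2_biased and the kernel width are assumptions.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * sigma**2))


def mmd2_biased(X, Y, kernel=rbf_kernel):
    """Biased estimate of squared MMD between samples X and Y."""
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()


rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as X
Z = rng.normal(3.0, 1.0, size=(200, 2))   # shifted away from X
diff = mmd2_biased(X, Y) - mmd2_biased(X, Z)  # negative when Y is closer to X
```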
