tsgm.metrics

Package Contents

class DistanceMetric(statistics: list, discrepancy: Callable)[source]

Bases: Metric

Metric that measures the similarity between synthetic and real time series by comparing their summary statistics.

Parameters:
  • statistics (list) – A list of summary statistics (callables).

  • discrepancy (Callable) – A discrepancy function that measures the distance between the vectors of summary statistics.

stats(X: tsgm.types.Tensor) tsgm.types.Tensor[source]
Parameters:

X (tsgm.types.Tensor) – A time series dataset.

Returns:

A tensor with the calculated summary statistics.

discrepancy(stats1: tsgm.types.Tensor, stats2: tsgm.types.Tensor) float[source]
Parameters:
  • stats1 (tsgm.types.Tensor) – A vector of summary statistics.

  • stats2 (tsgm.types.Tensor) – A vector of summary statistics.

Returns:

The distance between the two vectors, calculated by self._discrepancy.

__call__(D1: tsgm.dataset.DatasetOrTensor, D2: tsgm.dataset.DatasetOrTensor) float[source]
Parameters:
  • D1 (tsgm.dataset.DatasetOrTensor) – A time series dataset.

  • D2 (tsgm.dataset.DatasetOrTensor) – A time series dataset.

Returns:

The similarity metric between D1 and D2.
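The idea behind DistanceMetric can be illustrated with a standalone NumPy sketch. It assumes datasets are arrays of shape (n_samples, seq_len, n_features), that each summary statistic returns a scalar or a vector, and that the statistics are concatenated into one vector per dataset before the discrepancy is applied. The helpers distance_metric and euclidean and the chosen statistics are hypothetical and not part of tsgm.

```python
import numpy as np


def distance_metric(d1, d2, statistics, discrepancy):
    """Hypothetical helper mirroring the idea behind DistanceMetric."""
    # Apply every summary statistic and concatenate the results into one vector per dataset
    s1 = np.concatenate([np.atleast_1d(stat(d1)) for stat in statistics])
    s2 = np.concatenate([np.atleast_1d(stat(d2)) for stat in statistics])
    return discrepancy(s1, s2)


def euclidean(a, b):
    """Discrepancy: Euclidean distance between two statistic vectors."""
    return float(np.linalg.norm(a - b))


# Illustrative statistics for arrays of shape (n_samples, seq_len, n_features)
statistics = [
    lambda X: np.max(X),                # global maximum
    lambda X: np.min(X),                # global minimum
    lambda X: np.mean(X, axis=(0, 1)),  # per-feature mean
]

real = np.zeros((4, 10, 2))
synthetic = real + 1.0
print(distance_metric(real, synthetic, statistics, euclidean))  # prints 2.0
```

Every statistic differs by exactly 1 between the two arrays, so the four-element difference vector has norm 2.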

class ConsistencyMetric(evaluators: List)[source]

Bases: Metric

The predictive consistency metric measures whether a set of evaluators yields consistent results on real and synthetic data.

Parameters:

evaluators (list) – A list of evaluators; each item should implement the method .evaluate(D).

__call__(D1: tsgm.dataset.DatasetOrTensor, D2: tsgm.dataset.DatasetOrTensor, D_test: tsgm.dataset.DatasetOrTensor) float[source]
Parameters:
  • D1 (tsgm.dataset.DatasetOrTensor) – A time series dataset.

  • D2 (tsgm.dataset.DatasetOrTensor) – A time series dataset.

  • D_test (tsgm.dataset.DatasetOrTensor) – A test dataset on which the evaluators are assessed.

Returns:

The consistency metric between D1 and D2.
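The description above does not state how agreement is scored. One plausible reading, sketched below in plain Python, counts the pairs of evaluators whose relative ranking (better, worse, or tied) is the same on real and on synthetic data and divides by the total number of pairs. The function consistency and the score lists are hypothetical.

```python
from itertools import combinations


def consistency(scores_real, scores_synth):
    """Fraction of evaluator pairs ranked the same way on real and synthetic data."""
    assert len(scores_real) == len(scores_synth) >= 2

    def sign(x):
        # -1, 0 or 1 depending on the sign of x
        return (x > 0) - (x < 0)

    pairs = list(combinations(range(len(scores_real)), 2))
    agreements = sum(
        sign(scores_real[i] - scores_real[j]) == sign(scores_synth[i] - scores_synth[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Scores of three hypothetical evaluators on real vs. synthetic data:
# the pairs (0, 1) and (0, 2) agree, the pair (1, 2) does not
print(consistency([0.9, 0.8, 0.7], [0.85, 0.6, 0.7]))
```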

class BaseDownstreamEvaluator[source]

Bases: abc.ABC

Abstract base class for the evaluators used by DownstreamPerformanceMetric; a concrete evaluator implements the method .evaluate(D).

class DownstreamPerformanceMetric(evaluator: BaseDownstreamEvaluator)[source]

Bases: Metric

The downstream performance metric evaluates the performance of a model on a downstream task. It returns the performance gain achieved by adding synthetic data.

Parameters:

evaluator (BaseDownstreamEvaluator) – An evaluator that implements the method .evaluate(D).

__call__(D1: tsgm.dataset.DatasetOrTensor, D2: tsgm.dataset.DatasetOrTensor, D_test: tsgm.dataset.DatasetOrTensor | None, return_std: bool = False) float[source]
Parameters:
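  • D1 (tsgm.dataset.DatasetOrTensor) – A time series dataset.

  • D2 (tsgm.dataset.DatasetOrTensor) – A time series dataset.

  • D_test (tsgm.dataset.DatasetOrTensor | None) – A test dataset on which the evaluator is assessed.

  • return_std (bool) – Whether to also report the standard deviation of the result (default: False).

Returns:

The downstream performance metric between D1 and D2.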
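The gain can be illustrated with a self-contained NumPy sketch that assumes the metric compares a model trained on D1 alone with one trained on D1 augmented by D2, both scored on D_test. The abstract class below only mimics the role of BaseDownstreamEvaluator; the nearest-centroid evaluator, its (X, y, X_test, y_test) evaluate signature and the helper downstream_gain are hypothetical stand-ins for a real downstream model.

```python
import abc

import numpy as np


class DownstreamEvaluator(abc.ABC):
    """Hypothetical stand-in for BaseDownstreamEvaluator."""

    @abc.abstractmethod
    def evaluate(self, X, y, X_test, y_test) -> float:
        """Train on (X, y) and return a score on (X_test, y_test)."""


class NearestCentroidEvaluator(DownstreamEvaluator):
    """Hypothetical evaluator: accuracy of a nearest-centroid classifier."""

    def evaluate(self, X, y, X_test, y_test):
        Xf = X.reshape(len(X), -1)
        Xt = X_test.reshape(len(X_test), -1)
        classes = np.unique(y)
        centroids = np.stack([Xf[y == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(Xt[:, None, :] - centroids[None, :, :], axis=-1)
        preds = classes[np.argmin(dists, axis=1)]
        return float(np.mean(preds == y_test))


def downstream_gain(evaluator, X_real, y_real, X_syn, y_syn, X_test, y_test):
    """Score with real + synthetic training data minus score with real data only."""
    base = evaluator.evaluate(X_real, y_real, X_test, y_test)
    X_aug = np.concatenate([X_real, X_syn])
    y_aug = np.concatenate([y_real, y_syn])
    augmented = evaluator.evaluate(X_aug, y_aug, X_test, y_test)
    return augmented - base


# Toy data of shape (n_samples, seq_len=1, n_features=1)
X_real = np.array([0.0, 10.0]).reshape(2, 1, 1)
y_real = np.array([0, 1])
X_syn = np.array([2.0]).reshape(1, 1, 1)
y_syn = np.array([1])
X_test = np.array([0.0, 4.0]).reshape(2, 1, 1)
y_test = np.array([0, 1])

print(downstream_gain(NearestCentroidEvaluator(), X_real, y_real, X_syn, y_syn, X_test, y_test))  # prints 0.5
```

Here the synthetic sample pulls the class-1 centroid from 10 to 6, so the test point at 4 becomes correctly classified and accuracy rises from 0.5 to 1.0.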

class PrivacyMembershipInferenceMetric(attacker: Any, metric: Callable | None = None)[source]

Bases: Metric

The metric measures how vulnerable the synthetic data is to membership inference attacks.

Parameters:
  • attacker (Any) – An attacker: a one-class classifier (OCC) that implements the methods .fit and .predict.

  • metric (Callable | None) – A function that measures the quality of the attacker (precision by default).

__call__(d_tr: tsgm.dataset.Dataset, d_syn: tsgm.dataset.Dataset, d_test: tsgm.dataset.Dataset) float[source]
Parameters:
  • d_tr (tsgm.dataset.Dataset) – Training dataset (the dataset that was used to produce d_syn).

  • d_syn (tsgm.dataset.Dataset) – Synthetic dataset produced from d_tr.

  • d_test (tsgm.dataset.Dataset) – Test dataset that was not used to produce d_syn.

Returns:

How well the attacker, trained on d_syn, can distinguish d_tr from d_test.
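The attack can be sketched with NumPy. The sketch assumes that training samples are labelled as members (1) and test samples as non-members (0), and that the attacker's .predict returns 1 for samples it considers in-distribution. RadiusOCC is a hypothetical toy one-class classifier; scikit-learn OCCs such as OneClassSVM return +1/-1 instead. Whether tsgm transforms the resulting score further is not stated in this reference.

```python
import numpy as np


class RadiusOCC:
    """Hypothetical one-class classifier: predicts 1 (inlier) for samples within
    `radius` of some fitted sample and 0 (outlier) otherwise."""

    def __init__(self, radius):
        self.radius = radius

    def fit(self, X):
        self._X = np.asarray(X, dtype=float).reshape(len(X), -1)
        return self

    def predict(self, X):
        Xf = np.asarray(X, dtype=float).reshape(len(X), -1)
        dists = np.linalg.norm(Xf[:, None, :] - self._X[None, :, :], axis=-1)
        return (dists.min(axis=1) <= self.radius).astype(int)


def precision(y_true, y_pred):
    """Fraction of positive predictions that are correct (0.0 if there are none)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return float(tp / (tp + fp)) if tp + fp > 0 else 0.0


def membership_inference_score(attacker, X_tr, X_syn, X_test, metric=precision):
    attacker.fit(X_syn)  # the attacker only ever sees synthetic data
    X_all = np.concatenate([X_tr, X_test])
    # Members (training samples) are labelled 1, non-members (test samples) 0
    y_true = np.concatenate([np.ones(len(X_tr), dtype=int), np.zeros(len(X_test), dtype=int)])
    y_pred = attacker.predict(X_all)
    return metric(y_true, y_pred)


X_tr = np.array([0.0, 1.0]).reshape(2, 1, 1)
X_test = np.array([5.0, 6.0]).reshape(2, 1, 1)
leaky_syn = X_tr + 0.05  # synthetic data that nearly copies the training set
print(membership_inference_score(RadiusOCC(radius=0.1), X_tr, leaky_syn, X_test))  # prints 1.0
```

A precision of 1.0 means every sample the attacker flags as a member really was in the training set, which is the leakage the metric is meant to expose.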

class MMDMetric(kernel: Callable = tsgm.utils.mmd.exp_quad_kernel)[source]

Bases: Metric

This metric calculates the maximum mean discrepancy (MMD) between real and synthetic samples.

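Args:

kernel (Callable): The kernel used to compute the MMD. Defaults to tsgm.utils.mmd.exp_quad_kernel.

When called, the metric takes the real and the synthetic datasets or tensors (tsgm.dataset.DatasetOrTensor).

Returns:

float: The computed MMD between the real and synthetic data.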

Example:
>>> metric = MMDMetric()  # uses tsgm.utils.mmd.exp_quad_kernel by default
>>> dataset, synth_dataset = tsgm.dataset.Dataset(...), tsgm.dataset.Dataset(...)
>>> result = metric(dataset, synth_dataset)
>>> print(result)

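For reference, the biased empirical estimate of the squared MMD between samples {x_i} and {y_j} is the mean of k(x_i, x_i') plus the mean of k(y_j, y_j') minus twice the mean of k(x_i, y_j). The NumPy sketch below computes it with a Gaussian (exponentiated quadratic) kernel of bandwidth sigma. The bandwidth, the flattening of each series into one vector and the helper names are assumptions; tsgm.utils.mmd.exp_quad_kernel may differ in signature and normalisation, and whether tsgm reports MMD or its square is not stated here.

```python
import numpy as np


def exp_quad_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix exp(-||x_i - y_j||^2 / (2 sigma^2)) between rows of x and y."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def mmd2_biased(X, Y, kernel=exp_quad_kernel):
    """Biased estimate of the squared MMD between two sets of time series."""
    # Flatten (n_samples, seq_len, n_features) into (n_samples, seq_len * n_features)
    X = X.reshape(len(X), -1)
    Y = Y.reshape(len(Y), -1)
    return float(kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean())


x = np.array([[[0.0]]])  # one series of length 1
y = np.array([[[1.0]]])
print(mmd2_biased(x, y))  # 2 - 2 * exp(-0.5), about 0.787
```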
class DiscriminativeMetric[source]

Bases: Metric

The DiscriminativeMetric measures the discriminative performance of a model in distinguishing between synthetic and real datasets.

This metric evaluates a discriminative model by training it on a combination of synthetic and real datasets and assessing its performance on a test set.

Parameters (passed when the metric is called):
  • d_hist (tsgm.dataset.DatasetOrTensor) – Real dataset.

  • d_syn (tsgm.dataset.DatasetOrTensor) – Synthetic dataset.

  • model (T.Callable) – Discriminative model to be evaluated.

  • test_size (T.Union[float, int]) – If a float, the proportion of the dataset to include in the test split; if an int, the absolute number of test samples.

  • n_epochs (int) – Number of training epochs for the model.

  • metric (T.Optional[T.Callable]) – Optional evaluation metric to use (default: accuracy).

  • random_seed (T.Optional[int]) – Optional random seed for reproducibility.

Returns:

Discriminative performance metric.

Return type:

float

Example:

>>> import tsgm.dataset
>>> from tsgm.metrics import DiscriminativeMetric
>>>
>>> # Create real and synthetic datasets
>>> real_dataset = tsgm.dataset.Dataset(...)  # Replace ... with appropriate arguments
>>> synthetic_dataset = tsgm.dataset.Dataset(...)  # Replace ... with appropriate arguments
>>>
>>> # Create a discriminative model
>>> model = MyDiscriminativeModel()  # Placeholder: replace with an actual discriminative model
>>>
>>> # Create and use the DiscriminativeMetric
>>> metric = DiscriminativeMetric()
>>> result = metric(real_dataset, synthetic_dataset, model, test_size=0.2, n_epochs=10)
>>> print(result)

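The procedure described above (label real and synthetic samples, train a discriminator, score it on a held-out split) can be sketched without a deep-learning framework. In this sketch a nearest-centroid classifier stands in for the trainable model, so n_epochs has no counterpart; the labelling convention (real = 1, synthetic = 0) and the helper discriminative_accuracy are hypothetical. An accuracy near 0.5 suggests the discriminator cannot tell the datasets apart, while an accuracy near 1.0 means they are easy to separate.

```python
import numpy as np


def discriminative_accuracy(X_real, X_syn, test_size=0.2, random_seed=None):
    """Hypothetical sketch: accuracy of a real-vs-synthetic classifier on a held-out split."""
    n_real, n_syn = len(X_real), len(X_syn)
    X = np.concatenate([X_real, X_syn]).reshape(n_real + n_syn, -1)
    # Assumed labelling convention: real = 1, synthetic = 0
    y = np.concatenate([np.ones(n_real, dtype=int), np.zeros(n_syn, dtype=int)])
    # Shuffle, then split; a float test_size is a fraction, an int is a sample count
    idx = np.random.default_rng(random_seed).permutation(len(X))
    n_test = int(round(test_size * len(X))) if isinstance(test_size, float) else int(test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Stand-in discriminator: nearest-centroid classifier fitted on the training split
    X_train, y_train = X[train_idx], y[train_idx]
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X[test_idx][:, None, :] - centroids[None, :, :], axis=-1)
    preds = classes[np.argmin(dists, axis=1)]
    return float(np.mean(preds == y[test_idx]))


real = np.zeros((20, 5, 1))
synthetic = np.ones((20, 5, 1))
print(discriminative_accuracy(real, synthetic, test_size=0.2, random_seed=0))  # prints 1.0
```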
class EntropyMetric[source]

Bases: Metric

Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies.

Args:

d (tsgm.dataset.DatasetOrTensor): The input dataset or tensor.

Returns:

float: The computed spectral entropy.

Example:
>>> metric = EntropyMetric()
>>> dataset = tsgm.dataset.Dataset(...)
>>> result = metric(dataset)
>>> print(result)
__call__(d: tsgm.dataset.DatasetOrTensor) float[source]

Calculate the spectral entropy of the input dataset or tensor.

Args:

d (tsgm.dataset.DatasetOrTensor): The input dataset or tensor.

Returns:

float: The computed spectral entropy.
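A possible NumPy sketch of this computation: each univariate series (one per sample and feature) is turned into a normalised power spectrum via the real FFT, its Shannon entropy is taken with the natural logarithm, and the individual entropies are summed. The use of rfft, the natural log, the (n_samples, seq_len, n_features) layout and the helper names are assumptions; tsgm's normalisation may differ.

```python
import numpy as np


def spectral_entropy(x):
    """Shannon entropy (natural log) of the normalised power spectrum of a 1-D series."""
    psd = np.abs(np.fft.rfft(np.asarray(x, dtype=float))) ** 2
    p = psd / psd.sum()  # assumes the series is not identically zero
    p = p[p > 0]         # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)))


def total_spectral_entropy(X):
    """Sum of spectral entropies over samples and features of X with shape (n, seq_len, n_features)."""
    X = np.asarray(X, dtype=float)
    return sum(
        spectral_entropy(X[i, :, f])
        for i in range(X.shape[0])
        for f in range(X.shape[2])
    )


t = np.arange(64)
sine = np.sin(2 * np.pi * 4 * t / 64)  # energy in a single frequency bin: entropy near 0
impulse = np.zeros(64)
impulse[0] = 1.0                       # flat spectrum over 33 bins: entropy ln(33)
print(spectral_entropy(sine), spectral_entropy(impulse))
```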

class DemographicParityMetric[source]

Bases: Metric

Measures demographic parity between two datasets.

This metric assesses the difference in the distributions of a target variable among different groups in two datasets. By default, it uses the Kolmogorov-Smirnov statistic to quantify the maximum vertical deviation between the cumulative distribution functions of the target variable for the historical and synthetic data within each group.

Args:

  • d_hist (tsgm.dataset.DatasetOrTensor): The historical input dataset or tensor.

  • groups_hist (TensorLike): The group assignments for the historical data.

  • d_synth (tsgm.dataset.DatasetOrTensor): The synthetic input dataset or tensor.

  • groups_synth (TensorLike): The group assignments for the synthetic data.

  • metric (callable, optional): The metric used to compare the target variable distributions within each group. Default is the Kolmogorov-Smirnov statistic.

Returns:

dict: A dictionary mapping each group to the computed demographic parity metric.

Example:
>>> metric = DemographicParityMetric()
>>> dataset_hist = tsgm.dataset.Dataset(...)
>>> dataset_synth = tsgm.dataset.Dataset(...)
>>> groups_hist = [0, 1, 0, 1, 1, 0]
>>> groups_synth = [1, 1, 0, 0, 0, 1]
>>> result = metric(dataset_hist, groups_hist, dataset_synth, groups_synth)
>>> print(result)
__call__(d_hist: tsgm.dataset.DatasetOrTensor, groups_hist: tensorflow.python.types.core.TensorLike, d_synth: tsgm.dataset.DatasetOrTensor, groups_synth: tensorflow.python.types.core.TensorLike, metric: Callable = _DEFAULT_KS_METRIC) Dict[source]

Calculate the demographic parity metric for the input datasets.

Args:

  • d_hist (tsgm.dataset.DatasetOrTensor): The historical input dataset or tensor.

  • groups_hist (TensorLike): The group assignments for the historical data.

  • d_synth (tsgm.dataset.DatasetOrTensor): The synthetic input dataset or tensor.

  • groups_synth (TensorLike): The group assignments for the synthetic data.

  • metric (callable, optional): The metric used to compare the target variable distributions within each group. Default is the Kolmogorov-Smirnov statistic.

Returns:

dict: A dictionary mapping each group to the computed demographic parity metric.
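The per-group comparison can be sketched as follows. The sketch assumes the target values are available as plain arrays rather than tsgm datasets, implements the two-sample Kolmogorov-Smirnov statistic directly with NumPy (scipy.stats.ks_2samp reports the same statistic), and skips groups that appear in only one of the two datasets. These choices and the helper names are assumptions, not documented tsgm behaviour.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)| over the pooled sample."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / len(a)
    cdf_b = np.searchsorted(b, pooled, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def demographic_parity(y_hist, groups_hist, y_synth, groups_synth, metric=ks_statistic):
    """Map each group present in both datasets to metric(historical targets, synthetic targets)."""
    y_hist, groups_hist = np.asarray(y_hist), np.asarray(groups_hist)
    y_synth, groups_synth = np.asarray(y_synth), np.asarray(groups_synth)
    return {
        g.item(): metric(y_hist[groups_hist == g], y_synth[groups_synth == g])
        for g in np.intersect1d(groups_hist, groups_synth)
    }


# Group 0 has identical targets in both datasets; group 1 differs
print(demographic_parity([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]))
```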

class ShannonEntropyMetric[source]

Bases: Metric

Shannon entropy calculated over the labels of a dataset. This index is a measure of diversity that accounts for the categories present in the dataset and how evenly the samples are spread across them.

_shannon_entropy(labels)[source]

Private method to calculate the Shannon Entropy for a given set of labels.

Parameters: labels (array-like): The labels or categories for which the diversity measure is to be calculated.

Returns: float: The Shannon Entropy value.

__call__(d: tsgm.dataset.DatasetOrTensor) float[source]

Calculate the Shannon entropy for the dataset.

Parameters: d (tsgm.dataset.DatasetOrTensor): The dataset or tensor object containing the labels.

Returns: float: The Shannon entropy value.

Raises: AssertionError: If the dataset does not contain labels.
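The quantity is H = -Σ p_c log p_c, where p_c is the fraction of samples with label c. A minimal NumPy sketch follows, assuming a base-2 logarithm so that the result is in bits; the library may use a different base, and shannon_entropy is a hypothetical helper.

```python
import numpy as np


def shannon_entropy(labels):
    """Shannon entropy in bits of the empirical label distribution."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


print(shannon_entropy([0, 0, 1, 1]))  # two equally frequent classes: 1.0
print(shannon_entropy([0, 1, 2, 3]))  # four equally frequent classes: 2.0
```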

class PairwiseDistanceMetric[source]

Bases: Metric

Measures pairwise distances in a set of time series.

pairwise_euclidean_distances(ts: tensorflow.python.types.core.TensorLike) tensorflow.python.types.core.TensorLike[source]

Computes the pairwise Euclidean distances for a set of time series.

Parameters: ts (numpy.ndarray): A 2D array where each row represents a time series.

Returns: numpy.ndarray: A 2D array representing the pairwise Euclidean distance matrix.

__call__(d: tsgm.dataset.DatasetOrTensor) tensorflow.python.types.core.TensorLike[source]

Calculates the pairwise Euclidean distances for a dataset or tensor.

Parameters: d (tsgm.dataset.DatasetOrTensor): The input dataset or tensor containing time series data.

Returns: float: The pairwise Euclidean distances of the input data.
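A NumPy sketch of the pairwise Euclidean distance matrix, using broadcasting over a 2D array with one series per row. Flattening a (n_samples, seq_len, n_features) dataset into such rows is an assumption about how multivariate data is handled, and dataset_pairwise_distances is a hypothetical wrapper.

```python
import numpy as np


def pairwise_euclidean_distances(ts):
    """Return the (n, n) matrix of Euclidean distances between the rows of ts."""
    ts = np.asarray(ts, dtype=float)
    diff = ts[:, None, :] - ts[None, :, :]  # shape (n, n, length)
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def dataset_pairwise_distances(X):
    """Hypothetical wrapper: flatten (n_samples, seq_len, n_features) to one row per sample."""
    X = np.asarray(X, dtype=float)
    return pairwise_euclidean_distances(X.reshape(len(X), -1))


print(pairwise_euclidean_distances([[0.0, 0.0], [3.0, 4.0]]))
```

Broadcasting keeps the code short but allocates an (n, n, length) array, so for large datasets a chunked or Gram-matrix formulation would use less memory.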

class PredictiveParityMetric[source]

Measures predictive parity between two datasets.

This metric assesses the discrepancy in the predictive performance of a model among different groups in two datasets. By default, it uses precision to quantify the predictive performance of the model within each group.

Args:

  • y_true_hist (TensorLike): The true target values for the historical data.

  • y_pred_hist (TensorLike): The predicted target values for the historical data.

  • groups_hist (TensorLike): The group assignments for the historical data.

  • y_true_synth (TensorLike): The true target values for the synthetic data.

  • y_pred_synth (TensorLike): The predicted target values for the synthetic data.

  • groups_synth (TensorLike): The group assignments for the synthetic data.

  • metric (callable, optional): The metric used to compare the predictive performance within each group. Default is precision score.

Returns:

dict: A dictionary mapping each group to the computed predictive parity metric.

Example:
>>> metric = PredictiveParityMetric()
>>> y_true_hist = [0, 1, 0, 1, 1, 0]
>>> y_pred_hist = [0, 1, 0, 0, 1, 1]
>>> groups_hist = [0, 1, 0, 1, 1, 0]
>>> y_true_synth = [1, 0, 1, 0, 0, 1]
>>> y_pred_synth = [1, 0, 1, 1, 0, 0]
>>> groups_synth = [1, 1, 0, 0, 0, 1]
>>> result = metric(y_true_hist, y_pred_hist, groups_hist, y_true_synth, y_pred_synth, groups_synth)
>>> print(result)
__call__(y_true_hist: tensorflow.python.types.core.TensorLike, y_pred_hist: tensorflow.python.types.core.TensorLike, groups_hist: tensorflow.python.types.core.TensorLike, y_true_synth: tensorflow.python.types.core.TensorLike, y_pred_synth: tensorflow.python.types.core.TensorLike, groups_synth: tensorflow.python.types.core.TensorLike, metric: Callable = _DEFAULT_METRIC) Dict[int, float][source]

Calculate the predictive parity metric for the input datasets.

Args:

  • y_true_hist (TensorLike): The true target values for the historical data.

  • y_pred_hist (TensorLike): The predicted target values for the historical data.

  • groups_hist (TensorLike): The group assignments for the historical data.

  • y_true_synth (TensorLike): The true target values for the synthetic data.

  • y_pred_synth (TensorLike): The predicted target values for the synthetic data.

  • groups_synth (TensorLike): The group assignments for the synthetic data.

  • metric (callable, optional): The metric used to compare the predictive performance within each group. Default is precision score.

Returns:

dict: A dictionary mapping each group to the computed predictive parity metric.
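The per-group comparison can be sketched in NumPy as below. It assumes that each group's value is the absolute difference between the precision on historical data and the precision on synthetic data, that precision is 0.0 when a group has no positive predictions, and that groups present in only one dataset are skipped; tsgm's exact aggregation is not given in this reference, and the helper names are hypothetical. Applied to the example arrays from the docstring, the sketch yields 0.5 for group 0 and 0.0 for group 1.

```python
import numpy as np


def precision(y_true, y_pred):
    """Fraction of positive predictions that are correct (0.0 if there are none)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    positive = y_pred == 1
    return float(np.mean(y_true[positive] == 1)) if positive.any() else 0.0


def predictive_parity(y_true_hist, y_pred_hist, groups_hist,
                      y_true_synth, y_pred_synth, groups_synth, metric=precision):
    """Map each shared group to |metric on historical data - metric on synthetic data|."""
    yth, yph, gh = map(np.asarray, (y_true_hist, y_pred_hist, groups_hist))
    yts, yps, gs = map(np.asarray, (y_true_synth, y_pred_synth, groups_synth))
    result = {}
    for g in np.intersect1d(gh, gs):
        m_hist = metric(yth[gh == g], yph[gh == g])
        m_synth = metric(yts[gs == g], yps[gs == g])
        result[g.item()] = abs(m_hist - m_synth)
    return result


print(predictive_parity([0, 1, 0, 1, 1, 0], [0, 1, 0, 0, 1, 1], [0, 1, 0, 1, 1, 0],
                        [1, 0, 1, 0, 0, 1], [1, 0, 1, 1, 0, 0], [1, 1, 0, 0, 0, 1]))
```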