Skip to content

Statistical Utilities

The statistical utilities module provides functions for performing various statistical analyses and computations that are commonly used in the context of spectral normative modeling. Notably, this includes functions complementary to the metrics utilities, such as functions for computing log-likelihoods and centile scores, which are essential for evaluating the fit of normative models and for interpreting the results of empirical data analyses.


spectranorm.utils.stats

utils/stats.py

Statistical utility functions for the Spectranorm package.

compute_censored_log_likelihood(observations: npt.NDArray[np.floating[Any]], predicted_mus: npt.NDArray[np.floating[Any]], predicted_sigmas: npt.NDArray[np.floating[Any]], censored_quantile: float = 0.01) -> npt.NDArray[np.floating[Any]]

Compute censored log likelihood, replacing extreme low likelihoods with a censoring threshold.

Parameters:

Name Type Description Default
observations NDArray[floating[Any]]

np.ndarray Observed data points (N,).

required
predicted_mus NDArray[floating[Any]]

np.ndarray Predicted means for each observation (N,).

required
predicted_sigmas NDArray[floating[Any]]

np.ndarray Predicted standard deviations for each observation (N,).

required
censored_quantile float

float (default=0.01) Quantile below which log-likelihoods are censored.

0.01

Returns:

Type Description
NDArray[floating[Any]]

np.ndarray: The censored log likelihood of all observations.

Source code in src/spectranorm/utils/stats.py
def compute_censored_log_likelihood(
    observations: npt.NDArray[np.floating[Any]],
    predicted_mus: npt.NDArray[np.floating[Any]],
    predicted_sigmas: npt.NDArray[np.floating[Any]],
    censored_quantile: float = 0.01,
) -> npt.NDArray[np.floating[Any]]:
    """
    Compute censored log likelihood, replacing extreme low likelihoods with a censoring
    threshold.

    Args:
        observations: np.ndarray
            Observed data points (N,).
        predicted_mus: np.ndarray
            Predicted means for each observation (N,).
        predicted_sigmas: np.ndarray
            Predicted standard deviations for each observation (N,).
        censored_quantile: float (default=0.01)
            Quantile below which log-likelihoods are censored.

    Returns:
        np.ndarray:
            The censored log likelihood of all observations.
    """
    # Compute log likelihoods
    log_likelihoods = compute_log_likelihood(
        observations,
        predicted_mus,
        predicted_sigmas,
    )

    if censored_quantile <= 0:
        return log_likelihoods

    # Standardized residuals
    z_scores = (observations - predicted_mus) / predicted_sigmas

    # Two-sided z threshold based on standard normal
    z_threshold = stats.norm.ppf(1 - censored_quantile)

    # Replace extreme z's with constant tail mass
    censored_value = np.log(2 * censored_quantile)

    # Apply censoring (two-sided) and return
    return np.where(
        np.abs(z_scores) > z_threshold,
        censored_value,
        log_likelihoods,
    )

compute_centiles_from_z_scores(z_scores: npt.NDArray[np.floating[Any]]) -> npt.NDArray[np.floating[Any]]

Convert z-scores to percentiles.

Parameters:

Name Type Description Default
z_scores NDArray[floating[Any]]

np.ndarray Array of z-scores.

required

Returns:

Type Description
NDArray[floating[Any]]

np.ndarray Array of percentiles corresponding to the z-scores.

Source code in src/spectranorm/utils/stats.py
def compute_centiles_from_z_scores(
    z_scores: npt.NDArray[np.floating[Any]],
) -> npt.NDArray[np.floating[Any]]:
    """
    Convert z-scores to percentiles.

    Args:
        z_scores: np.ndarray
            Array of z-scores.

    Returns:
        np.ndarray
            Array of percentiles corresponding to the z-scores.
    """
    # Convert z-scores to percentiles using the cumulative distribution function (CDF)
    return np.asarray(stats.norm.cdf(z_scores) * 100)

compute_correlation_significance(correlation_matrix: npt.NDArray[np.floating[Any]], n_samples: int, correlation_threshold: float = 0.0, correction_method: str = 'fdr_bh') -> npt.NDArray[np.floating[Any]]

Compute the significance of correlations between variables in the data matrix, thresholded by a specified correlation value, using Fisher's z-transformation.

Parameters:

Name Type Description Default
correlation_matrix NDArray[floating[Any]]

np.ndarray A 2D array of pairwise correlation values.

required
n_samples int

int The number of samples used to compute correlation.

required
correlation_threshold float

float (default=0.0) The correlation threshold above which correlations are to be considered significant.

0.0
correction_method str

str (default='fdr_bh') Method for multiple testing correction. Options include 'bonferroni', 'holm', 'fdr_bh', etc. See statsmodels.stats.multitest.multipletests for more details.

'fdr_bh'

Returns:

Type Description
NDArray[floating[Any]]

np.ndarray A matrix of p-values indicating the significance of each correlation. (Testing if the correlation is significantly greater than the threshold.)

Source code in src/spectranorm/utils/stats.py
def compute_correlation_significance(
    correlation_matrix: npt.NDArray[np.floating[Any]],
    n_samples: int,
    correlation_threshold: float = 0.0,
    correction_method: str = "fdr_bh",
) -> npt.NDArray[np.floating[Any]]:
    """
    Compute the significance of correlations between variables in the data matrix,
    thresholded by a specified correlation value, using Fisher's z-transformation.

    Args:
        correlation_matrix: np.ndarray
            A 2D array of pairwise correlation values.
        n_samples: int
            The number of samples used to compute correlation.
        correlation_threshold: float (default=0.0)
            The correlation threshold above which correlations are to be considered
            significant.
        correction_method: str (default='fdr_bh')
            Method for multiple testing correction. Options include 'bonferroni',
            'holm', 'fdr_bh', etc. See statsmodels.stats.multitest.multipletests for
            more details.

    Returns:
        np.ndarray
            A matrix of p-values indicating the significance of each correlation.
            (Testing if the correlation is significantly greater than the threshold.)
    """
    # set the diagonal to zero to avoid NaNs in arctanh
    np.fill_diagonal(correlation_matrix, 0)

    # Fisher's z-transformation
    fisher_z = np.arctanh(correlation_matrix)
    z_threshold = np.arctanh(correlation_threshold)

    # Standard error of the Fisher z
    se = 1 / np.sqrt(n_samples - 3)

    # Compute the test statistic
    z_score = (np.abs(fisher_z) - z_threshold) / se

    # compute uncorrected p-values for one-tailed test
    p_values = 1 - stats.norm.cdf(z_score)

    # Take upper triangle of the p-value matrix (excluding diagonal)
    triu_indices = np.triu_indices_from(p_values, k=1)
    p_values_triu = p_values[triu_indices]

    # Apply multiple testing correction
    _, corrected_p_values, _, _ = multipletests(
        p_values_triu,
        method=correction_method,
    )

    # Reconstruct the full p-value matrix
    p_values_corrected = np.full_like(p_values, fill_value=np.nan, dtype=np.float64)
    p_values_corrected[triu_indices] = corrected_p_values
    p_values_corrected[(triu_indices[1], triu_indices[0])] = corrected_p_values

    return p_values_corrected

compute_log_likelihood(observations: npt.NDArray[np.floating[Any]], predicted_mus: npt.NDArray[np.floating[Any]], predicted_sigmas: npt.NDArray[np.floating[Any]]) -> npt.NDArray[np.floating[Any]]

Compute the log likelihood of observations given predicted means and standard deviations.

Parameters:

Name Type Description Default
observations NDArray[floating[Any]]

np.ndarray Observed data points.

required
predicted_mus NDArray[floating[Any]]

np.ndarray Predicted means for the observations.

required
predicted_sigmas NDArray[floating[Any]]

np.ndarray Predicted standard deviations for the observations.

required

Returns:

Type Description
NDArray[floating[Any]]

np.ndarray Log likelihood of each observation.

Source code in src/spectranorm/utils/stats.py
def compute_log_likelihood(
    observations: npt.NDArray[np.floating[Any]],
    predicted_mus: npt.NDArray[np.floating[Any]],
    predicted_sigmas: npt.NDArray[np.floating[Any]],
) -> npt.NDArray[np.floating[Any]]:
    """
    Compute the log likelihood of observations given predicted means and standard
    deviations.

    Args:
        observations: np.ndarray
            Observed data points.
        predicted_mus: np.ndarray
            Predicted means for the observations.
        predicted_sigmas: np.ndarray
            Predicted standard deviations for the observations.

    Returns:
        np.ndarray
            Log likelihood of each observation.
    """
    return np.asarray(
        stats.norm.logpdf(observations, loc=predicted_mus, scale=predicted_sigmas),
    )