Statistics (wraquant.stats)

Statistical analysis for financial data: descriptive statistics, hypothesis testing, correlation and covariance estimation, distribution fitting, cointegration, regression, factor analysis, and robust statistics.

Submodules:

  • Descriptive – summary stats, rolling Sharpe, return attribution

  • Regression – OLS, WLS, rolling OLS, Fama-MacBeth, Newey-West

  • Correlation – shrunk covariance, distance correlation, mutual information, MST

  • Distributions – fit distributions, tail index, KDE, Q-Q plot data

  • Cointegration – Engle-Granger, Johansen, spread, hedge ratio, pairs signals

  • Tests – normality, stationarity, autocorrelation, heteroskedasticity, structural breaks

  • Factor analysis – PCA factors, Fama-French, factor loadings, varimax rotation

  • Robust – MAD, trimmed mean, winsorize, Huber mean, outlier detection

Quick Example

from wraquant.stats import summary_stats, test_normality, test_stationarity

# Comprehensive summary statistics
stats = summary_stats(returns)
print(f"Mean:     {stats['mean']:.6f}")
print(f"Std:      {stats['std']:.4f}")
print(f"Skewness: {stats['skewness']:.4f}")
print(f"Kurtosis: {stats['kurtosis']:.4f}")

# Normality test (JB + Shapiro-Wilk)
norm = test_normality(returns)
print(f"JB p-value: {norm['jarque_bera_pvalue']:.4f}")
# p < 0.05 rejects normality (expected for financial returns)

# Stationarity test (ADF)
stat = test_stationarity(returns)
print(f"ADF p-value: {stat['p_value']:.4f}")
print(f"Stationary: {stat['is_stationary']}")

Cointegration & Pairs Trading

from wraquant.stats import (
    engle_granger, half_life, spread, pairs_backtest_signals,
    find_cointegrated_pairs,
)

# Test cointegration between two assets
eg = engle_granger(prices_a, prices_b)
print(f"Cointegrated: {eg['is_cointegrated']} (p={eg['p_value']:.4f})")
print(f"Hedge ratio: {eg['hedge_ratio']:.4f}")

# Compute the mean-reverting spread
s = spread(prices_a, prices_b, hedge_ratio=eg['hedge_ratio'])
hl = half_life(s)
print(f"Half-life: {hl:.1f} periods")  # same time units as the spread index

# Generate trading signals from z-score thresholds on the raw spread
signals = pairs_backtest_signals(s, entry_z=2.0, exit_z=0.5)

# Scan for cointegrated pairs in a universe
# Each result is a tuple: (asset1, asset2, p_value, hedge_ratio)
pairs = find_cointegrated_pairs(prices_df, significance=0.05)
for asset_a, asset_b, p_value, ratio in pairs:
    print(f"{asset_a}/{asset_b}: p={p_value:.4f}")

Factor Analysis

from wraquant.stats import pca_factors, fama_french_regression

# PCA factor decomposition
factors = pca_factors(returns_df, n_factors=3)
print(f"Explained variance: {factors['explained_variance_ratio']}")

# Fama-French regression
ff = fama_french_regression(returns, model="3factor")
print(f"Alpha: {ff['alpha']:.4f} (p={ff['alpha_pvalue']:.3f})")

API Reference

Descriptive Statistics

Regression

Correlation

Distributions

Cointegration

Cointegration tests and pairs trading utilities for financial data.

engle_granger(y1, y2, max_lag=None)

Engle-Granger two-step cointegration test.

Regresses y1 on y2 via OLS, then tests the residuals for a unit root using the Augmented Dickey-Fuller test.

Parameters:
  • y1 (Series) – First price series.

  • y2 (Series) – Second price series.

  • max_lag (int | None, default: None) – Maximum number of lags for the ADF test. When None, adfuller selects lags automatically via AIC.

Return type:

dict

Returns:

Dictionary with statistic (ADF test statistic), p_value, is_cointegrated (at 5% significance), hedge_ratio (OLS slope coefficient), and residuals (pd.Series).

johansen(data, det_order=0, k_ar_diff=1)

Johansen cointegration test for multiple time series.

Requires the timeseries optional dependency group (provides statsmodels.tsa.vector_ar.vecm).

Parameters:
  • data (DataFrame) – DataFrame of price series (columns = assets).

  • det_order (int, default: 0) – Deterministic term order. -1 for no deterministic term, 0 for constant, 1 for linear trend.

  • k_ar_diff (int, default: 1) – Number of lagged differences in the model.

Return type:

dict

Returns:

Dictionary with trace_stats (array of trace statistics), eigenvalues, critical_values_95 (95% critical values), and n_cointegrating (number of cointegrating relationships at the 5% level).

half_life(spread)

Estimate the half-life of mean reversion for a spread series.

Fits an OLS regression of the change in spread on the lagged spread level: delta_spread = phi * spread_lag + eps. The half-life is -log(2) / log(1 + phi), which is finite and positive only when -1 < phi < 0.

Parameters:

spread (Series) – Spread (or residual) series.

Return type:

float

Returns:

Half-life in the same time units as the spread index. Returns float('inf') when the spread is not mean-reverting.
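As a worked case of the formula, phi = -0.1 gives -log(2) / log(0.9), roughly 6.58 periods. The sketch below reproduces the calculation with NumPy only. The name half_life_sketch is hypothetical, and two details are assumptions rather than documented behaviour: an intercept is added to the regression (the formula above is written without one), and any phi outside (-1, 0) is treated as not mean-reverting.

```python
import numpy as np


def half_life_sketch(spread) -> float:
    """Illustrative half-life estimate (hypothetical helper, not the library code)."""
    values = np.asarray(spread, dtype=float)
    lagged = values[:-1]            # spread at t-1
    delta = np.diff(values)         # spread at t minus spread at t-1

    # OLS of delta on [1, lagged]. Assumption: an intercept is included so a
    # non-zero spread mean does not bias phi.
    X = np.column_stack([np.ones_like(lagged), lagged])
    coef, *_ = np.linalg.lstsq(X, delta, rcond=None)
    phi = coef[1]

    # Assumed rule: only -1 < phi < 0 counts as mean-reverting.
    if not -1.0 < phi < 0.0:
        return float("inf")
    return float(-np.log(2) / np.log1p(phi))
```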

spread(y1, y2, hedge_ratio=None)

Compute the spread between two price series.

When hedge_ratio is None it is estimated via OLS.

Parameters:
  • y1 (Series) – First price series (the dependent variable).

  • y2 (Series) – Second price series (the independent variable).

  • hedge_ratio (float | None, default: None) – Explicit hedge ratio. If None, the ratio is estimated from the data using OLS.

Return type:

Series

Returns:

Spread series (y1 - hedge_ratio * y2).

zscore_signal(spread, window=20)

Compute a rolling z-score of the spread for trading signals.

Parameters:
  • spread (Series) – Spread series.

  • window (int, default: 20) – Rolling window size for mean and standard deviation.

Return type:

Series

Returns:

Rolling z-score series.
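A rolling z-score of this kind can be sketched in pandas as follows. The name rolling_zscore is hypothetical. The sketch assumes the window includes the current observation and that the sample standard deviation (ddof=1, the pandas default) is used; the library may differ on both points.

```python
import pandas as pd


def rolling_zscore(spread: pd.Series, window: int = 20) -> pd.Series:
    """Illustrative rolling z-score (hypothetical helper, not the library code)."""
    rolling = spread.rolling(window)
    mean = rolling.mean()
    std = rolling.std()             # pandas default: sample std, ddof=1
    # The first window - 1 values are NaN because the window is incomplete.
    return (spread - mean) / std
```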

hedge_ratio(y1, y2, method='ols')

Estimate the hedge ratio between two price series.

Parameters:
  • y1 (Series) – First (dependent) price series.

  • y2 (Series) – Second (independent) price series.

  • method (str, default: 'ols') – Estimation method — "ols" for ordinary least squares or "tls" for total least squares (orthogonal regression).

Return type:

float

Returns:

Hedge ratio as a float.

Raises:

ValueError – If method is not recognized.
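The difference between the two methods can be illustrated with NumPy. OLS minimizes vertical errors in y1 only, while total least squares minimizes perpendicular distances and so treats both series as noisy. The helper hedge_ratio_sketch is hypothetical, and computing TLS from the first principal direction of the centered data is one common approach, not necessarily the one wraquant uses.

```python
import numpy as np


def hedge_ratio_sketch(y1, y2, method: str = "ols") -> float:
    """Illustrative OLS / TLS slope (hypothetical helper, not the library code)."""
    y1c = np.asarray(y1, dtype=float)
    y2c = np.asarray(y2, dtype=float)
    y1c = y1c - y1c.mean()
    y2c = y2c - y2c.mean()

    if method == "ols":
        # Slope of y1 on y2: cov(y1, y2) / var(y2).
        return float((y2c @ y1c) / (y2c @ y2c))
    if method == "tls":
        # The first right singular vector of the centered (y2, y1) data is the
        # direction of greatest variance; its rise over run is the TLS slope.
        _, _, vt = np.linalg.svd(np.column_stack([y2c, y1c]), full_matrices=False)
        run, rise = vt[0]
        return float(rise / run)
    raise ValueError(f"Unknown method: {method!r}")
```

On noise-free linear data the two estimates coincide; with noise in y2, the OLS slope is typically pulled toward zero relative to the TLS slope.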

pairs_backtest_signals(spread, entry_z=2.0, exit_z=0.5)

Generate pairs trading signals based on z-score thresholds.

The strategy goes short the spread when z-score > entry_z, goes long when z-score < -entry_z, and exits when the z-score crosses back inside [-exit_z, exit_z].

Parameters:
  • spread (Series) – Spread series (raw, not z-scored).

  • entry_z (float, default: 2.0) – Z-score threshold for entry (absolute value).

  • exit_z (float, default: 0.5) – Z-score threshold for exit (absolute value).

Return type:

Series

Returns:

Signal series with values in {-1, 0, 1}. 1 means long the spread, -1 means short, 0 means flat.
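The entry and exit rules amount to a small state machine over the z-score. The sketch below takes an already computed z-score series, since the documentation does not say how the function standardizes the raw spread. The name signals_from_zscore is hypothetical, and holding the current position through NaN values is an assumption.

```python
import numpy as np
import pandas as pd


def signals_from_zscore(z: pd.Series, entry_z: float = 2.0, exit_z: float = 0.5) -> pd.Series:
    """Illustrative threshold logic (hypothetical helper, not the library code)."""
    position = 0
    out = []
    for value in z.to_numpy():
        if np.isnan(value):
            pass                        # assumption: hold through missing values
        elif position == 0:
            if value > entry_z:
                position = -1           # spread is rich: short it
            elif value < -entry_z:
                position = 1            # spread is cheap: go long
        elif abs(value) <= exit_z:
            position = 0                # z-score back inside the exit band
        out.append(position)
    return pd.Series(out, index=z.index, dtype=int)
```

Using a wider entry band than exit band gives hysteresis: a position opened at |z| > 2.0 is kept until |z| falls to 0.5 or below, which avoids flipping in and out when the z-score hovers near a single threshold.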

find_cointegrated_pairs(prices_df, significance=0.05)

Scan a DataFrame of price series and find all cointegrated pairs.

For each pair of columns, the Engle-Granger cointegration test is applied. Pairs with a p-value below significance are returned.

Parameters:
  • prices_df (DataFrame) – DataFrame of price series (columns = asset names).

  • significance (float, default: 0.05) – Significance level for the cointegration test.

Return type:

list[tuple]

Returns:

List of tuples (asset1, asset2, p_value, hedge_ratio) for each cointegrated pair, sorted by p-value ascending.
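The scan can be sketched as a loop over unordered column pairs. The helper scan_pairs and its test argument are hypothetical: test stands for any callable that returns a dict with p_value and hedge_ratio keys, as engle_granger is documented to do. Testing each pair in one ordering only is an assumption; the Engle-Granger result can differ when y1 and y2 are swapped.

```python
from itertools import combinations

import pandas as pd


def scan_pairs(prices_df: pd.DataFrame, test, significance: float = 0.05) -> list:
    """Illustrative pair scan (hypothetical helper, not the library code)."""
    found = []
    # combinations yields each unordered pair of columns exactly once.
    for a, b in combinations(prices_df.columns, 2):
        result = test(prices_df[a], prices_df[b])
        if result["p_value"] < significance:
            found.append((a, b, result["p_value"], result["hedge_ratio"]))
    # Strongest evidence of cointegration first.
    return sorted(found, key=lambda item: item[2])
```

A universe of n assets produces n(n-1)/2 tests, so 50 assets means 1,225 pairs. At a 5% significance level, roughly 61 of those would pass by chance even if no pair were truly cointegrated, so the results are best treated as candidates for further validation.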

Statistical Tests

Factor Analysis

Factor Models

Dependence

Robust Statistics