Statistics (wraquant.stats)
Statistical analysis for financial data: descriptive statistics, hypothesis testing, correlation and covariance estimation, distribution fitting, cointegration, regression, factor analysis, and robust statistics.
Submodules:
Descriptive – summary stats, rolling Sharpe, return attribution
Regression – OLS, WLS, rolling OLS, Fama-MacBeth, Newey-West
Correlation – shrunk covariance, distance correlation, mutual information, MST
Distributions – fit distributions, tail index, KDE, Q-Q plot data
Cointegration – Engle-Granger, Johansen, spread, hedge ratio, pairs signals
Tests – normality, stationarity, autocorrelation, heteroskedasticity, structural breaks
Factor analysis – PCA factors, Fama-French, factor loadings, varimax rotation
Robust – MAD, trimmed mean, winsorize, Huber mean, outlier detection
Quick Example
from wraquant.stats import summary_stats, test_normality, test_stationarity
# Comprehensive summary statistics
stats = summary_stats(returns)
print(f"Mean: {stats['mean']:.6f}")
print(f"Std: {stats['std']:.4f}")
print(f"Skewness: {stats['skewness']:.4f}")
print(f"Kurtosis: {stats['kurtosis']:.4f}")
# Normality test (JB + Shapiro-Wilk)
norm = test_normality(returns)
print(f"JB p-value: {norm['jarque_bera_pvalue']:.4f}")
# p < 0.05 rejects normality at the 5 % level (typical for fat-tailed financial returns)
# Stationarity test (ADF)
stat = test_stationarity(returns)
print(f"ADF p-value: {stat['p_value']:.4f}")
print(f"Stationary: {stat['is_stationary']}")
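For reference, the Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = n/6 * (S**2 + (K - 3)**2 / 4). Under normality it is asymptotically chi-squared with 2 degrees of freedom, whose survival function is exp(-x / 2). The NumPy sketch below computes it from scratch for illustration only. jarque_bera_sketch is a hypothetical name, and whether test_normality uses exactly these population-moment estimators is an assumption.

```python
import numpy as np

def jarque_bera_sketch(x):
    """Return (JB statistic, asymptotic p-value) for a 1-D sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d**2)              # population (biased) second central moment
    skew = np.mean(d**3) / m2**1.5
    kurt = np.mean(d**4) / m2**2    # raw kurtosis; equals 3 for a normal distribution
    jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of chi-squared with 2 degrees of freedom is exp(-x / 2)
    p_value = np.exp(-jb / 2.0)
    return jb, p_value

rng = np.random.default_rng(0)
heavy = rng.standard_t(df=3, size=2000)   # fat-tailed sample, like many return series
jb, p = jarque_bera_sketch(heavy)
print(f"JB={jb:.1f}, p={p:.2e}")
```

A fat-tailed sample like this one produces a very large JB and a p-value far below 0.05.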
Cointegration & Pairs Trading
from wraquant.stats import (
    engle_granger, half_life, spread, zscore_signal,
    pairs_backtest_signals, find_cointegrated_pairs,
)
# Test cointegration between two assets
eg = engle_granger(prices_a, prices_b)
print(f"Cointegrated: {eg['is_cointegrated']} (p={eg['p_value']:.4f})")
print(f"Hedge ratio: {eg['hedge_ratio']:.4f}")
# Compute the mean-reverting spread
s = spread(prices_a, prices_b, hedge_ratio=eg['hedge_ratio'])
hl = half_life(s)
print(f"Half-life: {hl:.1f} periods")  # days for daily prices
# Rolling z-score of the spread, and entry/exit signals from z-score thresholds
z = zscore_signal(s, window=20)
signals = pairs_backtest_signals(s, entry_z=2.0, exit_z=0.5)
# Scan for cointegrated pairs in a universe
pairs = find_cointegrated_pairs(prices_df, significance=0.05)
for asset1, asset2, p_value, hr in pairs:
    print(f"{asset1}/{asset2}: p={p_value:.4f}, hedge ratio={hr:.4f}")
Factor Analysis
from wraquant.stats import pca_factors, fama_french_regression
# PCA factor decomposition
factors = pca_factors(returns_df, n_factors=3)
print(f"Explained variance: {factors['explained_variance_ratio']}")
# Fama-French regression
ff = fama_french_regression(returns, model="3factor")
print(f"Alpha: {ff['alpha']:.4f} (p={ff['alpha_pvalue']:.3f})")
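As a rough picture of what a PCA factor decomposition computes, the sketch below centers a (T x N) return matrix and takes its SVD. The squared singular values are proportional to the covariance eigenvalues, which gives the explained variance ratio. pca_factors_sketch and its return tuple are hypothetical; pca_factors may center, scale, or name its outputs differently.

```python
import numpy as np

def pca_factors_sketch(returns, n_factors):
    """PCA of a (T x N) return matrix via SVD of the demeaned data."""
    X = np.asarray(returns, dtype=float)
    X = X - X.mean(axis=0)                   # center each asset's returns
    # Rows of vt are principal directions, ordered by decreasing singular value
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    var = s**2                               # proportional to covariance eigenvalues
    explained_variance_ratio = var[:n_factors] / var.sum()
    loadings = vt[:n_factors].T              # (N x k) asset loadings
    factors = X @ loadings                   # (T x k) factor return series
    return factors, loadings, explained_variance_ratio

rng = np.random.default_rng(1)
market = rng.normal(0, 0.01, size=(500, 1))            # one common driver
rets = market @ np.ones((1, 5)) + rng.normal(0, 0.002, size=(500, 5))
factors, loadings, evr = pca_factors_sketch(rets, n_factors=3)
print(evr)  # the first factor should dominate
```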
See also
Risk Management (wraquant.risk) – Risk metrics built on statistical foundations
Time Series (wraquant.ts) – Time series analysis and forecasting
Risk Analysis – Uses statistical tests in risk workflows
API Reference
Descriptive Statistics
Regression
Correlation
Distributions
Cointegration
Cointegration tests and pairs trading utilities for financial data.
- engle_granger(y1, y2, max_lag=None)
Engle-Granger two-step cointegration test.
Regresses y1 on y2 via OLS, then tests the residuals for a unit root using the Augmented Dickey-Fuller test.
- Parameters:
- Return type:
- Returns:
Dictionary with statistic (ADF test statistic), p_value, is_cointegrated (at 5 % significance), hedge_ratio (OLS slope coefficient), and residuals (pd.Series).
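The two steps can be sketched with NumPy alone. Step 1 regresses y1 on a constant and y2. Step 2 here is a plain Dickey-Fuller regression on the residuals, without the lag augmentation or p-value that engle_granger provides. The function name and the simulated data are hypothetical. Note that turning the statistic into a p-value requires Engle-Granger (MacKinnon) critical values rather than standard ADF tables, because the hedge ratio was estimated; statsmodels.tsa.stattools.coint implements such a test.

```python
import numpy as np

def engle_granger_sketch(y1, y2):
    """Two-step sketch: OLS hedge ratio, then a Dickey-Fuller t-statistic
    on the residuals (no lag augmentation, no p-value)."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    # Step 1: OLS of y1 on [1, y2]; the slope is the hedge ratio
    X = np.column_stack([np.ones_like(y2), y2])
    (alpha, beta), *_ = np.linalg.lstsq(X, y1, rcond=None)
    resid = y1 - alpha - beta * y2
    # Step 2: regress delta_resid on lagged resid (residuals have ~zero mean)
    lag = resid[:-1]
    d = np.diff(resid)
    phi = lag @ d / (lag @ lag)
    e = d - phi * lag
    sigma2 = e @ e / (len(d) - 1)
    t_stat = phi / np.sqrt(sigma2 / (lag @ lag))
    return beta, t_stat, resid

rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=500)) + 100          # random walk
y = 1.5 * x + rng.normal(scale=0.5, size=500)      # cointegrated with x
beta, t_stat, _ = engle_granger_sketch(y, x)
print(f"hedge ratio ~ {beta:.3f}, DF t-stat = {t_stat:.1f}")
```

A strongly negative t-statistic is evidence against a unit root in the residuals, i.e. in favor of cointegration.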
- johansen(data, det_order=0, k_ar_diff=1)
Johansen cointegration test for multiple time series.
Requires the timeseries optional dependency group (provides statsmodels.tsa.vector_ar.vecm).
- Parameters:
- Return type:
- Returns:
Dictionary with trace_stats (array of trace statistics), eigenvalues, critical_values_95 (95 % critical values), and n_cointegrating (number of cointegrating relationships at the 5 % level).
- half_life(spread)
Estimate the half-life of mean reversion for a spread series.
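Fits an OLS regression of the change in spread on the lagged spread level:
delta_spread = phi * spread_lag + eps. Because the implied AR(1) coefficient is 1 + phi, a deviation shrinks by a factor (1 + phi)**h after h periods; setting this factor to 1/2 gives the half-life -log(2) / log(1 + phi), which is positive and finite only for -1 < phi < 0.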
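A minimal NumPy sketch of that regression and formula follows. It fits without an intercept, exactly as the equation is written, and the choice to return inf when phi falls outside (-1, 0) is a hypothetical convention; half_life may include an intercept or handle that case differently.

```python
import numpy as np

def half_life_sketch(spread):
    """Half-life of mean reversion from delta_spread = phi * spread_lag + eps."""
    s = np.asarray(spread, dtype=float)
    lag = s[:-1]
    delta = np.diff(s)
    phi = lag @ delta / (lag @ lag)   # OLS slope without intercept
    if not -1.0 < phi < 0.0:
        return np.inf                 # no (or overshooting) mean reversion
    return -np.log(2.0) / np.log(1.0 + phi)

# A deviation that halves every period: phi = -0.5, so the true half-life is 1
print(half_life_sketch(0.5 ** np.arange(20)))
```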
- spread(y1, y2, hedge_ratio=None)
Compute the spread between two price series.
When hedge_ratio is None, it is estimated via OLS.
- Parameters:
- Return type:
- Returns:
Spread series (y1 - hedge_ratio * y2).
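The spread itself is simple arithmetic on aligned series. In the sketch below, spread_sketch is a hypothetical name; aligning on common timestamps and fitting the OLS slope with an intercept (via np.polyfit) are assumptions about how spread estimates the ratio.

```python
import numpy as np
import pandas as pd

def spread_sketch(y1, y2, hedge_ratio=None):
    """Spread y1 - hedge_ratio * y2; estimate the ratio by OLS if not given."""
    y1, y2 = y1.align(y2, join="inner")   # keep only common timestamps
    if hedge_ratio is None:
        # np.polyfit(deg=1) returns [slope, intercept]; the slope is the hedge ratio
        hedge_ratio = np.polyfit(y2.to_numpy(), y1.to_numpy(), deg=1)[0]
    return y1 - hedge_ratio * y2

idx = pd.date_range("2024-01-01", periods=5, freq="D")
a = pd.Series([12.0, 14.0, 16.0, 18.0, 20.0], index=idx)
b = pd.Series([5.0, 6.0, 7.0, 8.0, 9.0], index=idx)
print(spread_sketch(a, b))  # a = 2 * b + 2, so the spread is (approximately) 2 everywhere
```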
- zscore_signal(spread, window=20)
Compute a rolling z-score of the spread for trading signals.
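A rolling z-score is commonly computed as (spread - rolling mean) / rolling standard deviation over the trailing window. The pandas sketch below assumes that definition and pandas' default sample standard deviation (ddof=1); the exact estimator zscore_signal uses is not stated here.

```python
import numpy as np
import pandas as pd

def rolling_zscore_sketch(spread, window=20):
    """(spread - rolling mean) / rolling std over the trailing `window` observations."""
    mean = spread.rolling(window).mean()
    std = spread.rolling(window).std()    # pandas default: sample std (ddof=1)
    return (spread - mean) / std

s = pd.Series(np.arange(30, dtype=float))
z = rolling_zscore_sketch(s, window=20)
print(z.iloc[17:22])  # NaN until the first full window at index 19
```

For a linear trend every full window gives the same value, 9.5 / sqrt(35) (about 1.61), which makes this a convenient sanity check.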
- hedge_ratio(y1, y2, method='ols')
Estimate the hedge ratio between two price series.
- Parameters:
- Return type:
- Returns:
Hedge ratio as a float.
- Raises:
ValueError – If method is not recognized.
- pairs_backtest_signals(spread, entry_z=2.0, exit_z=0.5)
Generate pairs trading signals based on z-score thresholds.
The strategy goes short the spread when the z-score > entry_z, goes long when the z-score < -entry_z, and exits when the z-score crosses back inside [-exit_z, exit_z].
- Parameters:
- Return type:
- Returns:
Signal series with values in {-1, 0, 1}: 1 means long the spread, -1 means short, 0 means flat.
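The entry/exit rule is a small state machine. The sketch below takes an already computed z-score series rather than the raw spread (pairs_backtest_signals z-scores the spread itself, with a window not documented here). signals_from_zscore is a hypothetical name. In this version a short exits once z falls to exit_z or below, a long exits once z rises to -exit_z or above, and NaN values leave the position unchanged.

```python
import pandas as pd

def signals_from_zscore(z, entry_z=2.0, exit_z=0.5):
    """Map a z-score series to positions in {-1, 0, 1} using entry/exit bands."""
    position = 0
    out = []
    for value in z:
        if position == 0:
            if value > entry_z:
                position = -1      # spread rich: short it
            elif value < -entry_z:
                position = 1       # spread cheap: long it
        elif position == -1 and value <= exit_z:
            position = 0           # short exits once z is back at or below exit_z
        elif position == 1 and value >= -exit_z:
            position = 0           # long exits once z is back at or above -exit_z
        out.append(position)       # NaN comparisons are False, so the state persists
    return pd.Series(out, index=z.index, dtype=int)

z = pd.Series([0.0, 2.5, 1.0, 0.3, -2.1, -1.0, -0.2, 0.0])
print(signals_from_zscore(z).tolist())  # [0, -1, -1, 0, 1, 1, 0, 0]
```

One design consequence: a short position that sees z jump straight below -entry_z is closed on that bar, not flipped to long.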
- find_cointegrated_pairs(prices_df, significance=0.05)
Scan a DataFrame of price series and find all cointegrated pairs.
For each pair of columns, the Engle-Granger cointegration test is applied. Pairs with a p-value below significance are returned.
- Parameters:
- Return type:
- Returns:
List of tuples (asset1, asset2, p_value, hedge_ratio) for each cointegrated pair, sorted by ascending p-value.
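The scan is a loop over all column pairs. In the sketch below, the pairwise test is passed in as test_fn, a hypothetical callable returning (p_value, hedge_ratio); a stub stands in for a real Engle-Granger test so the example runs without data.

```python
from itertools import combinations
import pandas as pd

def find_pairs_sketch(prices_df, test_fn, significance=0.05):
    """Apply test_fn to every column pair; keep pairs with p-value < significance."""
    results = []
    for a, b in combinations(prices_df.columns, 2):
        p_value, hr = test_fn(prices_df[a], prices_df[b])
        if p_value < significance:
            results.append((a, b, p_value, hr))
    return sorted(results, key=lambda r: r[2])   # ascending p-value

# Stub test with made-up p-values, keyed by the column names of each pair
prices = pd.DataFrame({"A": [1.0, 2.0], "B": [1.0, 2.0], "C": [1.0, 2.0]})
fake_p = {("A", "B"): 0.01, ("A", "C"): 0.20, ("B", "C"): 0.003}

def fake_test(y1, y2):
    return fake_p[(y1.name, y2.name)], 1.0

print(find_pairs_sketch(prices, fake_test))
# [('B', 'C', 0.003, 1.0), ('A', 'B', 0.01, 1.0)]
```

With N columns this runs N(N-1)/2 tests, so at a 5 % significance level a large universe will flag some pairs by chance alone.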