Econometrics (wraquant.econometrics)
Econometric methods for financial research: panel data models, IV/2SLS, event studies, structural breaks, and cross-sectional analysis.
Quick Example
from wraquant.econometrics import panel, event_study

# Fixed effects panel regression (X must contain the "ticker" and "date" columns)
result = panel.fixed_effects(y, X, entity_col="ticker", time_col="date")
print(f"Coefficients: {result['coefficients']}")
print(f"R-squared: {result['r_squared']:.4f}")

# Event study: abnormal returns around an event
es = event_study.event_study(
    returns=returns,
    event_dates=["2023-03-10"],
    estimation_window=(-252, -21),
    event_window=(-5, 10),
)
print(f"Mean CAR: {es['mean_car']:.4f}")
print(f"CAR t-stat: {es['t_stat']:.4f}")
See also
Statistics (wraquant.stats) – Regression and statistical testing
Causal Inference (wraquant.causal) – Causal inference methods (DID, synthetic control)
API Reference
Econometric methods for quantitative finance.
Provides rigorous econometric estimators and tests grounded in the academic finance literature. Covers panel data methods for multi-firm studies, cross-sectional regressions for asset pricing tests, time series models for macro-financial linkages, volatility modeling via the GARCH family, regression diagnostics for model validation, and event study methodology for measuring abnormal returns around corporate actions.
Key sub-modules:
- Panel data (panel) – fixed_effects, random_effects, pooled_ols, between_effects, first_difference, and hausman_test for choosing between FE and RE. Use for multi-firm, multi-period studies (e.g., Fama-MacBeth cross-sections, earnings announcements across firms).
- Cross-section (cross_section) – robust_ols (White or HAC standard errors), quantile_regression, two_stage_least_squares (IV/2SLS for endogeneity), gmm_estimation, and sargan_test (over-identification).
- Time series (timeseries) – var_model (Vector Autoregression), vecm_model (Vector Error Correction for cointegrated systems), granger_causality, impulse_response, variance_decomposition, and structural_break_test.
- Volatility (volatility) – garch, egarch, gjr_garch, dcc_garch, arch_test (test for ARCH effects), and garch_numpy_fallback (pure-numpy fallback when arch is unavailable).
- Diagnostics (diagnostics) – durbin_watson, breusch_godfrey (serial correlation), breusch_pagan / white_test (heteroskedasticity), jarque_bera (normality), ramsey_reset (functional form), vif (multicollinearity), and condition_number.
- Event study (event_study) – event_study (market model estimation), cumulative_abnormal_return (CAR), and buy_and_hold_abnormal_return (BHAR).
Example
>>> from wraquant.econometrics import fixed_effects, granger_causality
>>> fe_result = fixed_effects(y, X, entity_col="ticker")
>>> gc = granger_causality(macro_df[["gdp_growth", "sp500_returns"]], max_lag=4)
Use wraquant.econometrics for formal econometric analysis (panel
regressions, IV estimation, event studies, VAR/VECM). For simpler OLS
and rolling regression, see wraquant.stats.regression. For GARCH
with full diagnostics and forecasting, prefer wraquant.vol.
- robust_ols(y, X, cov_type='HC1')[source]¶
OLS regression with heteroskedasticity-robust standard errors.
Implements the HC0 – HC3 covariance estimators of White (1980) and MacKinnon and White (1985). Point estimates are identical to plain OLS; only the standard errors, t-statistics, and p-values differ.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, adj_r_squared, residuals, cov_type, and nobs.
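The sandwich form behind the default HC1 option can be sketched in a few lines of NumPy. This is an illustration of the textbook estimator, not the library's implementation; the helper name ols_hc1 and the simulated data are assumptions made for the example.

```python
import numpy as np

def ols_hc1(y, X):
    """OLS point estimates with HC1 standard errors (hypothetical helper)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum_i e_i^2 * x_i x_i'
    meat = (X * resid[:, None] ** 2).T @ X
    # HC1 = HC0 scaled by the degrees-of-freedom factor n / (n - k)
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
# Error variance grows with |x|, so the errors are heteroskedastic
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + np.abs(x))
beta, se_hc1 = ols_hc1(y, X)
```

The coefficients match plain least squares exactly; only the covariance matrix changes.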
- quantile_regression(y, X, quantile=0.5)[source]¶
Quantile regression via statsmodels.
Estimates the conditional quantile function, generalising OLS which estimates the conditional mean. Useful for understanding heterogeneous effects across the return distribution.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, quantile, pseudo_r_squared, and nobs.
- two_stage_least_squares(y, X, instruments, endog_vars)[source]¶
Two-stage least squares (2SLS) instrumental variables estimator.
Addresses endogeneity by instrumenting the endogenous regressors in X with exogenous instruments. The first stage regresses each endogenous variable on the instruments, and the second stage regresses y on the fitted values plus the exogenous regressors.
- Parameters:
X (ndarray | DataFrame) – Full design matrix (n, k) containing both exogenous and endogenous regressors.
instruments (ndarray | DataFrame) – Matrix of excluded instruments (n, m) with m >= number of endogenous variables (order condition).
endog_vars (list[int] | ndarray) – Column indices (0-based) within X that are endogenous.
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, residuals, first_stage_f, and nobs.
- Raises:
ValueError – If the order condition is violated (fewer instruments than endogenous variables).
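The two stages can be illustrated with plain NumPy. The sketch below is not the library's code: the helper tsls is hypothetical, and it assumes the first stage projects on the included exogenous regressors as well as the excluded instruments, which is the standard construction. The simulated data have a true slope of 2 and an error term correlated with x.

```python
import numpy as np

def tsls(y, X, Z_excl, endog_idx):
    """Two-stage least squares point estimates (hypothetical helper)."""
    exog_idx = [j for j in range(X.shape[1]) if j not in endog_idx]
    # Full instrument set: included exogenous columns plus excluded instruments
    Z = np.column_stack([X[:, exog_idx], Z_excl])
    # First stage: project every column of X on Z (exogenous columns are reproduced)
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress y on the fitted regressors
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    resid = y - X @ beta  # residuals use the original X, not X_hat
    return beta, resid

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                  # instrument
u = rng.normal(size=n)                  # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)    # endogenous regressor
y = 1.0 + 2.0 * x + u + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta_iv, _ = tsls(y, X, z[:, None], endog_idx=[1])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

In this simulation OLS overstates the slope because x and the error share the confounder u, while the IV estimate stays close to 2.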
- gmm_estimation(moment_conditions, params_init, W=None, *, data=None, max_iter=2)[source]¶
Generalized Method of Moments (GMM) estimation.
Minimises the quadratic form g(theta)’ W g(theta) where g(theta) is the sample average of moment conditions. Supports iterated GMM (two-step by default).
- Parameters:
moment_conditions (Callable[[ndarray, ndarray], ndarray]) – Callable (params, data) -> (n, q) returning the n-by-q matrix of moment conditions evaluated at params. Each row corresponds to an observation, each column to a moment condition.
params_init (ndarray) – Initial parameter vector (p,).
W (ndarray | None, default: None) – Weighting matrix (q, q). Defaults to the identity matrix (one-step GMM). After the first step the optimal weighting matrix is computed from the residual moment conditions.
data (ndarray | None, default: None) – Data array passed as the second argument to moment_conditions. If None, a zero-length array is used.
max_iter (int, default: 2) – Maximum number of GMM iterations (default 2 = two-step GMM).
- Return type:
- Returns:
Dictionary with params, objective, moment_conditions_mean, W, nobs, and n_moments.
- sargan_test(residuals, instruments)[source]¶
Sargan-Hansen J-test for overidentifying restrictions.
Tests whether excluded instruments are validly uncorrelated with the structural error. Only applicable when the model is overidentified (more instruments than endogenous variables).
- Parameters:
- Return type:
- Returns:
Dictionary with statistic, p_value, df, and is_valid (instruments are valid at 5 % level if not rejected).
- durbin_watson(residuals)[source]¶
Compute the Durbin-Watson statistic for serial correlation.
The statistic ranges from 0 to 4. A value near 2 indicates no first-order autocorrelation, values near 0 indicate positive autocorrelation, and values near 4 indicate negative autocorrelation.
Delegates to wraquant.stats.tests.durbin_watson for the core computation.
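The statistic itself is the ratio of the sum of squared first differences of the residuals to their sum of squares, which is approximately 2(1 - rho) for first-order autocorrelation rho. A minimal NumPy sketch, using a hypothetical helper dw and simulated series:

```python
import numpy as np

def dw(resid):
    """Durbin-Watson statistic: sum of squared differences over sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)
eps = rng.normal(size=2000)

# AR(1) series with coefficient 0.8, i.e. strong positive autocorrelation
ar_pos = np.empty_like(eps)
ar_pos[0] = eps[0]
for t in range(1, len(eps)):
    ar_pos[t] = 0.8 * ar_pos[t - 1] + eps[t]

dw_white = dw(eps)     # near 2 for uncorrelated residuals
dw_pos = dw(ar_pos)    # well below 2 for positively autocorrelated residuals
```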
- breusch_godfrey(residuals, X, nlags=4)[source]¶
Breusch-Godfrey LM test for serial correlation.
Regresses the residuals on the original regressors plus lagged residuals. A significant test statistic rejects the null of no serial correlation up to order nlags.
- Parameters:
- Return type:
- Returns:
Dictionary with lm_statistic, lm_p_value, f_statistic, f_p_value, and is_autocorrelated (at 5 % level).
- breusch_pagan(residuals, X)[source]¶
Breusch-Pagan test for heteroskedasticity.
Tests whether the variance of the residuals depends on the regressors. The null hypothesis is homoskedasticity. Delegates to wraquant.stats.tests.breusch_pagan for the core computation.
- white_test(residuals, X)[source]¶
White’s general test for heteroskedasticity.
Includes cross-product terms and squared regressors, making it more general than Breusch-Pagan. The null hypothesis is homoskedasticity. Delegates to wraquant.stats.tests.white_test for the core computation.
- jarque_bera(residuals)[source]¶
Jarque-Bera test for normality of residuals.
The null hypothesis is that the residuals are normally distributed. Delegates to wraquant.stats.tests.test_normality for the core computation.
- ramsey_reset(y, X, power=3)[source]¶
Ramsey RESET test for functional form misspecification.
Adds powers of the fitted values to the regression and tests their joint significance. Rejection suggests the linear specification is inadequate.
- Parameters:
- Return type:
- Returns:
Dictionary with f_statistic, p_value, df_num, df_denom, and is_misspecified (at 5 % level).
- vif(X)[source]¶
Compute Variance Inflation Factors for each regressor.
VIF > 10 is a common rule-of-thumb threshold indicating problematic multicollinearity. The input should not include a constant column.
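As an illustration of the definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the remaining columns plus a constant, the sketch below computes the factors with NumPy. The helper vif_values and the simulated regressors are assumptions for the example, not the library's code.

```python
import numpy as np

def vif_values(X):
    """Variance inflation factor for each column of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        # Auxiliary regression of column j on a constant and the other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        ss_res = np.sum((X[:, j] - fitted) ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = ss_tot / ss_res  # identical to 1 / (1 - R_j^2)
    return out

rng = np.random.default_rng(6)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)             # unrelated regressor
vifs = vif_values(np.column_stack([x1, x2, x3]))
```

The two nearly collinear columns come out far above the rule-of-thumb threshold of 10, while the unrelated column stays close to 1.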
- condition_number(X)[source]¶
Compute the condition number of the X’X matrix.
A condition number exceeding 30 suggests harmful multicollinearity (Belsley, Kuh, and Welsch, 1980).
- event_study(returns, event_dates, estimation_window=(-250, -10), event_window=(-5, 5), market_returns=None)[source]¶
Classic event study with market-model abnormal returns.
For each event date, estimates a market model over the estimation_window and computes abnormal returns (AR) and cumulative abnormal returns (CAR) over the event_window.
If market_returns is None, a constant-mean-return model is used instead of the market model.
- Parameters:
returns (DataFrame | Series) – Return series for the security (or DataFrame with one column per security). Must have a DatetimeIndex.
event_dates (list | DatetimeIndex) – List of event dates.
estimation_window (tuple[int, int], default: (-250, -10)) – (start, end) offsets in trading days relative to the event date for the estimation period.
event_window (tuple[int, int], default: (-5, 5)) – (start, end) offsets for the event window.
market_returns (Series | None, default: None) – Market return series (must overlap in time with returns).
- Returns:
abnormal_returns: DataFrame (n_events, event_window_len) of AR.
car: Series of cumulative abnormal returns for each event.
mean_car: Average CAR across events.
t_stat: Cross-sectional t-statistic for mean CAR.
p_value: Two-sided p-value.
event_dates: The event dates used.
n_events: Number of valid events.
- Return type:
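For a single event, the market-model calculation reduces to one regression and a sum. The NumPy sketch below uses simulated returns with a 5% jump on the event day; positional indices stand in for dates, and all names and numbers are assumptions made for the illustration rather than the library's internals.

```python
import numpy as np

rng = np.random.default_rng(2)
T, event_idx = 400, 300
mkt = rng.normal(0.0004, 0.01, size=T)
stock = 0.0001 + 1.2 * mkt + rng.normal(0.0, 0.002, size=T)
stock[event_idx] += 0.05  # announcement-day jump

# Windows as inclusive offsets relative to the event: (-250, -10) and (-5, 5)
est = slice(event_idx - 250, event_idx - 10 + 1)
evt = slice(event_idx - 5, event_idx + 5 + 1)

# Market model estimated on the estimation window only
A = np.column_stack([np.ones(est.stop - est.start), mkt[est]])
alpha, beta = np.linalg.lstsq(A, stock[est], rcond=None)[0]

# Abnormal return = realised return minus the market-model prediction
ar = stock[evt] - (alpha + beta * mkt[evt])
car = ar.sum()
```

Keeping the estimation window clear of the event window prevents the event itself from contaminating the alpha and beta estimates.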
- cumulative_abnormal_return(returns, expected_returns, event_window=None)[source]¶
Compute cumulative abnormal returns (CAR).
- Parameters:
- Return type:
- Returns:
Series of cumulative abnormal returns.
- buy_and_hold_abnormal_return(returns, benchmark_returns, event_window=None)[source]¶
Buy-and-hold abnormal return (BHAR).
BHAR = product(1 + R_i) - product(1 + R_benchmark)
Unlike CAR, BHAR accounts for compounding and is preferred for longer-horizon event studies (Barber and Lyon, 1997).
- Parameters:
- Return type:
- Returns:
BHAR as a float.
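A three-period example makes the difference between summing and compounding concrete. The return figures are invented for the illustration.

```python
import numpy as np

stock = np.array([0.10, -0.05, 0.08])
bench = np.array([0.02, 0.01, 0.03])

# CAR sums the period-by-period abnormal returns
car = np.sum(stock - bench)                      # 0.07

# BHAR compounds each leg first, then takes the difference
bhar = np.prod(1 + stock) - np.prod(1 + bench)   # 1.1286 - 1.061106 = 0.067494
```

The gap between the two measures is small over three periods but grows with the horizon and with return volatility.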
- pooled_ols(y, X)[source]¶
Pooled OLS regression ignoring panel structure.
Treats all observations as independent cross-sectional data. The estimator is inconsistent when unobserved entity heterogeneity is correlated with the regressors, but it serves as a baseline for comparison with panel estimators.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, adj_r_squared, residuals, and nobs.
- fixed_effects(y, X, entity_col, time_col=None)[source]¶
Entity fixed effects regression (within estimator).
Demeans all variables by entity (and optionally by time period) before running OLS, which is algebraically equivalent to including entity dummies. This removes time-invariant unobserved heterogeneity.
- Parameters:
y (Series) – Dependent variable. Must share the same index as X.
X (DataFrame) – Regressor DataFrame. Must contain entity_col (and optionally time_col). All other columns are treated as regressors.
entity_col (str) – Column name identifying the cross-sectional unit.
time_col (str | None, default: None) – Optional column name identifying the time period for two-way fixed effects.
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared (within), entity_effects, residuals, nobs, and n_entities.
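The equivalence between entity demeaning and entity dummies can be checked numerically. The sketch below simulates a balanced panel in which the entity effect is correlated with the regressor (true slope 1.5) and compares the within estimator, the dummy-variable regression, and pooled OLS. All names and the data-generating process are assumptions for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_entities, n_periods = 20, 15
entity = np.repeat(np.arange(n_entities), n_periods)
alpha_i = rng.normal(size=n_entities)[entity]      # time-invariant entity effect
x = 0.7 * alpha_i + rng.normal(size=entity.size)   # regressor correlated with the effect
y = alpha_i + 1.5 * x + rng.normal(scale=0.5, size=entity.size)

def demean_by(v, groups):
    """Subtract the group mean from each observation."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

# Within estimator: OLS on entity-demeaned data
x_w, y_w = demean_by(x, entity), demean_by(y, entity)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Least-squares dummy variables: one indicator column per entity
D = (entity[:, None] == np.arange(n_entities)).astype(float)
beta_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# Pooled OLS ignores the entity effect and absorbs it into the slope
beta_pooled = np.polyfit(x, y, 1)[0]
```

The within and dummy-variable slopes agree to numerical precision, while the pooled slope is biased upward in this setup.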
- random_effects(y, X, entity_col)[source]¶
Random effects (GLS) panel regression.
Assumes that the unobserved entity effect is uncorrelated with the regressors. Applies a partial demeaning transformation using the estimated variance components, then runs GLS.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, theta (partial-demeaning parameter), residuals, and nobs.
- hausman_test(fe_results, re_results)[source]¶
Hausman specification test for fixed vs. random effects.
Under the null hypothesis, both FE and RE are consistent, but RE is efficient. Rejection favours fixed effects.
- between_effects(y, X, entity_col)[source]¶
Between estimator (regression on group means).
Collapses the panel to entity-level means and runs OLS. Exploits only the cross-sectional (between) variation and is inconsistent when entity-level omitted variables are correlated with the regressors.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, residuals, and n_entities.
- first_difference(y, X, entity_col, time_col)[source]¶
First-difference estimator.
Differences all variables within each entity, eliminating the time-invariant entity effect. Under strict exogeneity this is consistent; it is often preferred over FE when the errors are a random walk.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, residuals, and nobs.
- var_model(data, max_lags=None, ic='aic')[source]¶
Fit a Vector Autoregression (VAR) model.
Selects lag order by information criterion and estimates the reduced-form VAR by equation-by-equation OLS.
- Parameters:
data (DataFrame | ndarray) – Multivariate time series (T, k). Columns are treated as endogenous variables.
max_lags (int | None, default: None) – Maximum lag order to consider. Defaults to int(12 * (T / 100) ** (1/4)).
ic (str, default: 'aic') – Information criterion for lag selection – "aic", "bic", "hqic", or "fpe".
- Return type:
- Returns:
Dictionary with coefficients (k x (k*p + 1) matrix where the last column is the intercept), lag_order, residuals (T-p, k), sigma_u (innovation covariance), aic, bic, fittedvalues, and a forecast callable (steps) -> np.ndarray.
- vecm_model(data, k_ar_diff=1, det_order=0)[source]¶
Fit a Vector Error Correction Model (VECM) for cointegrated systems.
Estimates the Johansen cointegrating rank and then fits the VECM of the form: Delta y_t = alpha * beta’ * y_{t-1} + Gamma * Delta y_{t-1} + …
- Parameters:
- Return type:
- Returns:
Dictionary with alpha (adjustment coefficients), beta (cointegrating vectors), gamma (short-run coefficients), det_coef (deterministic coefficients), coint_rank, residuals, and sigma_u.
- granger_causality(data, max_lag=10)[source]¶
Pairwise Granger causality tests for all variable pairs.
For each ordered pair (X, Y), tests whether lagged values of X help predict Y beyond Y’s own lags. Uses a VAR framework.
- impulse_response(var_coefficients, n_periods=20, shock_var=0)[source]¶
Compute impulse response functions from VAR coefficient matrices.
Applies a one-unit shock to shock_var at time 0 and traces out the dynamic response of all variables over n_periods.
- Parameters:
var_coefficients (ndarray) – VAR coefficient matrix of shape (k, k*p) or (k, k*p + 1) where the last column is the intercept. The k-by-k blocks are [A1 | A2 | … | Ap].
n_periods (int, default: 20) – Number of periods for the IRF (default 20).
shock_var (int, default: 0) – Index of the variable receiving the unit shock.
- Return type:
- Returns:
Array of shape (n_periods + 1, k) where row h is the response of all k variables at horizon h.
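One way to trace such responses is to iterate the companion form of the VAR: the response at horizon h is the first k entries of the companion matrix raised to the power h, applied to the unit shock. The sketch below is an illustration under that assumption; the helper irf_sketch and its explicit k and p arguments are hypothetical and not the library's signature.

```python
import numpy as np

def irf_sketch(coefs, k, p, n_periods=20, shock_var=0):
    """Non-orthogonalised responses to a one-unit shock (hypothetical helper)."""
    A = coefs[:, : k * p]                 # drop a trailing intercept column, if present
    companion = np.zeros((k * p, k * p))
    companion[:k, :] = A                  # top block row holds [A1 | A2 | ... | Ap]
    companion[k:, :-k] = np.eye(k * (p - 1))
    state = np.zeros(k * p)
    state[shock_var] = 1.0                # unit shock at horizon 0
    out = np.empty((n_periods + 1, k))
    for h in range(n_periods + 1):
        out[h] = state[:k]
        state = companion @ state
    return out

# VAR(1) with two variables
A1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])
resp0 = irf_sketch(A1, k=2, p=1, n_periods=2, shock_var=0)
resp1 = irf_sketch(A1, k=2, p=1, n_periods=2, shock_var=1)
# resp0 rows: [1, 0], [0.5, 0], [0.25, 0]
# resp1 rows: [0, 1], [0.1, 0.3], [0.08, 0.09]
```

The intercept plays no role in the responses, which is why a trailing intercept column can be ignored.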
- variance_decomposition(var_coefficients, n_periods=20)[source]¶
Forecast error variance decomposition from VAR coefficients.
Decomposes the forecast error variance of each variable into contributions from each structural shock (Cholesky identification).
- Parameters:
- Return type:
- Returns:
Array of shape (n_periods + 1, k, k) where entry [h, i, j] is the fraction of the h-step forecast error variance of variable i attributable to shocks in variable j.
- structural_break_test(y, X=None, method='chow', break_point=None)[source]¶
Test for structural breaks in a regression relationship.
- Parameters:
X (ndarray | DataFrame | None, default: None) – Design matrix (T, k). If None, a constant-only model is used (testing a break in the mean).
method (str, default: 'chow') – "chow" for a known break point, or "sup_f" for the supremum-F test (Andrews, 1993), which searches over candidate break points.
break_point (int | None, default: None) – Observation index of the hypothesised break (required for method="chow"). For "sup_f" this is ignored.
- Return type:
- Returns:
Dictionary with f_statistic, p_value, break_point, and is_break (at 5 % level).
- Raises:
ValueError – If method="chow" and break_point is None.
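For a known break point, the Chow statistic compares the residual sum of squares of one pooled regression with that of two separate regressions: F = [(SSR_pooled - SSR_split) / k] / [SSR_split / (T - 2k)]. The NumPy sketch below computes only the F statistic; the helper chow_f and the simulated data are assumptions for the illustration, and a p-value would additionally need the F(k, T - 2k) distribution.

```python
import numpy as np

def chow_f(y, X, break_point):
    """Chow F statistic for a break at a known observation index (hypothetical helper)."""
    def ssr(y_, X_):
        resid = y_ - X_ @ np.linalg.lstsq(X_, y_, rcond=None)[0]
        return resid @ resid
    n, k = X.shape
    ssr_pooled = ssr(y, X)
    ssr_split = ssr(y[:break_point], X[:break_point]) + ssr(y[break_point:], X[break_point:])
    return ((ssr_pooled - ssr_split) / k) / (ssr_split / (n - 2 * k))

rng = np.random.default_rng(4)
n, bp = 200, 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# The slope jumps from 0.5 to 3.0 at observation 100 in the first series only
slope = np.where(np.arange(n) < bp, 0.5, 3.0)
y_break = 1.0 + slope * x + rng.normal(size=n)
y_stable = 1.0 + 0.5 * x + rng.normal(size=n)

f_break = chow_f(y_break, X, bp)
f_stable = chow_f(y_stable, X, bp)
```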
- garch(returns, p=1, q=1, dist='normal')[source]¶
Fit a GARCH(p,q) model.
Delegates to wraquant.vol.models.garch_fit().
- Parameters:
- Return type:
- Returns:
Dictionary with params, conditional_volatility, standardized_residuals, forecast (one-step-ahead volatility), aic, bic, and loglikelihood.
- garch_numpy_fallback(returns)[source]¶
Fit a GARCH(1,1) model using pure numpy/scipy.
This is a backward-compatible public wrapper around the private _garch_numpy_fallback() helper.
- egarch(returns, p=1, q=1)[source]¶
Fit an EGARCH(p,q) model (exponential GARCH).
Delegates to wraquant.vol.models.egarch_fit().
- gjr_garch(returns, p=1, q=1)[source]¶
Fit a GJR-GARCH(p,q) model (threshold GARCH).
Delegates to wraquant.vol.models.gjr_garch_fit().
- dcc_garch(returns_df, p=1, q=1)[source]¶
Fit a Dynamic Conditional Correlation (DCC) GARCH model.
Delegates to wraquant.vol.models.dcc_fit().
- Parameters:
- Return type:
- Returns:
Dictionary with univariate_params (per-asset GARCH parameters), conditional_correlations (T, k, k), conditional_covariances (T, k, k), and standardized_residuals (T, k).
- arch_test(residuals, nlags=5)[source]¶
Engle’s ARCH-LM test for conditional heteroskedasticity.
Regresses squared residuals on their own lags. A significant test statistic indicates the presence of ARCH effects, justifying the use of GARCH-type models.
- Parameters:
- Return type:
- Returns:
Dictionary with statistic (LM statistic), p_value, f_statistic, f_p_value, and is_arch (True at 5 % level).
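The LM statistic is the number of usable observations times the R-squared of the regression of squared residuals on their own lags; under the null of no ARCH effects it is asymptotically chi-square with nlags degrees of freedom. A NumPy sketch of that calculation follows, where the helper arch_lm and the simulated series are assumptions for the illustration.

```python
import numpy as np

def arch_lm(resid, nlags=5):
    """LM statistic of Engle's ARCH test (hypothetical helper)."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[nlags:]
    # Column i holds the squared residual lagged by i periods, aligned with y
    lags = np.column_stack([e2[nlags - i : len(e2) - i] for i in range(1, nlags + 1)])
    Z = np.column_stack([np.ones(len(y)), lags])
    fitted = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return len(y) * r2

rng = np.random.default_rng(5)
n = 3000
z = rng.normal(size=n)

# ARCH(1) process: conditional variance 0.5 + 0.5 * e_{t-1}^2
e = np.empty(n)
e[0] = z[0]
for t in range(1, n):
    e[t] = z[t] * np.sqrt(0.5 + 0.5 * e[t - 1] ** 2)

lm_iid = arch_lm(z)    # homoskedastic noise: small statistic
lm_arch = arch_lm(e)   # ARCH(1) series: large statistic
```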
Panel Data
Panel data econometrics.
Provides pooled OLS, fixed effects (within estimator), random effects (GLS), between effects, first-difference estimator, and the Hausman specification test. These are the workhorses of empirical asset pricing with panel data (e.g. Fama-MacBeth regressions, firm-level regressions with firm and time fixed effects).
- pooled_ols(y, X)[source]¶
Pooled OLS regression ignoring panel structure.
Treats all observations as independent cross-sectional data. The estimator is inconsistent when unobserved entity heterogeneity is correlated with the regressors, but it serves as a baseline for comparison with panel estimators.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, adj_r_squared, residuals, and nobs.
- fixed_effects(y, X, entity_col, time_col=None)[source]¶
Entity fixed effects regression (within estimator).
Demeans all variables by entity (and optionally by time period) before running OLS, which is algebraically equivalent to including entity dummies. This removes time-invariant unobserved heterogeneity.
- Parameters:
y (Series) – Dependent variable. Must share the same index as X.
X (DataFrame) – Regressor DataFrame. Must contain entity_col (and optionally time_col). All other columns are treated as regressors.
entity_col (str) – Column name identifying the cross-sectional unit.
time_col (str | None, default: None) – Optional column name identifying the time period for two-way fixed effects.
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared (within), entity_effects, residuals, nobs, and n_entities.
- random_effects(y, X, entity_col)[source]¶
Random effects (GLS) panel regression.
Assumes that the unobserved entity effect is uncorrelated with the regressors. Applies a partial demeaning transformation using the estimated variance components, then runs GLS.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, theta (partial-demeaning parameter), residuals, and nobs.
- hausman_test(fe_results, re_results)[source]¶
Hausman specification test for fixed vs. random effects.
Under the null hypothesis, both FE and RE are consistent, but RE is efficient. Rejection favours fixed effects.
- between_effects(y, X, entity_col)[source]¶
Between estimator (regression on group means).
Collapses the panel to entity-level means and runs OLS. Exploits only the cross-sectional (between) variation and is inconsistent when entity-level omitted variables are correlated with the regressors.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, residuals, and n_entities.
- first_difference(y, X, entity_col, time_col)[source]¶
First-difference estimator.
Differences all variables within each entity, eliminating the time-invariant entity effect. Under strict exogeneity this is consistent; it is often preferred over FE when the errors are a random walk.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, residuals, and nobs.
Cross Section
Cross-sectional econometric methods.
Provides robust OLS, quantile regression, instrumental variables (2SLS), GMM estimation, and the Sargan overidentification test – core tools for empirical asset pricing and cross-sectional return analysis.
- robust_ols(y, X, cov_type='HC1')[source]¶
OLS regression with heteroskedasticity-robust standard errors.
Implements the HC0 – HC3 covariance estimators of White (1980) and MacKinnon and White (1985). Point estimates are identical to plain OLS; only the standard errors, t-statistics, and p-values differ.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, adj_r_squared, residuals, cov_type, and nobs.
- quantile_regression(y, X, quantile=0.5)[source]¶
Quantile regression via statsmodels.
Estimates the conditional quantile function, generalising OLS which estimates the conditional mean. Useful for understanding heterogeneous effects across the return distribution.
- Parameters:
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, quantile, pseudo_r_squared, and nobs.
- two_stage_least_squares(y, X, instruments, endog_vars)[source]¶
Two-stage least squares (2SLS) instrumental variables estimator.
Addresses endogeneity by instrumenting the endogenous regressors in X with exogenous instruments. The first stage regresses each endogenous variable on the instruments, and the second stage regresses y on the fitted values plus the exogenous regressors.
- Parameters:
X (ndarray | DataFrame) – Full design matrix (n, k) containing both exogenous and endogenous regressors.
instruments (ndarray | DataFrame) – Matrix of excluded instruments (n, m) with m >= number of endogenous variables (order condition).
endog_vars (list[int] | ndarray) – Column indices (0-based) within X that are endogenous.
- Return type:
- Returns:
Dictionary with coefficients, std_errors, t_stats, p_values, r_squared, residuals, first_stage_f, and nobs.
- Raises:
ValueError – If the order condition is violated (fewer instruments than endogenous variables).
- gmm_estimation(moment_conditions, params_init, W=None, *, data=None, max_iter=2)[source]¶
Generalized Method of Moments (GMM) estimation.
Minimises the quadratic form g(theta)’ W g(theta) where g(theta) is the sample average of moment conditions. Supports iterated GMM (two-step by default).
- Parameters:
moment_conditions (Callable[[ndarray, ndarray], ndarray]) – Callable (params, data) -> (n, q) returning the n-by-q matrix of moment conditions evaluated at params. Each row corresponds to an observation, each column to a moment condition.
params_init (ndarray) – Initial parameter vector (p,).
W (ndarray | None, default: None) – Weighting matrix (q, q). Defaults to the identity matrix (one-step GMM). After the first step the optimal weighting matrix is computed from the residual moment conditions.
data (ndarray | None, default: None) – Data array passed as the second argument to moment_conditions. If None, a zero-length array is used.
max_iter (int, default: 2) – Maximum number of GMM iterations (default 2 = two-step GMM).
- Return type:
- Returns:
Dictionary with params, objective, moment_conditions_mean, W, nobs, and n_moments.
- sargan_test(residuals, instruments)[source]¶
Sargan-Hansen J-test for overidentifying restrictions.
Tests whether excluded instruments are validly uncorrelated with the structural error. Only applicable when the model is overidentified (more instruments than endogenous variables).
- Parameters:
- Return type:
- Returns:
Dictionary with statistic, p_value, df, and is_valid (instruments are valid at 5 % level if not rejected).
Time Series Econometrics
Time series econometrics.
Provides Vector Autoregression (VAR), Vector Error Correction Models (VECM), Granger causality tests, impulse response functions, forecast error variance decomposition, and structural break tests. These are the core multivariate time series tools used in macro-finance and empirical asset pricing (Campbell-Lo-MacKinlay ch. 11; Hamilton ch. 11-19; Lütkepohl, 2005).
- var_model(data, max_lags=None, ic='aic')[source]¶
Fit a Vector Autoregression (VAR) model.
Selects lag order by information criterion and estimates the reduced-form VAR by equation-by-equation OLS.
- Parameters:
data (DataFrame | ndarray) – Multivariate time series (T, k). Columns are treated as endogenous variables.
max_lags (int | None, default: None) – Maximum lag order to consider. Defaults to int(12 * (T / 100) ** (1/4)).
ic (str, default: 'aic') – Information criterion for lag selection – "aic", "bic", "hqic", or "fpe".
- Return type:
- Returns:
Dictionary with coefficients (k x (k*p + 1) matrix where the last column is the intercept), lag_order, residuals (T-p, k), sigma_u (innovation covariance), aic, bic, fittedvalues, and a forecast callable (steps) -> np.ndarray.
- vecm_model(data, k_ar_diff=1, det_order=0)[source]¶
Fit a Vector Error Correction Model (VECM) for cointegrated systems.
Estimates the Johansen cointegrating rank and then fits the VECM of the form: Delta y_t = alpha * beta’ * y_{t-1} + Gamma * Delta y_{t-1} + …
- Parameters:
- Return type:
- Returns:
Dictionary with alpha (adjustment coefficients), beta (cointegrating vectors), gamma (short-run coefficients), det_coef (deterministic coefficients), coint_rank, residuals, and sigma_u.
- granger_causality(data, max_lag=10)[source]¶
Pairwise Granger causality tests for all variable pairs.
For each ordered pair (X, Y), tests whether lagged values of X help predict Y beyond Y’s own lags. Uses a VAR framework.
- impulse_response(var_coefficients, n_periods=20, shock_var=0)[source]¶
Compute impulse response functions from VAR coefficient matrices.
Applies a one-unit shock to shock_var at time 0 and traces out the dynamic response of all variables over n_periods.
- Parameters:
var_coefficients (ndarray) – VAR coefficient matrix of shape (k, k*p) or (k, k*p + 1) where the last column is the intercept. The k-by-k blocks are [A1 | A2 | … | Ap].
n_periods (int, default: 20) – Number of periods for the IRF (default 20).
shock_var (int, default: 0) – Index of the variable receiving the unit shock.
- Return type:
- Returns:
Array of shape (n_periods + 1, k) where row h is the response of all k variables at horizon h.
- variance_decomposition(var_coefficients, n_periods=20)[source]¶
Forecast error variance decomposition from VAR coefficients.
Decomposes the forecast error variance of each variable into contributions from each structural shock (Cholesky identification).
- Parameters:
- Return type:
- Returns:
Array of shape (n_periods + 1, k, k) where entry [h, i, j] is the fraction of the h-step forecast error variance of variable i attributable to shocks in variable j.
- structural_break_test(y, X=None, method='chow', break_point=None)[source]¶
Test for structural breaks in a regression relationship.
- Parameters:
X (ndarray | DataFrame | None, default: None) – Design matrix (T, k). If None, a constant-only model is used (testing a break in the mean).
method (str, default: 'chow') – "chow" for a known break point, or "sup_f" for the supremum-F test (Andrews, 1993), which searches over candidate break points.
break_point (int | None, default: None) – Observation index of the hypothesised break (required for method="chow"). For "sup_f" this is ignored.
- Return type:
- Returns:
Dictionary with f_statistic, p_value, break_point, and is_break (at 5 % level).
- Raises:
ValueError – If method="chow" and break_point is None.
Event Studies
Event study methodology.
Implements the classic event study framework widely used in empirical finance to measure the impact of corporate events (earnings announcements, M&A, etc.) on security prices. Follows the methodology of MacKinlay (1997) and Campbell, Lo, and MacKinlay (1997, ch. 4).
- event_study(returns, event_dates, estimation_window=(-250, -10), event_window=(-5, 5), market_returns=None)[source]¶
Classic event study with market-model abnormal returns.
For each event date, estimates a market model over the estimation_window and computes abnormal returns (AR) and cumulative abnormal returns (CAR) over the event_window.
If market_returns is None, a constant-mean-return model is used instead of the market model.
- Parameters:
returns (DataFrame | Series) – Return series for the security (or DataFrame with one column per security). Must have a DatetimeIndex.
event_dates (list | DatetimeIndex) – List of event dates.
estimation_window (tuple[int, int], default: (-250, -10)) – (start, end) offsets in trading days relative to the event date for the estimation period.
event_window (tuple[int, int], default: (-5, 5)) – (start, end) offsets for the event window.
market_returns (Series | None, default: None) – Market return series (must overlap in time with returns).
- Returns:
abnormal_returns: DataFrame (n_events, event_window_len) of AR.
car: Series of cumulative abnormal returns for each event.
mean_car: Average CAR across events.
t_stat: Cross-sectional t-statistic for mean CAR.
p_value: Two-sided p-value.
event_dates: The event dates used.
n_events: Number of valid events.
- Return type:
- cumulative_abnormal_return(returns, expected_returns, event_window=None)[source]¶
Compute cumulative abnormal returns (CAR).
- Parameters:
- Return type:
- Returns:
Series of cumulative abnormal returns.
- buy_and_hold_abnormal_return(returns, benchmark_returns, event_window=None)[source]¶
Buy-and-hold abnormal return (BHAR).
BHAR = product(1 + R_i) - product(1 + R_benchmark)
Unlike CAR, BHAR accounts for compounding and is preferred for longer-horizon event studies (Barber and Lyon, 1997).
- Parameters:
- Return type:
- Returns:
BHAR as a float.
Diagnostics
Regression diagnostics for econometric models.
Provides tests for serial correlation, heteroskedasticity, normality, functional form misspecification, and multicollinearity detection.
- durbin_watson(residuals)[source]¶
Compute the Durbin-Watson statistic for serial correlation.
The statistic ranges from 0 to 4. A value near 2 indicates no first-order autocorrelation, values near 0 indicate positive autocorrelation, and values near 4 indicate negative autocorrelation.
Delegates to wraquant.stats.tests.durbin_watson for the core computation.
- breusch_godfrey(residuals, X, nlags=4)[source]¶
Breusch-Godfrey LM test for serial correlation.
Regresses the residuals on the original regressors plus lagged residuals. A significant test statistic rejects the null of no serial correlation up to order nlags.
- Parameters:
- Return type:
- Returns:
Dictionary with lm_statistic, lm_p_value, f_statistic, f_p_value, and is_autocorrelated (at 5 % level).
- breusch_pagan(residuals, X)[source]¶
Breusch-Pagan test for heteroskedasticity.
Tests whether the variance of the residuals depends on the regressors. The null hypothesis is homoskedasticity. Delegates to wraquant.stats.tests.breusch_pagan for the core computation.
- white_test(residuals, X)[source]¶
White’s general test for heteroskedasticity.
Includes cross-product terms and squared regressors, making it more general than Breusch-Pagan. The null hypothesis is homoskedasticity. Delegates to wraquant.stats.tests.white_test for the core computation.
- jarque_bera(residuals)[source]¶
Jarque-Bera test for normality of residuals.
The null hypothesis is that the residuals are normally distributed. Delegates to wraquant.stats.tests.test_normality for the core computation.
- ramsey_reset(y, X, power=3)[source]¶
Ramsey RESET test for functional form misspecification.
Adds powers of the fitted values to the regression and tests their joint significance. Rejection suggests the linear specification is inadequate.
- Parameters:
- Return type:
- Returns:
Dictionary with f_statistic, p_value, df_num, df_denom, and is_misspecified (at 5 % level).
- vif(X)[source]¶
Compute Variance Inflation Factors for each regressor.
VIF > 10 is a common rule-of-thumb threshold indicating problematic multicollinearity. The input should not include a constant column.
Volatility
Volatility econometrics.
Provides the ARCH-LM test and backward-compatible wrappers for GARCH-family
models. All GARCH estimation is delegated to wraquant.vol.models,
which is the canonical home for volatility modeling.
The pure-numpy GARCH(1,1) fallback is retained as a private helper
(_garch_numpy_fallback()) for environments without the arch
library.
- arch_test(residuals, nlags=5)[source]¶
Engle’s ARCH-LM test for conditional heteroskedasticity.
Regresses squared residuals on their own lags. A significant test statistic indicates the presence of ARCH effects, justifying the use of GARCH-type models.
- Parameters:
- Return type:
- Returns:
Dictionary with statistic (LM statistic), p_value, f_statistic, f_p_value, and is_arch (True at 5 % level).
- garch(returns, p=1, q=1, dist='normal')[source]¶
Fit a GARCH(p,q) model.
Delegates to wraquant.vol.models.garch_fit().
- Parameters:
- Return type:
- Returns:
Dictionary with params, conditional_volatility, standardized_residuals, forecast (one-step-ahead volatility), aic, bic, and loglikelihood.
- garch_numpy_fallback(returns)[source]¶
Fit a GARCH(1,1) model using pure numpy/scipy.
This is a backward-compatible public wrapper around the private _garch_numpy_fallback() helper.
- egarch(returns, p=1, q=1)[source]¶
Fit an EGARCH(p,q) model (exponential GARCH).
Delegates to wraquant.vol.models.egarch_fit().
- gjr_garch(returns, p=1, q=1)[source]¶
Fit a GJR-GARCH(p,q) model (threshold GARCH).
Delegates to wraquant.vol.models.gjr_garch_fit().
- dcc_garch(returns_df, p=1, q=1)[source]¶
Fit a Dynamic Conditional Correlation (DCC) GARCH model.
Delegates to wraquant.vol.models.dcc_fit().
- Parameters:
- Return type:
- Returns:
Dictionary with univariate_params (per-asset GARCH parameters), conditional_correlations (T, k, k), conditional_covariances (T, k, k), and standardized_residuals (T, k).