Causal Inference (`wraquant.causal`)¶

Causal inference methods for quantitative finance: difference-in-differences, synthetic control, inverse probability weighting, and treatment effect estimation. Use these to measure the causal impact of events, policy changes, or interventions on financial outcomes.

Quick Example¶

from wraquant.causal import treatment

# Difference-in-differences: impact of an event on asset returns
did = treatment.difference_in_differences(
    treated=treated_returns,
    control=control_returns,
    treatment_date="2023-06-01",
)
print(f"Treatment effect: {did['effect']:.4f}")
print(f"p-value: {did['p_value']:.4f}")

# Synthetic control: construct a counterfactual from control assets
sc = treatment.synthetic_control(
    treated=treated_returns,
    donors=donor_returns_df,
    treatment_date="2023-06-01",
)
print(f"Synthetic treatment effect: {sc['effect']:.4f}")

See also

Econometrics (wraquant.econometrics) – Event studies and panel data methods
Statistics (wraquant.stats) – Statistical testing foundations

API Reference¶

Causal inference methods for quantitative finance.

Provides tools for estimating causal effects in financial settings – where randomized experiments are rare and observational data dominates. Covers the major causal identification strategies used in empirical finance and economics: propensity score methods for selection-on- observables designs, difference-in-differences for policy evaluation, synthetic control for single-treated-unit studies, regression discontinuity for threshold-based effects, and instrumental variables for endogeneity correction.

Key components:

Propensity score methods – propensity_score (logistic model), ipw_ate (Inverse Probability Weighting for Average Treatment Effect), matching_ate (nearest-neighbor propensity matching), doubly_robust_ate (combines outcome model with IPW for double protection against misspecification).
Difference-in-differences – diff_in_diff for estimating treatment effects from panel data with parallel trends.
Synthetic control – synthetic_control (Abadie et al. method for constructing a counterfactual from donor units), synthetic_control_weights (retrieve the donor weights).
Regression discontinuity – regression_discontinuity (sharp RDD), regression_discontinuity_robust (bias-corrected with robust confidence intervals).
Granger causality – granger_causality for testing predictive causality in time series.
Instrumental variables – instrumental_variable (2SLS for endogenous regressors).
Event studies – event_study for estimating dynamic treatment effects around an event.
Advanced methods – causal_forest (heterogeneous treatment effects via random forests), mediation_analysis (decompose direct and indirect effects), bounds_analysis (partial identification under weaker assumptions).

Example

>>> from wraquant.causal import diff_in_diff, synthetic_control
>>> att = diff_in_diff(panel, y="returns", treat="treated", post="post_event")
>>> sc = synthetic_control(treated_series, donor_matrix, pre_periods=60)

Use wraquant.causal when you need to establish causal relationships rather than mere correlations – for example, measuring the effect of a policy change on asset prices or estimating the impact of an index inclusion event. For standard regression without causal identification, see wraquant.stats.regression or wraquant.econometrics.

propensity_score(treatment, covariates)[source]¶

Estimate propensity scores using logistic regression.

The propensity score e(X) = P(treatment=1 | covariates=X) is the probability that a unit receives treatment given its observed characteristics. It is the foundation of all observational causal inference: by conditioning on (or weighting by) the propensity score, you can remove confounding bias, making the treated and control groups comparable.

This function fits a logistic regression of treatment assignment on covariates. Scores are clipped to [0.01, 0.99] to prevent extreme inverse-probability weights that would blow up downstream estimators.

Interpretation:

A propensity score of 0.5 means the unit was equally likely to be treated or not – ideal for causal inference.
Scores near 0 or 1 indicate units that were almost deterministically treated/untreated. These are problematic: the overlap assumption fails, and IPW weights become extreme.
Check the overlap: plot histograms of propensity scores separately for treated and control groups. If they don’t overlap, causal inference is unreliable in the non-overlap region.

When to use:

As input to ipw_ate or doubly_robust_ate.
For propensity score matching (match treated to controls with similar scores).
To diagnose selection bias: if propensity scores are very different between groups, selection is strong.

Red flags:

Most scores near 0 or 1: poor overlap, IPW will be unstable.
Treatment is nearly perfectly predicted: positivity violation.
Scores are all similar (~0.5): treatment is nearly random, which is actually ideal for causal inference.

Parameters:

treatment (ndarray) – Binary treatment indicator (1-D array of 0s and 1s).
covariates (ndarray) – Covariate matrix of shape (n_samples, n_features). An intercept column is added automatically; do not include one.

Return type:

ndarray

Returns:

Estimated propensity scores (probabilities) of shape (n_samples,), each in [0.01, 0.99].

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(200, 3))
>>> T = (X[:, 0] + rng.normal(size=200) > 0).astype(float)
>>> ps = propensity_score(T, X)
>>> # Check overlap: both groups should have scores in [0.2, 0.8]
>>> print(f"Treated scores: [{ps[T==1].min():.2f}, {ps[T==1].max():.2f}]")
>>> print(f"Control scores: [{ps[T==0].min():.2f}, {ps[T==0].max():.2f}]")

See also

ipw_ate: Use propensity scores to estimate the ATE. doubly_robust_ate: Combines propensity scores with outcome

regression for robustness.

ipw_ate(outcome, treatment, propensity_scores)[source]¶

Estimate the Average Treatment Effect using Inverse Probability Weighting.

The ATE (Average Treatment Effect) answers: “On average, how much does the treatment change the outcome?” IPW estimates this by reweighting observations by the inverse of their propensity score, creating a pseudo-population where treatment assignment is independent of confounders.

The Horvitz-Thompson estimator is:

\[\widehat{\text{ATE}} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_i\,Y_i}{e(X_i)} - \frac{1}{n}\sum_{i=1}^{n} \frac{(1 - T_i)\,Y_i}{1 - e(X_i)}\]

Interpretation:

ate > 0: Treatment increases the outcome on average.
ate < 0: Treatment decreases the outcome on average.
The 95% CI is asymptotically valid under correct propensity score specification and the overlap assumption.
If CI includes 0: cannot conclude treatment has an effect.

When to use:

You have a well-estimated propensity score model.
There is good overlap (propensity scores are not extreme).
You are willing to assume no unobserved confounders.

When NOT to use:

Propensity scores near 0 or 1: weights explode, variance is huge. Use doubly_robust_ate instead.
Strong model misspecification concerns: use doubly_robust_ate (consistent if either model is right).
Very few treated or control units: matching may be better.

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
treatment (ndarray) – Binary treatment indicator (0s and 1s).
propensity_scores (ndarray) – Estimated propensity scores from propensity_score() (1-D array).

Return type:

ATEResult

Returns:

ATEResult with ATE estimate, standard error, 95% confidence interval, and sample sizes.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> n = 500
>>> X = rng.normal(size=(n, 2))
>>> T = (X[:, 0] > 0).astype(float)
>>> Y = 2.0 * T + X[:, 0] + rng.normal(size=n)
>>> ps = propensity_score(T, X)
>>> result = ipw_ate(Y, T, ps)
>>> print(f"ATE: {result.ate:.2f} [{result.ci_lower:.2f}, {result.ci_upper:.2f}]")

See also

doubly_robust_ate: More robust alternative (recommended default). matching_ate: Nonparametric matching estimator.

matching_ate(outcome, treatment, covariates, n_neighbors=1)[source]¶

Estimate the ATE using nearest-neighbor matching on covariates.

For each treated unit, finds the closest control unit(s) in covariate space and uses the matched pair difference as the individual treatment effect estimate (and vice versa for controls). The ATE is the average of these pairwise differences.

Matching is the most intuitive causal inference method: compare each treated unit to the most similar untreated unit. It is nonparametric (no model assumptions) but suffers from the curse of dimensionality – with many covariates, “nearest” neighbors can be far away.

Interpretation:

ate is the average difference in outcomes between treated units and their matched controls, averaged over all units.
Good matching means the matched pairs are genuinely similar. Check the covariate balance after matching.
n_neighbors = 1: lowest bias, highest variance (each unit matched to exactly one partner).
n_neighbors = 3-5: reduces variance at the cost of some bias (averaging over less-similar matches).

When to use:

Low-dimensional covariates (2-5 features).
When you want a transparent, model-free estimator.
When propensity score modelling is difficult.

Limitations:

Curse of dimensionality: in high dimensions, matches become poor. Use propensity score matching instead.
Biased with finite samples (Abadie & Imbens, 2006).
No extrapolation: cannot estimate effects outside the overlap region.

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
treatment (ndarray) – Binary treatment indicator (0s and 1s).
covariates (ndarray) – Covariate matrix of shape (n_samples, n_features).
n_neighbors (int, default: 1) – Number of nearest neighbors to match with (default 1). More neighbors reduce variance but may increase bias.

Return type:

ATEResult

Returns:

ATEResult with ATE estimate, standard error, and 95% CI.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(300, 2))
>>> T = (X[:, 0] > 0).astype(float)
>>> Y = 1.5 * T + X[:, 0] + rng.normal(size=300)
>>> result = matching_ate(Y, T, X)
>>> print(f"ATE: {result.ate:.2f} (se={result.se:.2f})")

See also

ipw_ate: Weighting-based estimator (no matching). doubly_robust_ate: Recommended default when uncertain about

model specification.

doubly_robust_ate(outcome, treatment, covariates)[source]¶

Estimate the ATE using the doubly robust (augmented IPW) estimator.

The doubly robust estimator combines inverse probability weighting with outcome regression. It is consistent if either the propensity score model or the outcome regression model is correctly specified, providing a valuable insurance against model misspecification.

This is the recommended default when you are unsure about model specification.

Internally, the function:

Estimates propensity scores via logistic regression.
Fits separate OLS outcome models for treated and control groups.
Combines both in the augmented IPW formula.

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
treatment (ndarray) – Binary treatment indicator (0s and 1s).
covariates (ndarray) – Covariate matrix of shape (n_samples, n_features).

Returns:

ATE estimate with standard error and 95% confidence: interval.

Return type:

ATEResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(500, 3))
>>> T = (X[:, 0] > 0).astype(float)
>>> Y = 2.0 * T + X[:, 0] + rng.normal(size=500)
>>> result = doubly_robust_ate(Y, T, X)
>>> 0.5 < result.ate < 3.5
True

See also

ipw_ate: Pure IPW estimator. matching_ate: Nonparametric matching estimator. propensity_score: Logistic regression propensity scores.

References

Robins, J.M., Rotnitzky, A. & Zhao, L.P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. JASA 89(427).

regression_discontinuity(outcome, running_var, cutoff=0.0, bandwidth=None)[source]¶

Estimate treatment effect using a sharp regression discontinuity design.

RDD exploits a sharp cutoff in a running variable to identify a local causal effect. The key insight: units just above and just below the cutoff are essentially identical except for treatment status, so any discontinuity in outcomes at the cutoff is attributable to the treatment.

Financial examples:

Market-cap threshold for S&P 500 inclusion: does index inclusion cause a price premium?
Credit score cutoff for loan approval: does access to credit affect firm investment?
Regulatory thresholds: do firms just above a reporting threshold behave differently?

Interpretation:

ate is the LOCAL average treatment effect at the cutoff. It tells you the effect for units right at the boundary, not the average effect for all units.
The validity depends on the assumption that units cannot precisely manipulate their position relative to the cutoff. If they can (e.g., firms just barely meet a threshold by manipulating data), the estimate is biased. Use the McCrary test (in regression_discontinuity_robust) to check for manipulation.
bandwidth matters: too wide includes units far from the cutoff (more bias); too narrow uses few observations (more variance).

Red flags:

Very different n_left and n_right: asymmetric data.
Bandwidth is very small: few observations, imprecise.
Density of running variable has a spike at the cutoff: possible manipulation (run McCrary test).

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
running_var (ndarray) – Running variable that determines treatment (1-D array). Units with running_var >= cutoff are treated.
cutoff (float, default: 0.0) – Cutoff value (default 0.0).
bandwidth (float | None, default: None) – Bandwidth for local linear regression. If None, uses Silverman’s rule. Narrower = less bias, more variance.

Return type:

RDResult

Returns:

RDResult with estimated local ATE at the cutoff, standard error, confidence interval, and bandwidth used.

Raises:

ValueError – If fewer than 2 observations on either side within bandwidth.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> n = 500
>>> X = rng.uniform(-1, 1, n)
>>> Y = 0.5 * (X >= 0).astype(float) + X + rng.normal(0, 0.2, n)
>>> result = regression_discontinuity(Y, X, cutoff=0.0)
>>> print(f"RD estimate: {result.ate:.3f} (se={result.se:.3f})")

See also

regression_discontinuity_robust: With IK bandwidth and McCrary test. diff_in_diff: Before/after comparison with control group.

synthetic_control(treated_outcomes, donor_outcomes, pre_period)[source]¶

Estimate treatment effect using the synthetic control method.

The synthetic control method constructs a data-driven counterfactual for a single treated unit by finding a weighted combination of untreated donor units that closely reproduces the treated unit’s pre-treatment trajectory. The treatment effect is the post-treatment gap between the observed and synthetic outcomes.

Use synthetic control for single-unit case studies (e.g., the effect of a policy change on one country, or a management change at one firm) when a natural control group is unavailable.

Parameters:

treated_outcomes (ndarray) – Outcomes for the treated unit over all time periods (1-D array of length T).
donor_outcomes (ndarray) – Outcomes for donor (control) units, shape (T, n_donors).
pre_period (int) – Number of pre-treatment time periods. The first pre_period rows are used for fitting; the remainder is the post-treatment evaluation window.

Returns:

Estimated treatment effect (average: post-period gap), donor weights, synthetic outcomes, and pre-period RMSE as a fit quality diagnostic.

Return type:

SyntheticControlResult

Raises:

ValueError – If pre_period is out of range.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> T, n_donors = 20, 5
>>> donors = rng.normal(size=(T, n_donors)).cumsum(axis=0)
>>> treated = donors @ np.array([0.3, 0.5, 0.2, 0, 0])
>>> treated[15:] += 2.0  # treatment effect of 2.0
>>> result = synthetic_control(treated, donors, pre_period=15)
>>> result.ate > 0
True

See also

diff_in_diff: Two-group before/after comparison. regression_discontinuity: Cutoff-based causal identification.

References

Abadie, A., Diamond, A. & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. JASA 105(490).

diff_in_diff(outcome, treatment, post, entity=None)[source]¶

Estimate treatment effect using difference-in-differences (DID).

DID is the workhorse of causal inference in finance and economics. It compares the before-after change in the treatment group to the before-after change in the control group, netting out common time trends:

\[\hat{\delta} = (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}) - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}})\]

The key identifying assumption is parallel trends: absent the treatment, both groups would have followed the same trajectory.

Interpretation:

ate (delta): The causal effect of the treatment on the treated group, net of any common time trend.
Positive delta = treatment increased the outcome.
The group means (pre_treatment_mean, post_treatment_mean, pre_control_mean, post_control_mean) can be used to construct a “parallel trends” plot: plot the treated and control group means over time. If they are parallel pre-treatment, the assumption is plausible.

How to check parallel trends:

Plot both group means in the pre-treatment period. They should move in parallel (not necessarily at the same level – DID allows for level differences).
Run placebo DID tests at fake treatment dates in the pre-period. The estimated “effects” should be near zero.

Financial examples:

Effect of a new regulation on trading costs (treated = affected firms, control = unaffected firms).
Impact of an IPO on a firm’s operating performance (treated = IPO firms, control = matched private firms).
Did the COVID-19 lockdown affect commercial real estate values differently than residential?

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
treatment (ndarray) – Binary group indicator: 1 = treatment group, 0 = control group.
post (ndarray) – Binary time indicator: 1 = post-treatment, 0 = pre-treatment.
entity (ndarray | None, default: None) – Entity identifiers for panel data. If provided, entity fixed effects are included.

Return type:

DIDResult

Returns:

DIDResult with the DID estimate, standard error, 95% CI, and group means for diagnostics.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> treat = np.repeat([0, 1], n // 2)
>>> post_period = np.tile([0, 1], n // 2)
>>> Y = 1.0 + 0.5 * treat + 0.3 * post_period + 2.0 * treat * post_period
>>> Y += rng.normal(0, 0.5, n)
>>> result = diff_in_diff(Y, treat, post_period)
>>> print(f"DID: {result.ate:.2f} [{result.ci_lower:.2f}, {result.ci_upper:.2f}]")

See also

synthetic_control: For single-unit case studies. event_study: For measuring effects around specific dates.

granger_causality(x, y, max_lag=10, significance=0.05)[source]¶

Test whether time series x Granger-causes time series y.

Granger causality tests whether past values of x contain information that helps predict y beyond what past values of y alone provide. The null hypothesis is that x does not Granger-cause y.

IMPORTANT: Granger “causality” is really predictive causality, NOT true causality. It means “x helps forecast y” – this could be because x causes y, or because x responds faster to a common cause. The name is a historical misnomer.

The test fits two VAR models for each candidate lag order p:

\[ \begin{align}\begin{aligned}\text{Restricted:} \quad y_t = \alpha + \sum_{i=1}^{p} \beta_i y_{t-i} + \varepsilon_t\\\text{Unrestricted:} \quad y_t = \alpha + \sum_{i=1}^{p} \beta_i y_{t-i} + \sum_{i=1}^{p} \gamma_i x_{t-i} + \varepsilon_t\end{aligned}\end{align} \]

The F-statistic tests \(H_0: \gamma_1 = \cdots = \gamma_p = 0\).

Interpretation:

reject = True: Past x helps predict y beyond y’s own history. This is evidence of a predictive relationship.
reject = False: No evidence that x helps predict y.
optimal_lag: The lag order selected by BIC. Economically, this tells you “how far back does x’s influence on y extend?”
Check both directions: if x Granger-causes y AND y Granger-causes x, there may be a common driver (not a directional causal effect).
Both series should be stationary. If they are I(1), difference first. If they are cointegrated, use a VECM instead.

Financial use cases:

Does the VIX Granger-cause S&P 500 returns?
Does order flow predict price changes?
Does the yield curve lead GDP growth?
Does sentiment data help forecast volatility?

Parameters:

x (ndarray) – Potential cause time series (1D array, length T).
y (ndarray) – Potential effect time series (1D array, length T).
max_lag (int, default: 10) – Maximum lag order to test. The optimal lag is selected by BIC. Default is 10.
significance (float, default: 0.05) – Significance level for rejection. Default is 0.05.

Return type:

GrangerResult

Returns:

GrangerResult with F-statistic, p-value, optimal lag, direction, and per-lag results.

Raises:

ValueError – If the time series are too short for the requested max_lag.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> x = rng.normal(size=200)
>>> y = np.zeros(200)
>>> for t in range(1, 200):
...     y[t] = 0.5 * y[t-1] + 0.3 * x[t-1] + rng.normal(0, 0.5)
>>> result = granger_causality(x, y, max_lag=5)
>>> result.reject  # should be True — x Granger-causes y
True

References

Granger, C. W. J. (1969). “Investigating Causal Relations by Econometric Models and Cross-spectral Methods.”
Toda, H. Y. & Yamamoto, T. (1995). “Statistical inference in vector autoregressions with possibly integrated processes.”

Notes

Uses statsmodels.tsa.stattools.grangercausalitytests internally. The function tests \(x \to y\) (whether x helps predict y). To test the reverse direction, swap the arguments.

instrumental_variable(outcome, endogenous, instruments, exogenous=None)[source]¶

Two-stage least squares (2SLS) instrumental variable estimation.

IV estimation addresses endogeneity — the situation where an explanatory variable is correlated with the error term (e.g., due to omitted variables, simultaneity, or measurement error). An instrument Z must satisfy:

Relevance: Z is correlated with the endogenous regressor X.
Exclusion: Z affects the outcome Y only through X.

\[ \begin{align}\begin{aligned}\text{First stage:} \quad X = Z \pi + W \delta + v\\\text{Second stage:} \quad Y = \hat{X} \beta + W \gamma + \varepsilon\end{aligned}\end{align} \]

Diagnostics provided:

First-stage F-statistic: Tests instrument relevance. F < 10 indicates weak instruments (Staiger & Stock, 1997).
Hausman test: Compares OLS and 2SLS coefficients. Rejection supports the endogeneity of the regressor and the need for IV.
Sargan overidentification test: When there are more instruments than endogenous regressors, tests the joint validity of instruments. Rejection suggests at least one instrument is invalid.

Financial use cases:

Estimating the effect of leverage on firm value using regulatory shocks as instruments.
Measuring the causal impact of analyst coverage on stock liquidity using index inclusion as an instrument.
Identifying the effect of order flow on prices using weather (affects trader mood) as an instrument.

Parameters:

outcome (ndarray) – Dependent variable Y (1D array, length n).
endogenous (ndarray) – Endogenous regressor X (1D array, length n).
instruments (ndarray) – Instrumental variables Z (n, k) matrix where k >= 1. More instruments than endogenous regressors allows the Sargan overidentification test.
exogenous (ndarray | None, default: None) – Exogenous control variables W (n, m). An intercept is always added. Default is None (intercept only).

Returns:

2SLS coefficient, standard error, confidence interval, and diagnostic tests.

Return type:

IVResult

Raises:

ValueError – If dimensions are incompatible or there are too few instruments.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 1000
>>> z = rng.normal(size=(n, 1))  # instrument
>>> u = rng.normal(size=n)       # unobserved confounder
>>> x = z.ravel() + u + rng.normal(0, 0.5, n)  # endogenous
>>> y = 2.0 * x + u + rng.normal(0, 0.5, n)    # outcome
>>> result = instrumental_variable(y, x, z)
>>> abs(result.coefficient - 2.0) < 0.5
True

References

Angrist, J. D. & Pischke, J.-S. (2009). “Mostly Harmless Econometrics: An Empiricist’s Companion.”
Staiger, D. & Stock, J. H. (1997). “Instrumental Variables Regression with Weak Instruments.”
Sargan, J. D. (1958). “The Estimation of Economic Relationships Using Instrumental Variables.”

event_study(returns, market_returns, event_indices, estimation_window=120, event_window_pre=5, event_window_post=5, gap=10)[source]¶

Conduct a market-model event study for one or more events.

The event study methodology estimates abnormal returns around an event by comparing realized returns to expected returns from a market model estimated during a pre-event estimation window.

Methodology:

Estimation window: Fit \(R_{i,t} = \alpha + \beta R_{m,t} + \varepsilon_t\) over a window ending gap days before the event.
Event window: Compute abnormal returns \(AR_t = R_{i,t} - (\hat{\alpha} + \hat{\beta} R_{m,t})\) for each day in [-event_window_pre, +event_window_post].
CAR: Cumulative Abnormal Return \(= \sum AR_t\).
Statistical test: \(t = \text{CAR} / (\hat{\sigma} \sqrt{L})\) where L is the length of the event window and \(\hat{\sigma}\) is the residual std from the estimation window.
Cross-sectional test (multiple events): Use the distribution of individual CARs across events.

Financial use cases:

Impact of earnings announcements on stock prices.
Effect of M&A announcements on acquirer/target returns.
Regulatory changes (e.g., short-selling bans).
Central bank policy announcements.

Parameters:

returns (ndarray) – Asset return series (1D array, length T).
market_returns (ndarray) – Market (benchmark) return series (1D array, length T).
event_indices (list[int] | ndarray) – Index positions in the return series where each event occurs. For a single event, pass [idx].
estimation_window (int, default: 120) – Number of days in the estimation window. Default is 120.
event_window_pre (int, default: 5) – Number of days before the event in the event window. Default 5.
event_window_post (int, default: 5) – Number of days after the event in the event window. Default 5.
gap (int, default: 10) – Gap between the estimation window and the event window to avoid contamination. Default is 10.

Returns:

CAR, abnormal returns, t-statistics, and diagnostics.

Return type:

EventStudyResult

Raises:

ValueError – If any event does not have enough data for the estimation or event windows.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> market = rng.normal(0.0005, 0.01, 500)
>>> stock = 0.001 + 1.2 * market + rng.normal(0, 0.005, 500)
>>> stock[250:256] += 0.02  # inject positive event at day 250
>>> result = event_study(stock, market, [250], event_window_post=5)
>>> result.car > 0  # should detect the positive abnormal return
True

References

MacKinlay, A. C. (1997). “Event Studies in Economics and Finance.” Journal of Economic Literature, 35(1), 13-39.
Brown, S. J. & Warner, J. B. (1985). “Using Daily Stock Returns.” Journal of Financial Economics, 14(1), 3-31.

synthetic_control_weights(treated_outcomes, donor_outcomes, pre_period, run_placebo=False, donor_names=None)[source]¶

Enhanced synthetic control with donor selection and placebo inference.

Extends the basic synthetic control method with:

Pre-treatment fit quality: RMSPE (root mean squared prediction error) measures how well the synthetic unit matches the treated unit before treatment. Lower is better.
Placebo (permutation) tests: Applies the synthetic control method iteratively, treating each donor as if it were the treated unit. The RMSPE ratio (post/pre) for the true treated unit is compared to the distribution of placebo ratios. A p-value is the fraction of placebo ratios at least as large as the treated unit’s.
Gap plot data: Period-by-period gaps and cumulative effects for visualization.

\[\text{RMSPE ratio} = \frac{\text{RMSPE}_{\text{post}}}{\text{RMSPE}_{\text{pre}}}\]

Financial use cases:

Impact of a merger/acquisition on the target firm’s stock price relative to a synthetic peer.
Effect of a regulatory change on trading volume for a single exchange.
Impact of a central bank’s unconventional monetary policy on a country’s equity market.

Parameters:

treated_outcomes (ndarray) – Outcomes for the treated unit over all time periods (1D, length T).
donor_outcomes (ndarray) – Outcomes for donor units (T, n_donors).
pre_period (int) – Number of pre-treatment time periods.
run_placebo (bool, default: False) – If True, runs placebo tests for each donor to compute permutation-based p-values. Default is False (expensive for many donors).
donor_names (list[str] | None, default: None) – Optional names for donor units for interpretability.

Returns:

Weights, RMSPE, placebo p-value, gap data.

Return type:

SyntheticControlWeightsResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> T, J = 50, 10
>>> common = np.cumsum(rng.normal(0, 0.5, T))
>>> donors = np.column_stack([
...     common + rng.normal(0, 0.3, T) for _ in range(J)
... ])
>>> true_w = rng.dirichlet(np.ones(J))
>>> treated = donors @ true_w + rng.normal(0, 0.1, T)
>>> treated[30:] += 5.0  # treatment effect of 5
>>> result = synthetic_control_weights(treated, donors, 30,
...                                    run_placebo=True)
>>> result.ate  # should be close to 5.0
5.0...
>>> result.placebo_p_value < 0.2  # significant
True

References

Abadie, A., Diamond, A. & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.”
Abadie, A. (2021). “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.”

causal_forest(outcome, treatment, covariates, n_estimators=200, min_samples_leaf=5, max_depth=None, honest=True, seed=42)[source]¶

Estimate heterogeneous treatment effects using a causal forest.

Implements the T-learner approach for CATE (Conditional Average Treatment Effect) estimation using random forests. Two separate forests are trained — one for the treated outcomes and one for the control outcomes — and the CATE is estimated as the difference in predictions.

If honest=True, the data is split: one half for building the tree structure, the other half for estimating leaf predictions. This reduces overfitting and provides valid confidence intervals.

\[\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)\]

where \(\hat{\mu}_1\) is the treated outcome model and \(\hat{\mu}_0\) is the control outcome model.

Financial use cases:

Which firms benefit most from analyst coverage?
How does the effect of a tax change vary by firm size?
Heterogeneous impact of ESG scores on cost of capital.
Personalized treatment effects for portfolio tilts.

Parameters:

outcome (ndarray) – Observed outcomes (1D array, length n).
treatment (ndarray) – Binary treatment indicator (1D array of 0s and 1s).
covariates (ndarray) – Covariate matrix (n, p) for effect heterogeneity.
n_estimators (int, default: 200) – Number of trees in each forest. Default is 200.
min_samples_leaf (int, default: 5) – Minimum samples per leaf. Larger values give smoother estimates. Default is 5.
max_depth (int | None, default: None) – Maximum tree depth. None means unlimited. Default is None.
honest (bool, default: True) – If True, uses sample splitting for honest estimation. Default is True.
seed (int, default: 42) – Random seed for reproducibility. Default is 42.

Returns:

ATE, CATE for each observation, standard error, confidence interval, and feature importances.

Return type:

CausalForestResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 1000
>>> X = rng.normal(size=(n, 3))
>>> T = (rng.uniform(size=n) > 0.5).astype(float)
>>> # Heterogeneous effect: tau(x) = 2 + x[:,0]
>>> tau = 2.0 + X[:, 0]
>>> Y = X[:, 1] + tau * T + rng.normal(0, 0.5, n)
>>> result = causal_forest(Y, T, X)
>>> abs(result.ate - 2.0) < 1.0
True

References

Wager, S. & Athey, S. (2018). “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.”
Kunzel, S. R. et al. (2019). “Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning.”

mediation_analysis(outcome, treatment, mediator, covariates=None)[source]¶

Baron-Kenny mediation analysis with Sobel test.

Decomposes the total effect of treatment on outcome into a direct effect and an indirect effect through a mediator variable:

\[ \begin{align}\begin{aligned}\text{Path c (total):} \quad Y = \alpha_1 + c \cdot T + \varepsilon_1\\\text{Path a:} \quad M = \alpha_2 + a \cdot T + \varepsilon_2\\\text{Paths c', b:} \quad Y = \alpha_3 + c' \cdot T + b \cdot M + \varepsilon_3\end{aligned}\end{align} \]

The indirect effect is \(a \times b\) (equivalently \(c - c'\)). The Sobel test provides a z-test for the significance of the indirect effect:

\[z = \frac{a \cdot b}{\sqrt{b^2 \cdot SE_a^2 + a^2 \cdot SE_b^2}}\]

Financial use cases:

Does analyst coverage affect stock returns through improved information environment (bid-ask spread)?
Does ESG performance impact firm value through the channel of reduced cost of capital?
Does monetary policy affect equity prices through the expectations channel or the risk premium channel?
Does order flow affect prices through inventory or information?

Parameters:

outcome (ndarray) – Outcome variable Y (1D array).
treatment (ndarray) – Treatment variable T (1D array, can be continuous).
mediator (ndarray) – Mediator variable M (1D array).
covariates (ndarray | None, default: None) – Optional control variables (n, k). Default is None.

Returns:

Total, direct, indirect effects, Sobel test, and proportion mediated.

Return type:

MediationResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 1000
>>> T = rng.binomial(1, 0.5, n).astype(float)
>>> M = 0.8 * T + rng.normal(0, 0.5, n)     # path a = 0.8
>>> Y = 0.5 * T + 0.6 * M + rng.normal(0, 0.5, n)  # c'=0.5, b=0.6
>>> result = mediation_analysis(Y, T, M)
>>> abs(result.indirect_effect - 0.48) < 0.2  # a*b ~ 0.48
True
>>> result.sobel_p < 0.05  # significant mediation
True

References

Baron, R. M. & Kenny, D. A. (1986). “The Moderator-Mediator Variable Distinction.”
Sobel, M. E. (1982). “Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models.”
MacKinnon, D. P. (2008). “Introduction to Statistical Mediation Analysis.”

regression_discontinuity_robust(outcome, running_var, cutoff=0.0, bandwidth=None, poly_order=1, fuzzy_treatment=None, kernel='triangular', run_mccrary=True)[source]¶

Robust regression discontinuity design with optimal bandwidth.

Enhanced RDD implementation with:

IK bandwidth selection: Imbens-Kalyanaraman (2012) optimal bandwidth that balances bias and variance.
Local polynomial regression: Supports linear (order 1) and quadratic (order 2) local fits with kernel weighting.
McCrary density test: Tests for manipulation of the running variable at the cutoff.
Sharp vs. fuzzy RD: Sharp RD uses the cutoff directly. Fuzzy RD uses a two-stage approach when treatment is probabilistic at the cutoff (analogous to IV).

\[ \begin{align}\begin{aligned}\text{Sharp RD:} \quad \hat{\tau} = \lim_{x \downarrow c} E[Y|X=x] - \lim_{x \uparrow c} E[Y|X=x]\\\text{Fuzzy RD:} \quad \hat{\tau} = \frac{\lim_{x \downarrow c} E[Y|X=x] - \lim_{x \uparrow c} E[Y|X=x]}{\lim_{x \downarrow c} E[D|X=x] - \lim_{x \uparrow c} E[D|X=x]}\end{aligned}\end{align} \]

Financial use cases:

Effect of index inclusion on stock liquidity (market-cap threshold).
Impact of credit rating changes at rating boundaries.
Effect of SEC disclosure requirements at firm-size thresholds.
Impact of margin requirements at leverage cutoffs.

Parameters:

outcome (ndarray) – Outcome variable (1D array).
running_var (ndarray) – Running variable that determines treatment (1D array).
cutoff (float, default: 0.0) – Cutoff value. Default is 0.0.
bandwidth (float | None, default: None) – Bandwidth for local regression. If None, uses the IK optimal bandwidth. Default is None.
poly_order (int, default: 1) – Order of the local polynomial (1 = linear, 2 = quadratic). Default is 1.
fuzzy_treatment (ndarray | None, default: None) – Actual treatment indicator for fuzzy RD (1D array of 0s and 1s). If None, uses sharp RD. Default is None.
kernel (str, default: 'triangular') – Kernel for weighting observations: 'triangular', 'uniform', or 'epanechnikov'. Default is 'triangular'.
run_mccrary (bool, default: True) – Whether to run the McCrary manipulation test. Default is True.

Returns:

RD estimate with robust standard errors, bandwidth information, and diagnostic tests.

Return type:

RDRobustResult

Raises:

ValueError – If insufficient observations or invalid parameters.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> x = rng.uniform(-2, 2, n)
>>> y = 0.5 * x + 1.5 * (x >= 0) + rng.normal(0, 0.3, n)
>>> result = regression_discontinuity_robust(y, x, cutoff=0.0)
>>> abs(result.ate - 1.5) < 0.5
True
>>> result.bandwidth_method
'imbens_kalyanaraman'

References

Imbens, G. W. & Kalyanaraman, K. (2012). “Optimal Bandwidth Choice for the Regression Discontinuity Estimator.”
McCrary, J. (2008). “Manipulation of the Running Variable in the Regression Discontinuity Design.”
Cattaneo, M. D. et al. (2020). “A Practical Introduction to Regression Discontinuity Designs.”

bounds_analysis(outcome, treatment, selection=None, outcome_bounds=None, method='manski')[source]¶

Compute Manski or Lee bounds for partial identification of treatment effects.

When standard identifying assumptions (unconfoundedness, exclusion restriction) are too strong, partial identification provides bounds on the treatment effect that hold under weaker assumptions.

Manski (1990) bounds require only:

Known outcome support (bounded outcomes).
Random treatment assignment OR treatment is observed.

The treatment effect is bounded by:

\[E[Y|D=1] - y_{\max} \cdot P(D=0) - E[Y|D=0] \cdot P(D=0) \leq \text{ATE} \leq E[Y|D=1] - y_{\min} \cdot P(D=0) - E[Y|D=0] \cdot P(D=0)\]

Lee (2009) bounds handle sample selection (attrition):

When treatment affects whether the outcome is observed.
Tightens bounds by trimming the group with higher selection rate.

Financial use cases:

Bounding the effect of financial literacy programs on savings when participation is endogenous.
Bounding returns to education when the outcome (wages) is only observed for the employed.
Partial identification of trading strategy effects when some trades are censored.

Parameters:

outcome (ndarray) – Observed outcomes (1D array). For Lee bounds, should only contain observed (non-missing) values.
treatment (ndarray) – Binary treatment indicator (1D array of 0s and 1s).
selection (ndarray | None, default: None) – Binary selection indicator (1D array). Required for Lee bounds. 1 = outcome observed, 0 = outcome missing/attrited. Must be the same length as treatment. Default is None.
outcome_bounds (tuple[float, float] | None, default: None) – (y_min, y_max) bounds on the outcome support. Required for Manski bounds. If None, uses the observed min and max.
method (str, default: 'manski') – 'manski' for Manski worst-case bounds or 'lee' for Lee trimming bounds. Default is 'manski'.

Returns:

Lower and upper bounds on the treatment effect, whether the effect is sign-identified, and confidence intervals.

Return type:

BoundsResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 1000
>>> T = rng.binomial(1, 0.5, n).astype(float)
>>> Y = 2.0 * T + rng.normal(0, 1, n)
>>> result = bounds_analysis(Y, T, outcome_bounds=(-5, 10))
>>> result.lower_bound < 2.0 < result.upper_bound
True
>>> result.identified  # both bounds positive -> sign identified
True

References

Manski, C. F. (1990). “Nonparametric Bounds on Treatment Effects.”
Lee, D. S. (2009). “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.”
Horowitz, J. L. & Manski, C. F. (2000). “Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data.”

Treatment Effects¶

Treatment effect estimation via pure numpy/scipy implementations.

Causal inference methods estimate the causal effect of a treatment or intervention on an outcome, controlling for confounding variables. In finance, these methods answer questions like:

“Did the Fed rate cut cause the equity rally, or would it have happened anyway?”
“What is the causal effect of share buyback announcements on stock returns?”
“Did the new regulatory rule reduce trading costs, controlling for market conditions?”

This module provides six complementary causal inference estimators:

Inverse Probability Weighting (ipw_ate) – reweights observations by the inverse of their propensity score to create a pseudo-population where treatment is independent of covariates. Simple but sensitive to extreme propensity scores.
Nearest-neighbor matching (matching_ate) – matches each treated unit to the closest control unit(s) in covariate space. Intuitive and nonparametric, but can be biased in high dimensions.
Doubly robust estimation (doubly_robust_ate) – combines IPW with outcome regression. Consistent if either the propensity score model or the outcome model is correctly specified. The recommended default when you are unsure about model specification.
Regression discontinuity (regression_discontinuity) – exploits a sharp cutoff in a running variable to identify a local treatment effect. Example: index inclusion at a market-cap threshold.
Synthetic control (synthetic_control) – constructs a weighted combination of control units that mimics the treated unit’s pre-treatment trajectory. The treatment effect is the post-treatment gap. Ideal for single-unit studies (e.g., one country, one firm).
Difference-in-differences (diff_in_diff) – compares the before-after change in the treatment group to the before-after change in the control group. Requires the parallel trends assumption.

Utilities:

propensity_score: logistic regression for treatment assignment probabilities.

How to choose:

Random treatment, confounders observed: IPW or doubly robust.
Sharp cutoff determines treatment: regression discontinuity.
One treated unit, many controls over time: synthetic control.
Two-period panel, treatment at a known time: diff-in-diff.
Uncertain model specification: doubly robust (safest).

References

Rubin (1974), “Estimating Causal Effects of Treatments”
Imbens & Rubin (2015), “Causal Inference for Statistics, Social, and Biomedical Sciences”
Abadie, Diamond & Hainmueller (2010), “Synthetic Control Methods for Comparative Case Studies”

propensity_score(treatment, covariates)[source]¶

Estimate propensity scores using logistic regression.

Interpretation:

A propensity score of 0.5 means the unit was equally likely to be treated or not – ideal for causal inference.
Scores near 0 or 1 indicate units that were almost deterministically treated/untreated. These are problematic: the overlap assumption fails, and IPW weights become extreme.
Check the overlap: plot histograms of propensity scores separately for treated and control groups. If they don’t overlap, causal inference is unreliable in the non-overlap region.

When to use:

As input to ipw_ate or doubly_robust_ate.
For propensity score matching (match treated to controls with similar scores).
To diagnose selection bias: if propensity scores are very different between groups, selection is strong.

Red flags:

Most scores near 0 or 1: poor overlap, IPW will be unstable.
Treatment is nearly perfectly predicted: positivity violation.
Scores are all similar (~0.5): treatment is nearly random, which is actually ideal for causal inference.

Parameters:

treatment (ndarray) – Binary treatment indicator (1-D array of 0s and 1s).
covariates (ndarray) – Covariate matrix of shape (n_samples, n_features). An intercept column is added automatically; do not include one.

Return type:

ndarray

Returns:

Estimated propensity scores (probabilities) of shape (n_samples,), each in [0.01, 0.99].

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(200, 3))
>>> T = (X[:, 0] + rng.normal(size=200) > 0).astype(float)
>>> ps = propensity_score(T, X)
>>> # Check overlap: both groups should have scores in [0.2, 0.8]
>>> print(f"Treated scores: [{ps[T==1].min():.2f}, {ps[T==1].max():.2f}]")
>>> print(f"Control scores: [{ps[T==0].min():.2f}, {ps[T==0].max():.2f}]")

See also

ipw_ate: Use propensity scores to estimate the ATE. doubly_robust_ate: Combines propensity scores with outcome

regression for robustness.

ipw_ate(outcome, treatment, propensity_scores)[source]¶

Estimate the Average Treatment Effect using Inverse Probability Weighting.

The Horvitz-Thompson estimator is:

\[\widehat{\text{ATE}} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_i\,Y_i}{e(X_i)} - \frac{1}{n}\sum_{i=1}^{n} \frac{(1 - T_i)\,Y_i}{1 - e(X_i)}\]

Interpretation:

ate > 0: Treatment increases the outcome on average.
ate < 0: Treatment decreases the outcome on average.
The 95% CI is asymptotically valid under correct propensity score specification and the overlap assumption.
If CI includes 0: cannot conclude treatment has an effect.

When to use:

You have a well-estimated propensity score model.
There is good overlap (propensity scores are not extreme).
You are willing to assume no unobserved confounders.

When NOT to use:

Propensity scores near 0 or 1: weights explode, variance is huge. Use doubly_robust_ate instead.
Strong model misspecification concerns: use doubly_robust_ate (consistent if either model is right).
Very few treated or control units: matching may be better.

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
treatment (ndarray) – Binary treatment indicator (0s and 1s).
propensity_scores (ndarray) – Estimated propensity scores from propensity_score() (1-D array).

Return type:

ATEResult

Returns:

ATEResult with ATE estimate, standard error, 95% confidence interval, and sample sizes.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> n = 500
>>> X = rng.normal(size=(n, 2))
>>> T = (X[:, 0] > 0).astype(float)
>>> Y = 2.0 * T + X[:, 0] + rng.normal(size=n)
>>> ps = propensity_score(T, X)
>>> result = ipw_ate(Y, T, ps)
>>> print(f"ATE: {result.ate:.2f} [{result.ci_lower:.2f}, {result.ci_upper:.2f}]")

See also

doubly_robust_ate: More robust alternative (recommended default). matching_ate: Nonparametric matching estimator.

matching_ate(outcome, treatment, covariates, n_neighbors=1)[source]¶

Estimate the ATE using nearest-neighbor matching on covariates.

Interpretation:

ate is the average difference in outcomes between treated units and their matched controls, averaged over all units.
Good matching means the matched pairs are genuinely similar. Check the covariate balance after matching.
n_neighbors = 1: lowest bias, highest variance (each unit matched to exactly one partner).
n_neighbors = 3-5: reduces variance at the cost of some bias (averaging over less-similar matches).

When to use:

Low-dimensional covariates (2-5 features).
When you want a transparent, model-free estimator.
When propensity score modelling is difficult.

Limitations:

Curse of dimensionality: in high dimensions, matches become poor. Use propensity score matching instead.
Biased with finite samples (Abadie & Imbens, 2006).
No extrapolation: cannot estimate effects outside the overlap region.

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
treatment (ndarray) – Binary treatment indicator (0s and 1s).
covariates (ndarray) – Covariate matrix of shape (n_samples, n_features).
n_neighbors (int, default: 1) – Number of nearest neighbors to match with (default 1). More neighbors reduce variance but may increase bias.

Return type:

ATEResult

Returns:

ATEResult with ATE estimate, standard error, and 95% CI.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(300, 2))
>>> T = (X[:, 0] > 0).astype(float)
>>> Y = 1.5 * T + X[:, 0] + rng.normal(size=300)
>>> result = matching_ate(Y, T, X)
>>> print(f"ATE: {result.ate:.2f} (se={result.se:.2f})")

See also

ipw_ate: Weighting-based estimator (no matching). doubly_robust_ate: Recommended default when uncertain about

model specification.

doubly_robust_ate(outcome, treatment, covariates)[source]¶

Estimate the ATE using the doubly robust (augmented IPW) estimator.

This is the recommended default when you are unsure about model specification.

Internally, the function:

Estimates propensity scores via logistic regression.
Fits separate OLS outcome models for treated and control groups.
Combines both in the augmented IPW formula.

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
treatment (ndarray) – Binary treatment indicator (0s and 1s).
covariates (ndarray) – Covariate matrix of shape (n_samples, n_features).

Returns:

ATE estimate with standard error and 95% confidence: interval.

Return type:

ATEResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(500, 3))
>>> T = (X[:, 0] > 0).astype(float)
>>> Y = 2.0 * T + X[:, 0] + rng.normal(size=500)
>>> result = doubly_robust_ate(Y, T, X)
>>> 0.5 < result.ate < 3.5
True

See also

ipw_ate: Pure IPW estimator. matching_ate: Nonparametric matching estimator. propensity_score: Logistic regression propensity scores.

References

Robins, J.M., Rotnitzky, A. & Zhao, L.P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. JASA 89(427).

regression_discontinuity(outcome, running_var, cutoff=0.0, bandwidth=None)[source]¶

Estimate treatment effect using a sharp regression discontinuity design.

Financial examples:

Market-cap threshold for S&P 500 inclusion: does index inclusion cause a price premium?
Credit score cutoff for loan approval: does access to credit affect firm investment?
Regulatory thresholds: do firms just above a reporting threshold behave differently?

Interpretation:

ate is the LOCAL average treatment effect at the cutoff. It tells you the effect for units right at the boundary, not the average effect for all units.
The validity depends on the assumption that units cannot precisely manipulate their position relative to the cutoff. If they can (e.g., firms just barely meet a threshold by manipulating data), the estimate is biased. Use the McCrary test (in regression_discontinuity_robust) to check for manipulation.
bandwidth matters: too wide includes units far from the cutoff (more bias); too narrow uses few observations (more variance).

Red flags:

Very different n_left and n_right: asymmetric data.
Bandwidth is very small: few observations, imprecise.
Density of running variable has a spike at the cutoff: possible manipulation (run McCrary test).

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
running_var (ndarray) – Running variable that determines treatment (1-D array). Units with running_var >= cutoff are treated.
cutoff (float, default: 0.0) – Cutoff value (default 0.0).
bandwidth (float | None, default: None) – Bandwidth for local linear regression. If None, uses Silverman’s rule. Narrower = less bias, more variance.

Return type:

RDResult

Returns:

RDResult with estimated local ATE at the cutoff, standard error, confidence interval, and bandwidth used.

Raises:

ValueError – If fewer than 2 observations on either side within bandwidth.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> n = 500
>>> X = rng.uniform(-1, 1, n)
>>> Y = 0.5 * (X >= 0).astype(float) + X + rng.normal(0, 0.2, n)
>>> result = regression_discontinuity(Y, X, cutoff=0.0)
>>> print(f"RD estimate: {result.ate:.3f} (se={result.se:.3f})")

See also

regression_discontinuity_robust: With IK bandwidth and McCrary test. diff_in_diff: Before/after comparison with control group.

synthetic_control(treated_outcomes, donor_outcomes, pre_period)[source]¶

Estimate treatment effect using the synthetic control method.

Use synthetic control for single-unit case studies (e.g., the effect of a policy change on one country, or a management change at one firm) when a natural control group is unavailable.

Parameters:

treated_outcomes (ndarray) – Outcomes for the treated unit over all time periods (1-D array of length T).
donor_outcomes (ndarray) – Outcomes for donor (control) units, shape (T, n_donors).
pre_period (int) – Number of pre-treatment time periods. The first pre_period rows are used for fitting; the remainder is the post-treatment evaluation window.

Returns:

Estimated treatment effect (average: post-period gap), donor weights, synthetic outcomes, and pre-period RMSE as a fit quality diagnostic.

Return type:

SyntheticControlResult

Raises:

ValueError – If pre_period is out of range.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> T, n_donors = 20, 5
>>> donors = rng.normal(size=(T, n_donors)).cumsum(axis=0)
>>> treated = donors @ np.array([0.3, 0.5, 0.2, 0, 0])
>>> treated[15:] += 2.0  # treatment effect of 2.0
>>> result = synthetic_control(treated, donors, pre_period=15)
>>> result.ate > 0
True

See also

diff_in_diff: Two-group before/after comparison. regression_discontinuity: Cutoff-based causal identification.

References

Abadie, A., Diamond, A. & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. JASA 105(490).

diff_in_diff(outcome, treatment, post, entity=None)[source]¶

Estimate treatment effect using difference-in-differences (DID).

\[\hat{\delta} = (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}) - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}})\]

The key identifying assumption is parallel trends: absent the treatment, both groups would have followed the same trajectory.

Interpretation:

ate (delta): The causal effect of the treatment on the treated group, net of any common time trend.
Positive delta = treatment increased the outcome.
The group means (pre_treatment_mean, post_treatment_mean, pre_control_mean, post_control_mean) can be used to construct a “parallel trends” plot: plot the treated and control group means over time. If they are parallel pre-treatment, the assumption is plausible.

How to check parallel trends:

Plot both group means in the pre-treatment period. They should move in parallel (not necessarily at the same level – DID allows for level differences).
Run placebo DID tests at fake treatment dates in the pre-period. The estimated “effects” should be near zero.

Financial examples:

Effect of a new regulation on trading costs (treated = affected firms, control = unaffected firms).
Impact of an IPO on a firm’s operating performance (treated = IPO firms, control = matched private firms).
Did the COVID-19 lockdown affect commercial real estate values differently than residential?

Parameters:

outcome (ndarray) – Observed outcomes (1-D array).
treatment (ndarray) – Binary group indicator: 1 = treatment group, 0 = control group.
post (ndarray) – Binary time indicator: 1 = post-treatment, 0 = pre-treatment.
entity (ndarray | None, default: None) – Entity identifiers for panel data. If provided, entity fixed effects are included.

Return type:

DIDResult

Returns:

DIDResult with the DID estimate, standard error, 95% CI, and group means for diagnostics.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> treat = np.repeat([0, 1], n // 2)
>>> post_period = np.tile([0, 1], n // 2)
>>> Y = 1.0 + 0.5 * treat + 0.3 * post_period + 2.0 * treat * post_period
>>> Y += rng.normal(0, 0.5, n)
>>> result = diff_in_diff(Y, treat, post_period)
>>> print(f"DID: {result.ate:.2f} [{result.ci_lower:.2f}, {result.ci_upper:.2f}]")

See also

synthetic_control: For single-unit case studies. event_study: For measuring effects around specific dates.

granger_causality(x, y, max_lag=10, significance=0.05)[source]¶

Test whether time series x Granger-causes time series y.

Granger causality tests whether past values of x contain information that helps predict y beyond what past values of y alone provide. The null hypothesis is that x does not Granger-cause y.

The test fits two VAR models for each candidate lag order p:

The F-statistic tests \(H_0: \gamma_1 = \cdots = \gamma_p = 0\).

Interpretation:

reject = True: Past x helps predict y beyond y’s own history. This is evidence of a predictive relationship.
reject = False: No evidence that x helps predict y.
optimal_lag: The lag order selected by BIC. Economically, this tells you “how far back does x’s influence on y extend?”
Check both directions: if x Granger-causes y AND y Granger-causes x, there may be a common driver (not a directional causal effect).
Both series should be stationary. If they are I(1), difference first. If they are cointegrated, use a VECM instead.

Financial use cases:

Does the VIX Granger-cause S&P 500 returns?
Does order flow predict price changes?
Does the yield curve lead GDP growth?
Does sentiment data help forecast volatility?

Parameters:

x (ndarray) – Potential cause time series (1D array, length T).
y (ndarray) – Potential effect time series (1D array, length T).
max_lag (int, default: 10) – Maximum lag order to test. The optimal lag is selected by BIC. Default is 10.
significance (float, default: 0.05) – Significance level for rejection. Default is 0.05.

Return type:

GrangerResult

Returns:

GrangerResult with F-statistic, p-value, optimal lag, direction, and per-lag results.

Raises:

ValueError – If the time series are too short for the requested max_lag.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> x = rng.normal(size=200)
>>> y = np.zeros(200)
>>> for t in range(1, 200):
...     y[t] = 0.5 * y[t-1] + 0.3 * x[t-1] + rng.normal(0, 0.5)
>>> result = granger_causality(x, y, max_lag=5)
>>> result.reject  # should be True — x Granger-causes y
True

References

Granger, C. W. J. (1969). “Investigating Causal Relations by Econometric Models and Cross-spectral Methods.”
Toda, H. Y. & Yamamoto, T. (1995). “Statistical inference in vector autoregressions with possibly integrated processes.”

Notes

Uses statsmodels.tsa.stattools.grangercausalitytests internally. The function tests \(x \to y\) (whether x helps predict y). To test the reverse direction, swap the arguments.

instrumental_variable(outcome, endogenous, instruments, exogenous=None)[source]¶

Two-stage least squares (2SLS) instrumental variable estimation.

Relevance: Z is correlated with the endogenous regressor X.
Exclusion: Z affects the outcome Y only through X.

\[ \begin{align}\begin{aligned}\text{First stage:} \quad X = Z \pi + W \delta + v\\\text{Second stage:} \quad Y = \hat{X} \beta + W \gamma + \varepsilon\end{aligned}\end{align} \]

Diagnostics provided:

First-stage F-statistic: Tests instrument relevance. F < 10 indicates weak instruments (Staiger & Stock, 1997).
Hausman test: Compares OLS and 2SLS coefficients. Rejection supports the endogeneity of the regressor and the need for IV.
Sargan overidentification test: When there are more instruments than endogenous regressors, tests the joint validity of instruments. Rejection suggests at least one instrument is invalid.

Financial use cases:

Estimating the effect of leverage on firm value using regulatory shocks as instruments.
Measuring the causal impact of analyst coverage on stock liquidity using index inclusion as an instrument.
Identifying the effect of order flow on prices using weather (affects trader mood) as an instrument.

Parameters:

outcome (ndarray) – Dependent variable Y (1D array, length n).
endogenous (ndarray) – Endogenous regressor X (1D array, length n).
instruments (ndarray) – Instrumental variables Z (n, k) matrix where k >= 1. More instruments than endogenous regressors allows the Sargan overidentification test.
exogenous (ndarray | None, default: None) – Exogenous control variables W (n, m). An intercept is always added. Default is None (intercept only).

Returns:

2SLS coefficient, standard error, confidence interval, and diagnostic tests.

Return type:

IVResult

Raises:

ValueError – If dimensions are incompatible or there are too few instruments.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 1000
>>> z = rng.normal(size=(n, 1))  # instrument
>>> u = rng.normal(size=n)       # unobserved confounder
>>> x = z.ravel() + u + rng.normal(0, 0.5, n)  # endogenous
>>> y = 2.0 * x + u + rng.normal(0, 0.5, n)    # outcome
>>> result = instrumental_variable(y, x, z)
>>> abs(result.coefficient - 2.0) < 0.5
True

References

Angrist, J. D. & Pischke, J.-S. (2009). “Mostly Harmless Econometrics: An Empiricist’s Companion.”
Staiger, D. & Stock, J. H. (1997). “Instrumental Variables Regression with Weak Instruments.”
Sargan, J. D. (1958). “The Estimation of Economic Relationships Using Instrumental Variables.”

event_study(returns, market_returns, event_indices, estimation_window=120, event_window_pre=5, event_window_post=5, gap=10)[source]¶

Conduct a market-model event study for one or more events.

The event study methodology estimates abnormal returns around an event by comparing realized returns to expected returns from a market model estimated during a pre-event estimation window.

Methodology:

Estimation window: Fit \(R_{i,t} = \alpha + \beta R_{m,t} + \varepsilon_t\) over a window ending gap days before the event.
Event window: Compute abnormal returns \(AR_t = R_{i,t} - (\hat{\alpha} + \hat{\beta} R_{m,t})\) for each day in [-event_window_pre, +event_window_post].
CAR: Cumulative Abnormal Return \(= \sum AR_t\).
Statistical test: \(t = \text{CAR} / (\hat{\sigma} \sqrt{L})\) where L is the length of the event window and \(\hat{\sigma}\) is the residual std from the estimation window.
Cross-sectional test (multiple events): Use the distribution of individual CARs across events.

Financial use cases:

Impact of earnings announcements on stock prices.
Effect of M&A announcements on acquirer/target returns.
Regulatory changes (e.g., short-selling bans).
Central bank policy announcements.

Parameters:

returns (ndarray) – Asset return series (1D array, length T).
market_returns (ndarray) – Market (benchmark) return series (1D array, length T).
event_indices (list[int] | ndarray) – Index positions in the return series where each event occurs. For a single event, pass [idx].
estimation_window (int, default: 120) – Number of days in the estimation window. Default is 120.
event_window_pre (int, default: 5) – Number of days before the event in the event window. Default 5.
event_window_post (int, default: 5) – Number of days after the event in the event window. Default 5.
gap (int, default: 10) – Gap between the estimation window and the event window to avoid contamination. Default is 10.

Returns:

CAR, abnormal returns, t-statistics, and diagnostics.

Return type:

EventStudyResult

Raises:

ValueError – If any event does not have enough data for the estimation or event windows.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> market = rng.normal(0.0005, 0.01, 500)
>>> stock = 0.001 + 1.2 * market + rng.normal(0, 0.005, 500)
>>> stock[250:256] += 0.02  # inject positive event at day 250
>>> result = event_study(stock, market, [250], event_window_post=5)
>>> result.car > 0  # should detect the positive abnormal return
True

References

MacKinlay, A. C. (1997). “Event Studies in Economics and Finance.” Journal of Economic Literature, 35(1), 13-39.
Brown, S. J. & Warner, J. B. (1985). “Using Daily Stock Returns.” Journal of Financial Economics, 14(1), 3-31.

synthetic_control_weights(treated_outcomes, donor_outcomes, pre_period, run_placebo=False, donor_names=None)[source]¶

Enhanced synthetic control with donor selection and placebo inference.

Extends the basic synthetic control method with:

Pre-treatment fit quality: RMSPE (root mean squared prediction error) measures how well the synthetic unit matches the treated unit before treatment. Lower is better.
Placebo (permutation) tests: Applies the synthetic control method iteratively, treating each donor as if it were the treated unit. The RMSPE ratio (post/pre) for the true treated unit is compared to the distribution of placebo ratios. A p-value is the fraction of placebo ratios at least as large as the treated unit’s.
Gap plot data: Period-by-period gaps and cumulative effects for visualization.

\[\text{RMSPE ratio} = \frac{\text{RMSPE}_{\text{post}}}{\text{RMSPE}_{\text{pre}}}\]

Financial use cases:

Impact of a merger/acquisition on the target firm’s stock price relative to a synthetic peer.
Effect of a regulatory change on trading volume for a single exchange.
Impact of a central bank’s unconventional monetary policy on a country’s equity market.

Parameters:

treated_outcomes (ndarray) – Outcomes for the treated unit over all time periods (1D, length T).
donor_outcomes (ndarray) – Outcomes for donor units (T, n_donors).
pre_period (int) – Number of pre-treatment time periods.
run_placebo (bool, default: False) – If True, runs placebo tests for each donor to compute permutation-based p-values. Default is False (expensive for many donors).
donor_names (list[str] | None, default: None) – Optional names for donor units for interpretability.

Returns:

Weights, RMSPE, placebo p-value, gap data.

Return type:

SyntheticControlWeightsResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> T, J = 50, 10
>>> common = np.cumsum(rng.normal(0, 0.5, T))
>>> donors = np.column_stack([
...     common + rng.normal(0, 0.3, T) for _ in range(J)
... ])
>>> true_w = rng.dirichlet(np.ones(J))
>>> treated = donors @ true_w + rng.normal(0, 0.1, T)
>>> treated[30:] += 5.0  # treatment effect of 5
>>> result = synthetic_control_weights(treated, donors, 30,
...                                    run_placebo=True)
>>> result.ate  # should be close to 5.0
5.0...
>>> result.placebo_p_value < 0.2  # significant
True

References

Abadie, A., Diamond, A. & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.”
Abadie, A. (2021). “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.”

causal_forest(outcome, treatment, covariates, n_estimators=200, min_samples_leaf=5, max_depth=None, honest=True, seed=42)[source]¶

Estimate heterogeneous treatment effects using a causal forest.

If honest=True, the data is split: one half for building the tree structure, the other half for estimating leaf predictions. This reduces overfitting and provides valid confidence intervals.

\[\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)\]

where \(\hat{\mu}_1\) is the treated outcome model and \(\hat{\mu}_0\) is the control outcome model.

Financial use cases:

Which firms benefit most from analyst coverage?
How does the effect of a tax change vary by firm size?
Heterogeneous impact of ESG scores on cost of capital.
Personalized treatment effects for portfolio tilts.

Parameters:

outcome (ndarray) – Observed outcomes (1D array, length n).
treatment (ndarray) – Binary treatment indicator (1D array of 0s and 1s).
covariates (ndarray) – Covariate matrix (n, p) for effect heterogeneity.
n_estimators (int, default: 200) – Number of trees in each forest. Default is 200.
min_samples_leaf (int, default: 5) – Minimum samples per leaf. Larger values give smoother estimates. Default is 5.
max_depth (int | None, default: None) – Maximum tree depth. None means unlimited. Default is None.
honest (bool, default: True) – If True, uses sample splitting for honest estimation. Default is True.
seed (int, default: 42) – Random seed for reproducibility. Default is 42.

Returns:

ATE, CATE for each observation, standard error, confidence interval, and feature importances.

Return type:

CausalForestResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 1000
>>> X = rng.normal(size=(n, 3))
>>> T = (rng.uniform(size=n) > 0.5).astype(float)
>>> # Heterogeneous effect: tau(x) = 2 + x[:,0]
>>> tau = 2.0 + X[:, 0]
>>> Y = X[:, 1] + tau * T + rng.normal(0, 0.5, n)
>>> result = causal_forest(Y, T, X)
>>> abs(result.ate - 2.0) < 1.0
True

References

Wager, S. & Athey, S. (2018). “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.”
Kunzel, S. R. et al. (2019). “Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning.”

mediation_analysis(outcome, treatment, mediator, covariates=None)[source]¶

Baron-Kenny mediation analysis with Sobel test.

Decomposes the total effect of treatment on outcome into a direct effect and an indirect effect through a mediator variable:

The indirect effect is \(a \times b\) (equivalently \(c - c'\)). The Sobel test provides a z-test for the significance of the indirect effect:

\[z = \frac{a \cdot b}{\sqrt{b^2 \cdot SE_a^2 + a^2 \cdot SE_b^2}}\]

Financial use cases:

Does analyst coverage affect stock returns through improved information environment (bid-ask spread)?
Does ESG performance impact firm value through the channel of reduced cost of capital?
Does monetary policy affect equity prices through the expectations channel or the risk premium channel?
Does order flow affect prices through inventory or information?

Parameters:

outcome (ndarray) – Outcome variable Y (1D array).
treatment (ndarray) – Treatment variable T (1D array, can be continuous).
mediator (ndarray) – Mediator variable M (1D array).
covariates (ndarray | None, default: None) – Optional control variables (n, k). Default is None.

Returns:

Total, direct, indirect effects, Sobel test, and proportion mediated.

Return type:

MediationResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 1000
>>> T = rng.binomial(1, 0.5, n).astype(float)
>>> M = 0.8 * T + rng.normal(0, 0.5, n)     # path a = 0.8
>>> Y = 0.5 * T + 0.6 * M + rng.normal(0, 0.5, n)  # c'=0.5, b=0.6
>>> result = mediation_analysis(Y, T, M)
>>> abs(result.indirect_effect - 0.48) < 0.2  # a*b ~ 0.48
True
>>> result.sobel_p < 0.05  # significant mediation
True

References

Baron, R. M. & Kenny, D. A. (1986). “The Moderator-Mediator Variable Distinction.”
Sobel, M. E. (1982). “Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models.”
MacKinnon, D. P. (2008). “Introduction to Statistical Mediation Analysis.”

regression_discontinuity_robust(outcome, running_var, cutoff=0.0, bandwidth=None, poly_order=1, fuzzy_treatment=None, kernel='triangular', run_mccrary=True)[source]¶

Robust regression discontinuity design with optimal bandwidth.

Enhanced RDD implementation with:

IK bandwidth selection: Imbens-Kalyanaraman (2012) optimal bandwidth that balances bias and variance.
Local polynomial regression: Supports linear (order 1) and quadratic (order 2) local fits with kernel weighting.
McCrary density test: Tests for manipulation of the running variable at the cutoff.
Sharp vs. fuzzy RD: Sharp RD uses the cutoff directly. Fuzzy RD uses a two-stage approach when treatment is probabilistic at the cutoff (analogous to IV).

Financial use cases:

Effect of index inclusion on stock liquidity (market-cap threshold).
Impact of credit rating changes at rating boundaries.
Effect of SEC disclosure requirements at firm-size thresholds.
Impact of margin requirements at leverage cutoffs.

Parameters:

outcome (ndarray) – Outcome variable (1D array).
running_var (ndarray) – Running variable that determines treatment (1D array).
cutoff (float, default: 0.0) – Cutoff value. Default is 0.0.
bandwidth (float | None, default: None) – Bandwidth for local regression. If None, uses the IK optimal bandwidth. Default is None.
poly_order (int, default: 1) – Order of the local polynomial (1 = linear, 2 = quadratic). Default is 1.
fuzzy_treatment (ndarray | None, default: None) – Actual treatment indicator for fuzzy RD (1D array of 0s and 1s). If None, uses sharp RD. Default is None.
kernel (str, default: 'triangular') – Kernel for weighting observations: 'triangular', 'uniform', or 'epanechnikov'. Default is 'triangular'.
run_mccrary (bool, default: True) – Whether to run the McCrary manipulation test. Default is True.

Returns:

RD estimate with robust standard errors, bandwidth information, and diagnostic tests.

Return type:

RDRobustResult

Raises:

ValueError – If insufficient observations or invalid parameters.

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> x = rng.uniform(-2, 2, n)
>>> y = 0.5 * x + 1.5 * (x >= 0) + rng.normal(0, 0.3, n)
>>> result = regression_discontinuity_robust(y, x, cutoff=0.0)
>>> abs(result.ate - 1.5) < 0.5
True
>>> result.bandwidth_method
'imbens_kalyanaraman'

References

Imbens, G. W. & Kalyanaraman, K. (2012). “Optimal Bandwidth Choice for the Regression Discontinuity Estimator.”
McCrary, J. (2008). “Manipulation of the Running Variable in the Regression Discontinuity Design.”
Cattaneo, M. D. et al. (2020). “A Practical Introduction to Regression Discontinuity Designs.”

bounds_analysis(outcome, treatment, selection=None, outcome_bounds=None, method='manski')[source]¶

Compute Manski or Lee bounds for partial identification of treatment effects.

When standard identifying assumptions (unconfoundedness, exclusion restriction) are too strong, partial identification provides bounds on the treatment effect that hold under weaker assumptions.

Manski (1990) bounds require only:

Known outcome support (bounded outcomes).
Random treatment assignment OR treatment is observed.

The treatment effect is bounded by:

\[E[Y|D=1] - y_{\max} \cdot P(D=0) - E[Y|D=0] \cdot P(D=0) \leq \text{ATE} \leq E[Y|D=1] - y_{\min} \cdot P(D=0) - E[Y|D=0] \cdot P(D=0)\]

Lee (2009) bounds handle sample selection (attrition):

When treatment affects whether the outcome is observed.
Tightens bounds by trimming the group with higher selection rate.

Financial use cases:

Bounding the effect of financial literacy programs on savings when participation is endogenous.
Bounding returns to education when the outcome (wages) is only observed for the employed.
Partial identification of trading strategy effects when some trades are censored.

Parameters:

outcome (ndarray) – Observed outcomes (1D array). For Lee bounds, should only contain observed (non-missing) values.
treatment (ndarray) – Binary treatment indicator (1D array of 0s and 1s).
selection (ndarray | None, default: None) – Binary selection indicator (1D array). Required for Lee bounds. 1 = outcome observed, 0 = outcome missing/attrited. Must be the same length as treatment. Default is None.
outcome_bounds (tuple[float, float] | None, default: None) – (y_min, y_max) bounds on the outcome support. Required for Manski bounds. If None, uses the observed min and max.
method (str, default: 'manski') – 'manski' for Manski worst-case bounds or 'lee' for Lee trimming bounds. Default is 'manski'.

Returns:

Lower and upper bounds on the treatment effect, whether the effect is sign-identified, and confidence intervals.

Return type:

BoundsResult

Example

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> n = 1000
>>> T = rng.binomial(1, 0.5, n).astype(float)
>>> Y = 2.0 * T + rng.normal(0, 1, n)
>>> result = bounds_analysis(Y, T, outcome_bounds=(-5, 10))
>>> result.lower_bound < 2.0 < result.upper_bound
True
>>> result.identified  # both bounds positive -> sign identified
True

References

Manski, C. F. (1990). “Nonparametric Bounds on Treatment Effects.”
Lee, D. S. (2009). “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.”
Horowitz, J. L. & Manski, C. F. (2000). “Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data.”

Integrations¶

External package wrappers for causal inference.

Functions in this module require the causal optional dependency group (DoWhy, EconML, DoubleML) and are guarded by @requires_extra('causal').

dowhy_causal_model(data, treatment, outcome, graph=None, common_causes=None, method='backdoor.propensity_score_matching')[source]¶

Build and estimate a causal model using DoWhy.

Parameters:

data (DataFrame) – Observational data.
treatment (str) – Name of the treatment column.
outcome (str) – Name of the outcome column.
graph (str | None, default: None) – Causal graph in GML or DOT format. If None, common_causes must be provided and DoWhy will construct a simple graph.
common_causes (list[str] | None, default: None) – List of common cause (confounder) column names. Required if graph is not provided.
method (str, default: 'backdoor.propensity_score_matching') – Estimation method name (e.g., ‘backdoor.propensity_score_matching’, ‘backdoor.linear_regression’, ‘iv.instrumental_variable’).

Returns:

estimate: float — estimated causal effect, p_value: float or None — p-value if available, method: str — method used, model: DoWhy CausalModel object, identified_estimand: the identified estimand, causal_estimate: the full DoWhy estimate object.

Return type:

dict[str, Any]

econml_dml(outcome, treatment, covariates, model_y=None, model_t=None, n_splits=3)[source]¶

Estimate causal effects using EconML’s LinearDML.

Parameters:

outcome (ndarray | Series) – Outcome variable.
treatment (ndarray | Series) – Treatment variable (can be continuous).
covariates (ndarray | DataFrame) – Covariate matrix.
model_y (Any, default: None) – Nuisance model for the outcome. Defaults to Lasso.
model_t (Any, default: None) – Nuisance model for the treatment. Defaults to Lasso.
n_splits (int, default: 3) – Number of cross-fitting splits.

Returns:

ate: float — average treatment effect, se: float — standard error, ci_lower: float — lower CI bound, ci_upper: float — upper CI bound, model: fitted LinearDML object.

Return type:

dict[str, Any]

econml_forest(outcome, treatment, covariates, n_estimators=100, min_samples_leaf=5)[source]¶

Estimate heterogeneous treatment effects using EconML’s CausalForestDML.

Parameters:

outcome (ndarray | Series) – Outcome variable.
treatment (ndarray | Series) – Treatment variable.
covariates (ndarray | DataFrame) – Covariate matrix.
n_estimators (int, default: 100) – Number of trees in the forest.
min_samples_leaf (int, default: 5) – Minimum number of samples per leaf.

Returns:

ate: float — average treatment effect, cate: np.ndarray — conditional ATE for each observation, se: float — standard error of ATE, model: fitted CausalForestDML object.

Return type:

dict[str, Any]

doubleml_plr(outcome, treatment, covariates, n_folds=5, ml_l=None, ml_m=None)[source]¶

Estimate treatment effect using DoubleML’s partially linear regression.

Parameters:

outcome (ndarray | Series) – Outcome variable.
treatment (ndarray | Series) – Treatment variable.
covariates (ndarray | DataFrame) – Covariate matrix.
n_folds (int, default: 5) – Number of cross-fitting folds.
ml_l (Any, default: None) – Nuisance learner for E[Y|X]. Defaults to Lasso.
ml_m (Any, default: None) – Nuisance learner for E[D|X]. Defaults to Lasso.

Returns:

ate: float — treatment effect estimate (theta), se: float — standard error, ci_lower: float — lower CI bound, ci_upper: float — upper CI bound, t_stat: float — t-statistic, p_value: float — p-value, model: fitted DoubleMLPLR object.

Return type:

dict[str, Any]

Causal Inference (wraquant.causal)¶

Quick Example¶

API Reference¶

Treatment Effects¶

Integrations¶

Causal Inference (`wraquant.causal`)¶