Market Microstructure (wraquant.microstructure)

Tools for analyzing market microstructure: liquidity measurement, order flow toxicity, and market quality metrics. These are essential for execution analysis, market-making strategy design, and understanding the microscopic structure of price formation.

Quick Example

from wraquant.microstructure import liquidity, toxicity

# Effective spread from trade prices and the prevailing quote midpoints
spread = liquidity.effective_spread(trade_prices, midpoints)
print(f"Mean effective spread: {spread.mean():.4f}")

# VPIN (Volume-Synchronized Probability of Informed Trading)
vpin = toxicity.vpin(volume, buy_volume, n_buckets=50)
print(f"Current VPIN: {vpin[-1]:.4f}")
# High VPIN signals toxic order flow (informed trading)

API Reference

Market microstructure analytics.

Provides quantitative measures of market quality, liquidity, and order flow toxicity for both high-frequency tick data and daily-frequency OHLCV data. These metrics are essential for execution cost analysis, informed-trading detection, and understanding the micro-level dynamics that drive price formation.

Key sub-modules:

  • Liquidity (liquidity) – Bid-ask spread estimation and liquidity measurement: amihud_illiquidity (the standard illiquidity proxy for daily data), roll_spread (implied spread from autocovariance), corwin_schultz_spread (high-low spread estimator), kyle_lambda (price impact coefficient), effective_spread, realized_spread, spread_decomposition, and liquidity_commonality.

  • Toxicity (toxicity) – Informed trading and order flow toxicity: vpin (Volume-Synchronized Probability of Informed Trading, a real-time toxicity metric that its authors report was elevated ahead of the 2010 Flash Crash), pin_model (classical PIN model), adjusted_pin, order_flow_imbalance, bulk_volume_classification, trade_classification (Lee-Ready quote and tick tests), and informed_trading_intensity.

  • Market Quality (market_quality) – Structural market efficiency: variance_ratio (tests random walk hypothesis), market_efficiency_ratio, hasbrouck_information_share, gonzalo_granger_component, intraday_volatility_pattern, price_impact_regression, depth, and resiliency.

Example

>>> from wraquant.microstructure import amihud_illiquidity, vpin
>>> illiq = amihud_illiquidity(returns, volume)
>>> toxicity = vpin(volume, buy_volume, n_buckets=50)

Use wraquant.microstructure when analyzing execution quality, detecting informed trading, or building features for ML models that capture market structure. For execution algorithms that consume these metrics, see wraquant.execution. For generating microstructure-based ML features, see wraquant.ml.features.microstructure_features.

amihud_illiquidity(returns, volume, window=None)[source]

Amihud (2002) illiquidity ratio: mean of |return| / dollar volume.

A higher value indicates less liquid (more illiquid) markets.

Parameters:
  • returns (Series) – Asset return series.

  • volume (Series) – Dollar volume series (price * shares traded).

  • window (int | None, default: None) – Rolling window size. If None, returns a single scalar average over the entire sample.

Return type:

Series | float

Returns:

Rolling Amihud illiquidity ratio (or a single float when window is None).

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> returns = pd.Series(np.random.randn(252) * 0.01)
>>> volume = pd.Series(np.random.uniform(1e6, 5e6, 252))
>>> illiq = amihud_illiquidity(returns, volume)
>>> illiq > 0
True

See also

kyle_lambda: Price impact coefficient (regression-based alternative). amihud_rolling: Rolling version with normalization.
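As a rough illustration of the full-sample ratio described for amihud_illiquidity, the sketch below averages |return| / dollar volume in plain Python. The helper name amihud_ratio_sketch and the choice to skip zero-volume observations are assumptions made for this example; they are not taken from the library's implementation.

```python
from statistics import fmean


def amihud_ratio_sketch(returns, dollar_volume):
    """Mean of |return| / dollar volume over the sample.

    Assumption: observations with zero dollar volume are skipped so the
    ratio stays finite.
    """
    ratios = [abs(r) / v for r, v in zip(returns, dollar_volume) if v > 0]
    return fmean(ratios) if ratios else float("nan")


# Three days in which each 1% move needs 2 million in dollar volume
returns = [0.01, -0.02, 0.005]
dollar_volume = [2e6, 4e6, 1e6]
illiq = amihud_ratio_sketch(returns, dollar_volume)
```

Every observation here has the same |return| / volume ratio, so the average equals that common value; in real data the mean is dominated by low-volume days, which is one reason a rolling or normalized variant can be easier to read.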

amihud_rolling(returns, volume, window=21, normalize=True)[source]

Rolling Amihud (2002) illiquidity ratio with proper normalization.

Computes the Amihud ratio over a rolling window and optionally normalizes by the full-sample time-series mean (see normalize) so that values are comparable across different assets and time periods.

When to use: For tracking how an individual asset’s liquidity evolves over time. The normalization makes the measure comparable across assets with different price levels and trading volumes.

Interpretation: Higher values indicate less liquidity (more price impact per unit of trading volume). Sudden spikes often correspond to liquidity crises or market stress events. Typical values for large-cap US stocks are 1e-11 to 1e-9 (unnormalized).

Parameters:
  • returns (Series) – Asset return series.

  • volume (Series) – Dollar volume series (price * shares traded).

  • window (int, default: 21) – Rolling window size (default 21 for ~1 month of trading days).

  • normalize (bool, default: True) – If True, divide each rolling value by the full-sample mean so the time-series average is 1.0.

Return type:

Series

Returns:

Rolling Amihud illiquidity series.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> returns = pd.Series(np.random.randn(100) * 0.01)
>>> volume = pd.Series(np.random.uniform(1e6, 5e6, 100))
>>> illiq = amihud_rolling(returns, volume, window=21)
>>> illiq.name
'amihud_rolling'

References

Amihud, Y. (2002). “Illiquidity and Stock Returns: Cross-Section and Time-Series Effects.” Journal of Financial Markets, 5(1), 31-56.

See also

amihud_illiquidity: Static (full-sample) Amihud ratio. liquidity_commonality: How much liquidity co-moves with the market.

closing_quoted_spread(bid_close, ask_close)[source]

Quoted bid-ask spread at the market close.

The closing spread is particularly relevant for investors who trade at or near the close (e.g., mutual fund NAV calculations, index rebalancing, MOC orders). It also serves as a simple daily liquidity proxy when intraday data is unavailable.

When to use: When analyzing execution costs for daily-frequency traders, evaluating end-of-day liquidity conditions, or constructing a daily spread time series from closing quote data.

Interpretation: Narrower spreads indicate better end-of-day liquidity. Spread widening at the close often precedes periods of higher volatility or information events (e.g., earnings releases).

Parameters:
  • bid_close (Series) – Best bid price at market close.

  • ask_close (Series) – Best ask price at market close.

Return type:

Series

Returns:

Closing quoted spread series (ask - bid), in price units.

Example

>>> import pandas as pd
>>> bid = pd.Series([99.90, 99.85, 99.95])
>>> ask = pd.Series([100.10, 100.15, 100.05])
>>> spread = closing_quoted_spread(bid, ask)
>>> round(float(spread.iloc[0]), 2)  # round away floating-point noise
0.2

References

Chordia, T., Roll, R. & Subrahmanyam, A. (2001). “Market Liquidity and Trading Activity.” Journal of Finance, 56(2), 501-530.

See also

effective_spread: Execution-weighted spread measure. relative_spread: Spread normalized by midpoint.

corwin_schultz_spread(high, low, window=1)[source]

Corwin & Schultz (2012) high-low spread estimator.

Estimates the effective bid-ask spread from consecutive daily high and low prices. The key insight is that daily high prices are almost always buyer-initiated (at the ask) while daily lows are seller-initiated (at the bid). The ratio of high-to-low therefore captures both volatility and the spread. By comparing single-day and two-day high-low ranges the method disentangles the two components.

When to use: When only daily OHLC data is available and you need a spread estimate. More robust than the Roll (1984) estimator because it does not require negative serial covariance and performs better in the presence of stale prices.

Interpretation: Output is in price units (same scale as the input). Values typically range from 0 (perfectly liquid) to several percent of price for illiquid stocks. Negative estimates are floored at zero (model assumption violated, usually when volatility overwhelms spread).

Parameters:
  • high (Series) – Daily high prices.

  • low (Series) – Daily low prices.

  • window (int, default: 1) – Averaging window for the spread estimate. window=1 returns the raw daily estimate.

Return type:

Series

Returns:

Estimated bid-ask spread series, floored at zero.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> close = pd.Series(100 + np.cumsum(np.random.randn(100) * 0.5))
>>> high = close + np.abs(np.random.randn(100)) * 0.3
>>> low = close - np.abs(np.random.randn(100)) * 0.3
>>> spread = corwin_schultz_spread(high, low)
>>> (spread >= 0).all()
True

References

Corwin, S. A. & Schultz, P. (2012). “A Simple Way to Estimate Bid-Ask Spreads from Daily High and Low Prices.” Journal of Finance, 67(2), 719-760.

See also

roll_spread: Implied spread from trade prices only. effective_spread: Direct spread from trade and quote data.
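The two-day comparison described for corwin_schultz_spread can be sketched with the closed-form estimator from Corwin & Schultz (2012). This is a minimal plain-Python version under stated assumptions: the helper name is hypothetical, the paper's overnight-gap adjustment is omitted, and the result is the proportional spread S from the paper, floored at zero. How the library scales or averages its output is not shown here.

```python
import math

# Constant 3 - 2*sqrt(2) that appears in the closed-form solution
K = 3.0 - 2.0 * math.sqrt(2.0)


def corwin_schultz_sketch(high, low):
    """One spread estimate per pair of consecutive days, floored at zero."""
    spreads = []
    for t in range(len(high) - 1):
        # beta: sum of squared single-day log ranges over the two days
        beta = (math.log(high[t] / low[t]) ** 2
                + math.log(high[t + 1] / low[t + 1]) ** 2)
        # gamma: squared log range of the combined two-day window
        two_day_high = max(high[t], high[t + 1])
        two_day_low = min(low[t], low[t + 1])
        gamma = math.log(two_day_high / two_day_low) ** 2
        alpha = ((math.sqrt(2.0 * beta) - math.sqrt(beta)) / K
                 - math.sqrt(gamma / K))
        s = 2.0 * (math.exp(alpha) - 1.0) / (1.0 + math.exp(alpha))
        spreads.append(max(s, 0.0))
    return spreads


# Identical ranges on both days: the whole range is attributed to the spread
same_range = corwin_schultz_sketch([101.0, 101.0], [100.0, 100.0])
# A gap up between days inflates the two-day range: the estimate is floored at 0
gap_up = corwin_schultz_sketch([101.0, 103.0], [100.0, 102.0])
```

The second case shows why negative raw estimates occur: when the two-day range grows much faster than the single-day ranges, the volatility term outweighs the spread term.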

depth_imbalance(bid_depth, ask_depth)[source]

Order book depth imbalance.

Computes (bid_depth - ask_depth) / (bid_depth + ask_depth) to measure the directional imbalance in resting limit order volume.

When to use: For real-time assessment of supply-demand imbalance in the limit order book. Commonly used as a short-horizon return predictor in high-frequency strategies.

Interpretation:

  • +1: All depth is on the bid side (strong buying interest, bullish signal).

  • -1: All depth is on the ask side (strong selling interest, bearish signal).

  • 0: Balanced book.

Values persistently above +0.3 or below -0.3 often indicate directional pressure that leads to price movement in the direction of the deeper side.

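Parameters:
  • bid_depth – Resting limit order volume on the bid side.

  • ask_depth – Resting limit order volume on the ask side.

Return type: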

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Depth imbalance in [-1, 1].

Example

>>> import pandas as pd
>>> bid_depth = pd.Series([5000, 3000, 4000])
>>> ask_depth = pd.Series([3000, 5000, 4000])
>>> imb = depth_imbalance(bid_depth, ask_depth)
>>> float(imb.iloc[0])  # more bids than asks -> positive
0.25

References

Cao, C., Hansch, O. & Wang, X. (2009). “The Information Content of an Open Limit-Order Book.” Journal of Futures Markets, 29(1), 16-41.

See also

wraquant.microstructure.toxicity.order_flow_imbalance: Volume-based imbalance measure. wraquant.microstructure.market_quality.depth: Total market depth.
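The depth imbalance formula and the ±0.3 rule of thumb from the interpretation notes can be sketched as follows. Both helper names and the label strings are hypothetical, and returning NaN for an empty book is an assumption of this example.

```python
def depth_imbalance_sketch(bid_depth, ask_depth):
    """(bid - ask) / (bid + ask) for a single book snapshot."""
    total = bid_depth + ask_depth
    if total <= 0:
        return float("nan")  # assumption: empty book has no defined imbalance
    return (bid_depth - ask_depth) / total


def pressure_label(imbalance, threshold=0.3):
    """Bucket an imbalance value using the +/-0.3 rule of thumb."""
    if imbalance > threshold:
        return "bid-heavy"
    if imbalance < -threshold:
        return "ask-heavy"
    return "balanced"


bids = [5000, 3000, 4000, 9000]
asks = [3000, 5000, 4000, 1000]
imbalances = [depth_imbalance_sketch(b, a) for b, a in zip(bids, asks)]
labels = [pressure_label(x) for x in imbalances]
```

Only the last snapshot (9000 on the bid against 1000 on the ask) crosses the threshold; the first two are lopsided but still inside the ±0.3 band.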

effective_spread(trade_prices, midpoints)[source]

Effective bid-ask spread: 2 * |trade_price - midpoint|.

Parameters:
  • trade_prices – Executed trade prices.

  • midpoints – Mid-quote series aligned to trades.

Return type:

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Per-trade effective spread, same type as the inputs.

Example

>>> import pandas as pd, numpy as np
>>> trades = pd.Series([100.05, 99.95, 100.03])
>>> mids = pd.Series([100.0, 100.0, 100.0])
>>> spreads = effective_spread(trades, mids)
>>> round(float(spreads.iloc[0]), 2)  # round away floating-point noise
0.1

See also

realized_spread: Post-trade spread (adverse selection component). roll_spread: Implied spread from price autocovariance.

kyle_lambda(prices, volume, window=20)[source]

Kyle’s lambda – price impact coefficient via rolling OLS.

Regresses price changes on signed order flow (volume) to estimate the permanent price impact per unit of volume.

Parameters:
  • prices (Series) – Price series.

  • volume (Series) – Signed volume series (positive for buys, negative for sells).

  • window (int, default: 20) – Rolling regression window.

Return type:

Series

Returns:

Rolling Kyle’s lambda series. Higher values indicate more price impact per unit of volume (less liquid).

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> prices = pd.Series(100 + np.cumsum(np.random.randn(100) * 0.5))
>>> volume = pd.Series(np.random.randn(100) * 1000)
>>> lam = kyle_lambda(prices, volume, window=20)
>>> len(lam) == 100
True

See also

amihud_illiquidity: Simpler illiquidity proxy (no signed volume needed). lambda_kyle_rolling: Kyle’s lambda with confidence intervals.
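The rolling regression behind kyle_lambda can be illustrated with an ordinary least-squares slope of price changes on signed volume inside each window. This is a sketch under assumptions: the helper names are hypothetical, the regression includes an intercept, and the first `window` entries are left as NaN. The library's exact alignment and warm-up behavior may differ.

```python
import math


def ols_slope(x, y):
    """Slope of y on x from a simple regression with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0.0:
        return math.nan  # no variation in order flow: slope undefined
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def kyle_lambda_sketch(prices, signed_volume, window):
    """Rolling slope of price change on signed volume."""
    out = [math.nan] * len(prices)
    for end in range(window, len(prices)):
        idx = range(end - window + 1, end + 1)
        dp = [prices[i] - prices[i - 1] for i in idx]
        flow = [signed_volume[i] for i in idx]
        out[end] = ols_slope(flow, dp)
    return out


# Prices built so that each unit of signed volume moves the price by 0.001
signed_volume = [0, 100, -50, 200, -100, 150]
prices = [100.0, 100.1, 100.05, 100.25, 100.15, 100.30]
lam = kyle_lambda_sketch(prices, signed_volume, window=3)
```

Because the toy prices respond linearly to order flow, every full window recovers an impact of about 0.001 per unit of volume.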

lambda_kyle_rolling(prices, volume, window=20)[source]

Rolling Kyle’s lambda with confidence intervals.

Extends kyle_lambda() by computing standard errors from the rolling OLS regression, yielding point estimates along with 95% confidence bounds. This is essential for determining whether the estimated price impact is statistically significant at each point in time.

When to use: When you need not just the level of price impact but also its precision. Useful for detecting regime changes in market liquidity – a significant widening of the confidence interval suggests structural uncertainty about the price impact coefficient.

Interpretation: A positive lambda indicates that buy-initiated volume pushes prices up (and sell-initiated pushes down), consistent with the Kyle (1985) model. Lambda values close to zero (or with confidence intervals spanning zero) suggest limited permanent price impact, i.e., a liquid market.

Parameters:
  • prices (Series) – Price series.

  • volume (Series) – Signed volume series (positive for buys, negative for sells).

  • window (int, default: 20) – Rolling regression window (must be >= 5).

Return type:

DataFrame

Returns:

DataFrame with columns 'lambda', 'std_err', 'ci_lower', 'ci_upper' (95% confidence interval).

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> prices = pd.Series(100 + np.cumsum(np.random.randn(50) * 0.1))
>>> volume = pd.Series(np.random.randn(50) * 1000)
>>> result = lambda_kyle_rolling(prices, volume, window=20)
>>> list(result.columns)
['lambda', 'std_err', 'ci_lower', 'ci_upper']

References

Kyle, A. S. (1985). “Continuous Auctions and Insider Trading.” Econometrica, 53(6), 1315-1335.

See also

kyle_lambda: Simple point estimate without confidence intervals. amihud_rolling: Rolling Amihud illiquidity ratio.

liquidity_commonality(asset_illiquidity, market_illiquidity, window=60)[source]

Commonality in liquidity (Chordia, Roll & Subrahmanyam, 2000).

Measures how much an individual asset’s liquidity co-moves with market-wide liquidity. The commonality coefficient is estimated via rolling regressions of changes in the asset’s illiquidity measure on changes in the market-wide illiquidity measure.

When to use: For assessing systematic liquidity risk. Assets with high commonality become illiquid precisely when the entire market becomes illiquid – an undesirable property that investors demand a premium for bearing.

Interpretation: The output is the rolling R-squared from the regression. Higher values (closer to 1) indicate stronger co-movement with market liquidity. Values above 0.3 suggest meaningful systematic liquidity risk. Most large-cap stocks show commonality R-squared of 0.05-0.20.

Parameters:
  • asset_illiquidity (Series) – Individual asset’s illiquidity measure (e.g., Amihud ratio, effective spread) as a time series.

  • market_illiquidity (Series) – Market-wide illiquidity aggregate (e.g., equal-weighted average Amihud ratio across all stocks).

  • window (int, default: 60) – Rolling regression window (default 60 for ~3 months).

Return type:

Series

Returns:

Rolling R-squared of the commonality regression.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> asset = pd.Series(np.random.randn(200).cumsum())
>>> market = pd.Series(np.random.randn(200).cumsum())
>>> r2 = liquidity_commonality(asset, market, window=60)
>>> r2.name
'liquidity_commonality'

References

Chordia, T., Roll, R. & Subrahmanyam, A. (2000). “Commonality in Liquidity.” Journal of Financial Economics, 56(1), 3-28.

See also

amihud_rolling: Generate the illiquidity input for this function.

price_impact(trade_prices, volume, direction)[source]

Permanent price impact per trade.

Computed as direction * (midpoint_{t+1} - midpoint_t) / volume, approximated here via successive trade prices.

Parameters:
  • trade_prices (Series) – Executed trade prices.

  • volume (Series) – Volume for each trade.

  • direction (Series) – Trade direction indicator (+1 buy, -1 sell).

Return type:

Series

Returns:

Per-trade permanent price impact series.

Example

>>> import pandas as pd, numpy as np
>>> trades = pd.Series([100.0, 100.05, 100.10, 100.08])
>>> vol = pd.Series([1000, 2000, 1500, 1800])
>>> direction = pd.Series([1, 1, -1, 1])
>>> impact = price_impact(trades, vol, direction)
>>> len(impact) == 4
True

See also

kyle_lambda: Aggregate price impact coefficient. wraquant.microstructure.market_quality.price_impact_regression: Permanent vs. temporary impact decomposition.

realized_spread(trade_prices, midpoints, delay=5)[source]

Realized spread incorporating a post-trade midpoint delay.

Measures the revenue to the liquidity provider: 2 * direction * (trade_price - midpoint_{t+delay}), where direction is +1 for a buyer-initiated trade and -1 for a seller-initiated trade.

Parameters:
  • trade_prices (Series) – Executed trade prices.

  • midpoints (Series) – Mid-quote series aligned to trades.

  • delay (int, default: 5) – Number of observations to shift the midpoint forward.

Return type:

Series

Returns:

Per-trade realized spread series (NaN for the last delay rows).

Example

>>> import pandas as pd, numpy as np
>>> trades = pd.Series([100.05, 99.95, 100.03, 100.01, 99.98])
>>> mids = pd.Series([100.0, 100.0, 100.0, 100.0, 100.0])
>>> rs = realized_spread(trades, mids, delay=2)
>>> len(rs) == 5
True

See also

effective_spread: Total execution cost (before adverse selection). spread_decomposition: Full Huang-Stoll decomposition.
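To make the difference between the effective and realized spread concrete, the sketch below evaluates both formulas on a handful of trades. The helper names are hypothetical, and trade direction is passed explicitly here as an assumption of the example, since how the library obtains it is not described in this section.

```python
import math


def effective_spread_sketch(trade_prices, midpoints):
    """2 * |trade price - prevailing midpoint| per trade."""
    return [2.0 * abs(p - m) for p, m in zip(trade_prices, midpoints)]


def realized_spread_sketch(trade_prices, midpoints, directions, delay):
    """2 * direction * (trade price - midpoint `delay` observations later)."""
    out = []
    for t, (p, d) in enumerate(zip(trade_prices, directions)):
        if t + delay < len(midpoints):
            out.append(2.0 * d * (p - midpoints[t + delay]))
        else:
            out.append(math.nan)  # no future midpoint for the last rows
    return out


trades = [100.05, 99.95, 100.03, 100.01, 99.98]
mids = [100.00, 100.00, 100.02, 100.02, 100.02]
directions = [1, -1, 1, 1, -1]  # +1 buy, -1 sell (supplied by hand here)

eff = effective_spread_sketch(trades, mids)
rs = realized_spread_sketch(trades, mids, directions, delay=2)
```

For the first trade the effective spread is about 0.10 but the realized spread is only about 0.06, because the midpoint moved up after the buy; the gap between the two is the adverse selection component.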

roll_spread(prices)[source]

Roll (1984) implied bid-ask spread from serial covariance.

Estimates the effective spread from the negative first-order autocovariance of price changes: spread = 2 * sqrt(-cov).

Parameters:

prices (Series) – Price series.

Return type:

float

Returns:

Estimated implied spread. Returns NaN if the serial covariance is non-negative (model assumption violated).

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> # Simulate trade prices with bid-ask bounce
>>> mid = 100 + np.cumsum(np.random.randn(500) * 0.01)
>>> bounce = np.random.choice([-0.05, 0.05], size=500)
>>> prices = pd.Series(mid + bounce)
>>> spread = roll_spread(prices)
>>> spread > 0 or np.isnan(spread)  # positive spread or NaN
True

See also

effective_spread: Direct spread from trade and quote data. corwin_schultz_spread: High-low spread estimator (OHLC data).
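The Roll estimator can be sketched directly from its definition: take the first-order autocovariance of price changes and return 2 * sqrt(-cov) when that covariance is negative. The helper name and the use of the sample (n - 1) covariance are assumptions of this example.

```python
import math


def roll_spread_sketch(prices):
    """Roll (1984) implied spread; NaN when the autocovariance is not negative."""
    dp = [b - a for a, b in zip(prices, prices[1:])]
    if len(dp) < 3:
        return math.nan
    x, y = dp[:-1], dp[1:]  # price change and its one-step lag
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    if cov >= 0:
        return math.nan  # model assumption violated
    return 2.0 * math.sqrt(-cov)


# Trades bouncing between bid and ask around a flat midpoint
bouncing = [100.05, 99.95] * 4 + [100.05]
# A steadily trending series has no negative autocovariance
trending = [100.0, 101.0, 102.0, 103.0, 104.0]

bounce_estimate = roll_spread_sketch(bouncing)
trend_estimate = roll_spread_sketch(trending)
```

The bouncing series yields a positive estimate while the trending series returns NaN. Strictly alternating trades are more negatively autocorrelated than the random trade directions the Roll model assumes, so the estimate for this toy series should not be read as the true quoted spread.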

spread_decomposition(trade_prices, bid, ask, direction, delay=5)[source]

Huang-Stoll (1997) three-way spread decomposition.

Decomposes the effective spread into three economically distinct components:

  1. Adverse selection: compensation for trading against informed traders who possess private information. This portion of the spread is a permanent price impact – the midpoint moves against the liquidity provider after the trade.

  2. Order processing: compensation for the mechanical costs of market-making (exchange fees, technology, labor).

  3. Inventory holding: compensation for the risk of holding an unbalanced inventory.

When to use: For understanding why spreads are wide. If adverse selection dominates, the market has significant information asymmetry. If order processing dominates, the market is structurally costly.

Interpretation:

  • Adverse selection fraction > 0.5 indicates a market dominated by informed trading (e.g., single-stock options, small-cap equities before earnings).

  • Order processing fraction > 0.5 indicates a market where mechanical costs dominate (e.g., bond markets, low-volatility large-cap equities).

  • Inventory fraction is typically the smallest component for equities but can be large for less liquid instruments.

Parameters:
  • trade_prices (Series) – Executed trade prices.

  • bid (Series) – Best bid prices at time of each trade.

  • ask (Series) – Best ask prices at time of each trade.

  • direction (Series) – Trade direction indicator (+1 buy, -1 sell).

  • delay (int, default: 5) – Number of observations to look ahead for measuring the permanent price impact (default 5).

Returns:

Dictionary with the following keys:

  • 'adverse_selection': fraction of the spread due to information asymmetry.

  • 'order_processing': fraction due to order handling costs.

  • 'inventory_holding': fraction due to inventory risk.

  • 'effective_spread_mean': average effective spread.

Return type:

dict[str, float]

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> n = 200
>>> mid = 100 + np.cumsum(np.random.randn(n) * 0.01)
>>> spread_half = 0.05
>>> bid = pd.Series(mid - spread_half)
>>> ask = pd.Series(mid + spread_half)
>>> direction = pd.Series(np.random.choice([1, -1], n))
>>> trades = pd.Series(np.where(direction > 0, ask, bid))
>>> result = spread_decomposition(trades, bid, ask, direction)
>>> 0 <= result['adverse_selection'] <= 1
True

References

Huang, R. D. & Stoll, H. R. (1997). “The Components of the Bid-Ask Spread: A General Approach.” Review of Financial Studies, 10(4), 995-1034.

See also

effective_spread: Total execution cost measure. realized_spread: Liquidity provider’s revenue component.

turnover_ratio(volume, shares_outstanding)[source]

Turnover ratio: volume / shares outstanding.

Parameters:
  • volume (Series) – Daily trading volume.

  • shares_outstanding (Series | float) – Total shares outstanding (scalar or series).

Return type:

Series

Returns:

Daily turnover ratio. Higher values indicate more active trading.

Example

>>> import pandas as pd
>>> volume = pd.Series([1e6, 1.5e6, 0.8e6])
>>> ratio = turnover_ratio(volume, shares_outstanding=100e6)
>>> float(ratio.iloc[0])
0.01

See also

amihud_illiquidity: Price-impact-based liquidity measure.

adjusted_pin(buy_trades, sell_trades)[source]

Adjusted Probability of Informed Trading (AdjPIN).

Extends the standard PIN model (Easley et al., 1996) by separating information-driven order flow from symmetric order-flow shocks that arise from liquidity effects. The Duarte & Young (2009) adjustment adds a regime where both buy and sell arrival rates increase simultaneously (a liquidity event), preventing the model from misattributing liquidity shocks as informed trading.

The standard PIN is known to overstate the probability of informed trading during periods of high overall activity (e.g., index rebalancing, option expiration). AdjPIN corrects for this bias.

When to use: When you need a cleaner measure of information asymmetry that is not contaminated by correlated liquidity events. Preferred over standard PIN for cross-sectional comparisons where stocks have heterogeneous trading activity.

Interpretation:

  • adj_pin < 0.10: Low information asymmetry, suitable for uninformed market-making.

  • adj_pin 0.10-0.30: Moderate; some informed activity detected.

  • adj_pin > 0.30: High information asymmetry; adverse selection risk is elevated.

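Parameters:
  • buy_trades – Buyer-initiated trade counts, one per period (e.g., per day).

  • sell_trades – Seller-initiated trade counts, one per period (e.g., per day).

Return type: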

dict[str, float]

Returns:

Dictionary with keys 'adj_pin', 'pin_unadjusted', 'alpha', 'delta', 'mu', 'eps_b', 'eps_s', 'theta' (probability of symmetric activity shock).

Example

>>> import numpy as np
>>> from wraquant.microstructure.toxicity import adjusted_pin
>>> rng = np.random.default_rng(42)
>>> buys = rng.poisson(50, size=60)
>>> sells = rng.poisson(50, size=60)
>>> result = adjusted_pin(buys, sells)
>>> 0 <= result['adj_pin'] <= 1
True

References

Duarte, J. & Young, L. (2009). “Why is PIN Priced?” Journal of Financial Economics, 91(2), 119-138.

See also

pin_model: Standard (unadjusted) PIN estimation. vpin: Real-time alternative using volume buckets.

bulk_volume_classification(close, high, low, volume)[source]

Bulk Volume Classification (BVC).

Classifies aggregate volume into buy- and sell-initiated components using the position of the close price relative to the high-low range. This avoids the need for tick-by-tick data or quote data, making it practical for daily-frequency analysis.

The buy fraction is estimated as:

Z = (close - low) / (high - low)
buy_fraction = Z  (linear interpolation)

The CDF of a standard normal evaluated at Z is sometimes used, but the linear version performs comparably and is more transparent.

When to use: When you need to estimate buy/sell volume from daily OHLCV data without tick-by-tick trade records. Suitable as an input to VPIN calculations, order flow imbalance metrics, or any model requiring buy/sell volume decomposition.

Interpretation: When the close is near the high, most volume is classified as buys. When the close is near the low, most volume is classified as sells. The BVC estimate is noisier than Lee-Ready for individual trades but aggregates well across bars.

Parameters:
  • close (Series) – Closing prices.

  • high (Series) – High prices.

  • low (Series) – Low prices.

  • volume (Series) – Total volume per bar.

Return type:

DataFrame

Returns:

DataFrame with columns 'buy_volume', 'sell_volume', 'buy_fraction'.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> n = 50
>>> close = pd.Series(100 + np.cumsum(np.random.randn(n) * 0.5))
>>> high = close + np.abs(np.random.randn(n))
>>> low = close - np.abs(np.random.randn(n))
>>> volume = pd.Series(np.random.uniform(1e4, 5e4, n))
>>> result = bulk_volume_classification(close, high, low, volume)
>>> list(result.columns)
['buy_volume', 'sell_volume', 'buy_fraction']

References

Easley, D., Lopez de Prado, M. M. & O’Hara, M. (2012). “Bulk Classification of Trading Activity.” Working Paper, Johnson School Research Paper Series.

See also

trade_classification: Lee-Ready classification from tick data. vpin: Uses buy/sell volume to compute informed trading probability.
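The linear version of bulk volume classification described above reduces to a few lines per bar. In this sketch the helper name is hypothetical, and splitting volume 50/50 when high equals low is an assumption made to avoid dividing by zero.

```python
def bvc_sketch(close, high, low, volume):
    """Split one bar's volume into buy and sell parts using the linear rule."""
    bar_range = high - low
    if bar_range == 0:
        buy_fraction = 0.5  # assumption: no range, no directional information
    else:
        buy_fraction = (close - low) / bar_range
    buy_volume = buy_fraction * volume
    return buy_volume, volume - buy_volume, buy_fraction


# Close in the middle of the range, at the high, and on a zero-range bar
mid_bar = bvc_sketch(close=101.0, high=102.0, low=100.0, volume=1000.0)
high_bar = bvc_sketch(close=102.0, high=102.0, low=100.0, volume=1000.0)
flat_bar = bvc_sketch(close=100.0, high=100.0, low=100.0, volume=1000.0)
```

A close at the high assigns all volume to buys, and a close at the midpoint splits it evenly, matching the interpretation given above.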

information_share(prices_list)[source]

Hasbrouck’s information share across multiple venues.

Estimates each venue’s contribution to price discovery using the variance decomposition of a VECM residual. A simplified approach is used: the information share for venue j is proportional to 1 / var(price_j - mean_price).

Parameters:

prices_list (list[Series]) – List of price series from different venues, all sharing the same DatetimeIndex.

Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Array of information shares summing to 1, one per venue. A venue with a higher share contributes more to price discovery.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> base = pd.Series(100 + np.cumsum(np.random.randn(200) * 0.1))
>>> venue_a = base + np.random.randn(200) * 0.01
>>> venue_b = base + np.random.randn(200) * 0.05
>>> shares = information_share([venue_a, venue_b])
>>> abs(shares.sum() - 1.0) < 1e-10
True

See also

wraquant.microstructure.market_quality.hasbrouck_information_share: Cholesky-based information share with bounds. wraquant.microstructure.market_quality.gonzalo_granger_component: GG permanent-transitory decomposition.
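The simplified rule stated for information_share, a share proportional to 1 / var(price_j - mean_price), can be sketched as below. The helper name, the equal-weighted cross-venue mean, and the use of population variance are assumptions of this example rather than details taken from the library.

```python
from statistics import pvariance


def information_share_sketch(venues):
    """Shares proportional to the inverse variance of each venue's deviation
    from the equal-weighted cross-venue mean price.

    Assumption: every venue's deviation has non-zero variance.
    """
    n_obs = len(venues[0])
    mean_price = [sum(v[t] for v in venues) / len(venues) for t in range(n_obs)]
    inv_var = []
    for v in venues:
        deviation = [v[t] - mean_price[t] for t in range(n_obs)]
        inv_var.append(1.0 / pvariance(deviation))
    total = sum(inv_var)
    return [x / total for x in inv_var]


base = [100.0, 100.1, 100.2, 100.1, 100.3, 100.2]
noise_b = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
noise_c = [0.05, 0.05, -0.05, -0.05, 0.05, -0.05]
venue_a = list(base)                                  # tracks the base price exactly
venue_b = [p + e for p, e in zip(base, noise_b)]      # small noise
venue_c = [p + e for p, e in zip(base, noise_c)]      # large noise

shares = information_share_sketch([venue_a, venue_b, venue_c])
```

The noisiest venue receives the smallest share. Note that in this sketch two venues would always split 50/50, because their deviations from an equal-weighted mean are mirror images of each other, so the example uses three.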

informed_trading_intensity(buy_volume, sell_volume, window=20)[source]

Time-varying probability of informed trading using a sequential model.

Estimates the instantaneous probability that the marginal trade is information-driven, based on the sequential trade framework of Glosten & Milgrom (1985) and Easley & O’Hara (1987).

The approach uses a rolling Bayesian update. In each window, the fraction of trades on the “aggressive” side (whichever of buy or sell dominates) is used to infer the conditional probability that an information event is occurring, under the assumption that informed traders cluster on one side while uninformed traders arrive symmetrically.

When to use: For real-time monitoring of informed trading intensity. Unlike the static PIN model, this provides a time-varying signal that can be used to dynamically adjust quotes or execution strategies.

Interpretation: Values near 0.5 indicate balanced (uninformed) flow. Values approaching 1.0 indicate that nearly all marginal volume is on one side, consistent with an active informed trader.

Parameters:
  • buy_volume (Series) – Buy-initiated volume per observation.

  • sell_volume (Series) – Sell-initiated volume per observation.

  • window (int, default: 20) – Rolling window for the Bayesian update.

Return type:

Series

Returns:

Rolling informed trading intensity in [0, 1].

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> buys = pd.Series(np.random.uniform(100, 500, 50))
>>> sells = pd.Series(np.random.uniform(100, 500, 50))
>>> intensity = informed_trading_intensity(buys, sells, window=10)
>>> intensity.name
'informed_trading_intensity'

References

Easley, D. & O’Hara, M. (1987). “Price, Trade Size, and Information in Securities Markets.” Journal of Financial Economics, 19(1), 69-90.

See also

pin_model: Static probability of informed trading. order_flow_imbalance: Simpler directional flow measure.

order_flow_imbalance(buy_volume, sell_volume, window=20)[source]

Rolling order flow imbalance (OFI).

Defined as (buy_volume - sell_volume) / (buy_volume + sell_volume) averaged over a rolling window.

Parameters:
  • buy_volume (Series) – Buy-initiated volume per period.

  • sell_volume (Series) – Sell-initiated volume per period.

  • window (int, default: 20) – Rolling window size.

Return type:

Series

Returns:

Rolling OFI series in [-1, 1]. Values near +1 indicate strong buying pressure; values near -1 indicate strong selling pressure.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> buys = pd.Series(np.random.uniform(100, 500, 50))
>>> sells = pd.Series(np.random.uniform(100, 500, 50))
>>> ofi = order_flow_imbalance(buys, sells, window=10)
>>> len(ofi) == 50
True

See also

vpin: Volume-synchronized informed trading probability. wraquant.microstructure.liquidity.depth_imbalance: Order book depth imbalance.
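One reading of the rolling OFI definition is to compute the per-period ratio first and then average it over the window. The sketch below follows that reading; the helper name, the NaN warm-up, and treating a zero-volume period as an imbalance of 0 are assumptions. Taking the ratio of rolling sums instead would be an equally plausible variant.

```python
import math


def rolling_ofi_sketch(buy_volume, sell_volume, window):
    """Rolling mean of (buy - sell) / (buy + sell)."""
    per_period = [
        (b - s) / (b + s) if (b + s) > 0 else 0.0
        for b, s in zip(buy_volume, sell_volume)
    ]
    out = []
    for i in range(len(per_period)):
        if i + 1 < window:
            out.append(math.nan)  # not enough history yet
        else:
            out.append(sum(per_period[i + 1 - window:i + 1]) / window)
    return out


buys = [300, 100, 200, 400]
sells = [100, 300, 200, 100]
ofi = rolling_ofi_sketch(buys, sells, window=2)
```

The per-period imbalances are 0.5, -0.5, 0.0 and 0.6, so the two-period averages start at zero and end at 0.3 as buying pressure builds.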

pin_model(buy_trades, sell_trades)[source]

Estimate the Probability of Informed Trading (PIN).

Uses maximum likelihood on the Easley-Kiefer-O’Hara-Paperman model. The PIN is defined as alpha * mu / (alpha * mu + eps_b + eps_s).

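Parameters:
  • buy_trades – Buyer-initiated trade counts, one per period (e.g., per day).

  • sell_trades – Seller-initiated trade counts, one per period (e.g., per day).

Return type: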

dict[str, float]

Returns:

Dictionary with keys 'pin', 'alpha', 'delta', 'mu', 'eps_b', 'eps_s'. PIN values above 0.20 indicate significant informed trading.

Example

>>> import numpy as np
>>> from wraquant.microstructure.toxicity import pin_model
>>> rng = np.random.default_rng(42)
>>> buys = rng.poisson(50, size=60)
>>> sells = rng.poisson(50, size=60)
>>> result = pin_model(buys, sells)
>>> 0 <= result['pin'] <= 1
True

See also

adjusted_pin: PIN corrected for symmetric liquidity shocks. vpin: Volume-synchronized alternative (real-time).
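Once the model parameters are estimated, the PIN itself is a one-line calculation. The sketch below evaluates the formula given above for hypothetical parameter values; it does not perform the maximum likelihood step, and the helper name is not part of the library.

```python
def pin_from_params(alpha, mu, eps_b, eps_s):
    """PIN = alpha * mu / (alpha * mu + eps_b + eps_s)."""
    informed_rate = alpha * mu  # expected arrival rate of informed trades
    return informed_rate / (informed_rate + eps_b + eps_s)


# Hypothetical estimates: information events on 40% of days, 50 informed
# trades per event day, 40 uninformed buys and 40 uninformed sells per day
pin = pin_from_params(alpha=0.4, mu=50.0, eps_b=40.0, eps_s=40.0)

# Doubling uninformed activity dilutes the informed share of order flow
pin_busy = pin_from_params(alpha=0.4, mu=50.0, eps_b=80.0, eps_s=80.0)
```

With these inputs the informed arrival rate is 20 out of 100 expected trades per day, which puts the PIN at the 0.20 level mentioned in the return description. Note that delta does not enter the formula; it only affects the likelihood.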

toxicity_index(vpin_values, ofi_values, spread_values, weights=(0.5, 0.3, 0.2))[source]

Composite toxicity index combining VPIN, OFI, and spread dynamics.

Produces a single 0-100 score summarizing the overall level of adverse selection / order flow toxicity. Each input component is independently normalized to [0, 1] via min-max scaling, then combined with the specified weights and rescaled to 0-100.

When to use: For a single-number dashboard indicator of market toxicity that synthesizes multiple underlying signals. Useful for setting risk limits (e.g., “pause quoting when toxicity > 70”) or for cross-sectional comparison across instruments.

Interpretation:

  • 0-20: Low toxicity; market is safe for passive market-making.

  • 20-50: Moderate toxicity; monitor order flow closely.

  • 50-80: Elevated toxicity; consider widening quotes or reducing size.

  • 80-100: Extreme toxicity; likely informed trading event in progress.

Parameters:
Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Composite toxicity index in [0, 100].

Example

>>> import numpy as np
>>> from wraquant.microstructure.toxicity import toxicity_index
>>> vpin_vals = np.array([0.2, 0.3, 0.5, 0.4])
>>> ofi_vals = np.array([0.1, -0.3, 0.5, -0.2])
>>> spread_vals = np.array([0.01, 0.02, 0.03, 0.015])
>>> idx = toxicity_index(vpin_vals, ofi_vals, spread_vals)
>>> (idx >= 0).all() and (idx <= 100).all()
True

References

Easley, D., Lopez de Prado, M. M. & O’Hara, M. (2011). “The Microstructure of the ‘Flash Crash’.” Journal of Portfolio Management, 37(2), 118-128.

See also

vpin: One of the component inputs. order_flow_imbalance: Another component input.

trade_classification(trade_prices, bid, ask)[source]

Lee-Ready trade classification combining quote test and tick test.

Classifies each trade as buyer-initiated (+1) or seller-initiated (-1).

The quote test assigns direction based on whether the trade price is above or below the midpoint. When the trade is at the midpoint, the tick test (based on successive price changes) is used as a fallback.

Parameters:
  • trade_prices (Series) – Executed trade prices.

  • bid (Series) – Best bid prices at time of each trade.

  • ask (Series) – Best ask prices at time of each trade.

Return type:

Series

Returns:

Classification series with values +1 (buy) or -1 (sell).

Example

>>> import pandas as pd
>>> from wraquant.microstructure.toxicity import trade_classification
>>> trades = pd.Series([100.05, 99.95, 100.00, 100.03])
>>> bid = pd.Series([99.98, 99.93, 99.98, 100.00])
>>> ask = pd.Series([100.02, 99.97, 100.02, 100.06])
>>> direction = trade_classification(trades, bid, ask)
>>> int(direction.iloc[0])  # trade above midpoint -> buy
1

See also

bulk_volume_classification: Classify from OHLCV data (no quotes). order_flow_imbalance: Aggregate classified volume into a signal.

vpin(volume, buy_volume, n_buckets=50)[source]

Volume-Synchronized Probability of Informed Trading (VPIN).

Aggregates volume into equal-sized buckets and measures the absolute order imbalance in each bucket.
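One way to picture the bucketing step is the sketch below. It assumes buys are spread uniformly within each input bar, so that a bar straddling a bucket boundary can be split proportionally, and it returns the raw per-bucket imbalance. Whether the library buckets this way, or averages imbalances over a trailing window of buckets as in the published VPIN metric, is not stated here; `vpin_sketch` is a hypothetical name.

```python
import numpy as np

def vpin_sketch(volume, buy_volume, n_buckets):
    volume = np.asarray(volume, dtype=float)
    buy_volume = np.asarray(buy_volume, dtype=float)

    # Cumulative total and buy volume, starting from zero.
    cum_vol = np.concatenate([[0.0], np.cumsum(volume)])
    cum_buy = np.concatenate([[0.0], np.cumsum(buy_volume)])

    # Equal-volume bucket boundaries on the cumulative-volume axis.
    edges = np.linspace(0.0, cum_vol[-1], n_buckets + 1)

    # Interpolate cumulative buys at each boundary, then difference.
    bucket_buy = np.diff(np.interp(edges, cum_vol, cum_buy))
    bucket_size = cum_vol[-1] / n_buckets
    bucket_sell = bucket_size - bucket_buy

    # Absolute order imbalance as a fraction of bucket volume.
    return np.abs(bucket_buy - bucket_sell) / bucket_size

imbalance = vpin_sketch([1000, 1500, 800, 1200, 900],
                        [600, 400, 500, 800, 300], n_buckets=2)
```

With 5,400 units of total volume and two buckets, each bucket holds 2,700 units; the first bucket contains 1,125 units of buys, giving an imbalance of 450 / 2700.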

Parameters:
Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

VPIN values, one per bucket. Higher values indicate more informed trading activity. Values above about 0.4 are typically regarded as elevated.

Example

>>> import numpy as np
>>> from wraquant.microstructure.toxicity import vpin
>>> volume = np.array([1000, 1500, 800, 1200, 900])
>>> buy_vol = np.array([600, 400, 500, 800, 300])
>>> result = vpin(volume, buy_vol, n_buckets=2)
>>> len(result) >= 1
True

See also

pin_model: Static PIN estimation from daily trade counts. toxicity_index: Composite toxicity score combining VPIN, OFI, and spread. bulk_volume_classification: Estimate buy/sell volume from OHLCV.

depth(bid_volume, ask_volume, levels=5)[source]

Market depth: total volume available at the top N price levels.

Parameters:
Return type:

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Total depth (bid + ask) summed across the requested levels.

Example

>>> import numpy as np
>>> from wraquant.microstructure.market_quality import depth
>>> bid_vol = np.array([1000, 800, 500, 300, 200])
>>> ask_vol = np.array([900, 700, 600, 400, 100])
>>> depth(bid_vol, ask_vol, levels=3)
4500.0

See also

wraquant.microstructure.liquidity.depth_imbalance:

Directional imbalance between bid and ask depth.

gonzalo_granger_component(prices_list)[source]

Gonzalo-Granger (1995) permanent-transitory decomposition.

Decomposes cointegrated price series into a permanent (efficient price) component and transitory (pricing error) component. The permanent component weights reveal each venue’s contribution to the long-run efficient price.

Unlike Hasbrouck information shares, the GG component shares are unique (not dependent on Cholesky ordering).

When to use: As a complement to Hasbrouck information shares for price discovery analysis. Particularly useful when you need a unique (not bounded) measure of each venue’s price discovery contribution.

Interpretation:

  • GG weights sum to 1.0 across venues.

  • A venue with a large GG weight drives the permanent price – its price innovations are absorbed by the market as a whole.

  • A venue with a small GG weight primarily reflects transitory noise.
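For the two-venue case, the link between the error-correction coefficients and the weights can be sketched as follows. The sketch assumes an error-correction term z = p1 - p2 with adjustment coefficients alpha_1 and alpha_2, and takes the weights proportional to the vector orthogonal to alpha. The function name and the coefficient values are illustrative only; in practice alpha comes from a fitted VECM.

```python
import numpy as np

def gg_weights_two_venues(alpha_1, alpha_2):
    """Hypothetical helper for a two-venue system."""
    # A vector orthogonal to (alpha_1, alpha_2).
    alpha_perp = np.array([alpha_2, -alpha_1], dtype=float)
    # Normalize so the weights sum to one.
    return alpha_perp / alpha_perp.sum()

# Venue 1 corrects slowly (-0.1), venue 2 corrects quickly (+0.3).
weights = gg_weights_two_venues(alpha_1=-0.1, alpha_2=0.3)
```

The venue that adjusts least to the pricing error (venue 1 here) receives the larger weight, 0.75, which matches the interpretation above: it drives the permanent price while the other venue follows.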

Parameters:

prices_list (list[Series]) – List of cointegrated price series from different venues, sharing the same DatetimeIndex.

Returns:

  • 'gg_weights': Gonzalo-Granger permanent component weights (one per venue, summing to 1).

  • 'alpha': Error-correction coefficients for each venue.

Return type:

dict[str, ndarray[tuple[Any, ...], dtype[floating]]]

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.market_quality import gonzalo_granger_component
>>> np.random.seed(42)
>>> base = pd.Series(100 + np.cumsum(np.random.randn(200) * 0.1))
>>> venue_a = base + np.random.randn(200) * 0.01
>>> venue_b = base + np.random.randn(200) * 0.05
>>> result = gonzalo_granger_component([venue_a, venue_b])
>>> abs(result['gg_weights'].sum() - 1.0) < 1e-10
True

References

Gonzalo, J. & Granger, C. (1995). “Estimation of Common Long-Memory Components in Cointegrated Systems.” Journal of Business & Economic Statistics, 13(1), 27-35.

See also

hasbrouck_information_share: Cholesky-based information share.

hasbrouck_information_share(prices_list)[source]

Hasbrouck (1995) information share for price discovery analysis.

Measures each venue’s (or instrument’s) contribution to the efficient price innovation. The information share for venue j is the fraction of the total variance of the efficient price innovation attributable to that venue.

The method is based on a Vector Error Correction Model (VECM). When the innovation covariance matrix is non-diagonal, upper and lower bounds are computed via Cholesky factorization with different orderings. The midpoint of these bounds is reported as the point estimate.

When to use: For analyzing where price discovery occurs across multiple venues (e.g., NYSE vs NASDAQ, futures vs spot, ADR vs local listing). Essential for regulatory analysis and optimal execution venue selection.

Interpretation:

  • Information shares sum to 1.0 across venues.

  • A venue with information share > 0.5 in a two-venue system is the dominant price discovery venue.

  • If the upper and lower bounds are far apart, the venues have highly correlated innovations and the attribution is ambiguous.
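The ordering dependence of the bounds can be illustrated with a small sketch. It assumes the VECM has already been fitted, so that `psi` (the common row of the long-run impact matrix) and `omega` (the innovation covariance matrix) are given; both inputs below are made-up numbers, and `information_share_bounds` is a hypothetical name.

```python
from itertools import permutations

import numpy as np

def information_share_bounds(psi, omega):
    psi = np.asarray(psi, dtype=float)
    omega = np.asarray(omega, dtype=float)
    total_var = psi @ omega @ psi  # variance of the efficient-price innovation

    shares = []
    for order in permutations(range(len(psi))):
        idx = list(order)
        # Cholesky factor of the covariance matrix under this venue ordering.
        chol = np.linalg.cholesky(omega[np.ix_(idx, idx)])
        contrib = (psi[idx] @ chol) ** 2 / total_var
        share = np.empty(len(psi))
        share[idx] = contrib  # map back to the original venue order
        shares.append(share)

    shares = np.array(shares)
    return shares.min(axis=0), shares.max(axis=0)

lower, upper = information_share_bounds([0.75, 0.25],
                                        [[1.0, 0.5], [0.5, 1.0]])
midpoint = (lower + upper) / 2.0
```

With an innovation correlation of 0.5, venue 0's share ranges from about 0.52 to about 0.94 depending on the ordering, which is the ambiguity the last bullet describes. Setting the off-diagonal entries to zero collapses the bounds onto a single value.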

Parameters:

prices_list (list[Series]) – List of price series from different venues, all sharing the same DatetimeIndex. Must contain at least 2 venues.

Returns:

  • 'midpoint': Midpoint information shares (best point estimate).

  • 'upper': Upper-bound information shares.

  • 'lower': Lower-bound information shares.

Return type:

dict[str, ndarray[tuple[Any, ...], dtype[floating]]]

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.market_quality import hasbrouck_information_share
>>> np.random.seed(42)
>>> base = pd.Series(100 + np.cumsum(np.random.randn(200) * 0.1))
>>> venue_a = base + np.random.randn(200) * 0.01
>>> venue_b = base + np.random.randn(200) * 0.05
>>> result = hasbrouck_information_share([venue_a, venue_b])
>>> abs(result['midpoint'].sum() - 1.0) < 0.01
True

References

Hasbrouck, J. (1995). “One Security, Many Markets: Determining the Contributions to Price Discovery.” Journal of Finance, 50(4), 1175-1199.

See also

gonzalo_granger_component: Unique (non-bounded) price discovery measure.

wraquant.microstructure.toxicity.information_share: Simplified variance-based information share.

intraday_volatility_pattern(prices, freq='h')[source]

Estimate the intraday volatility pattern (U-shape or J-shape).

Computes the average absolute return at each intraday time bucket (hourly by default), revealing the well-documented U-shaped pattern where volatility is highest at the open and close and lowest at midday.
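A minimal pandas sketch of the bucketing, assuming simple percentage returns and hourly buckets keyed on the hour of the timestamp; the library's return definition and bucket labels may differ, and the simulated prices are placeholders.

```python
import numpy as np
import pandas as pd

# One simulated trading day of 5-minute prices (09:30 to 15:55).
idx = pd.date_range("2024-01-02 09:30", periods=78, freq="5min")
rng = np.random.default_rng(0)
prices = pd.Series(100 + np.cumsum(rng.normal(0, 0.1, 78)), index=idx)

# Absolute bar-to-bar returns, then the mean within each hour of day.
abs_ret = prices.pct_change().abs().dropna()
pattern = abs_ret.groupby(abs_ret.index.hour).mean()
```

With several days of data the same `groupby` averages across days for each hour, which is what makes the U-shape visible; a single simulated day like this one will not show it.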

When to use: For understanding the diurnal volatility cycle of a market. Essential for:

  • Optimal execution: schedule trades during low-volatility periods.

  • Risk management: adjust intraday VaR for time-of-day effects.

  • Market-making: widen quotes during high-volatility open/close.

Interpretation: The output is indexed by time-of-day (e.g., hour). Peaks at the open and close indicate information-driven volatility (overnight information absorption and closing auctions). A flat profile suggests a market dominated by algorithmic flow with little information asymmetry.

Parameters:
  • prices (Series) – Intraday price series with a DatetimeIndex.

  • freq (str, default: 'h') – Resampling frequency for the volatility buckets. Use 'h' for hourly, '30min' for half-hourly, '15min' for 15-minute buckets.

Return type:

Series

Returns:

Average absolute return by time-of-day bucket, indexed by the bucket label (e.g., hour of day).

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.market_quality import intraday_volatility_pattern
>>> idx = pd.date_range('2024-01-02 09:30', periods=78, freq='5min')
>>> prices = pd.Series(100 + np.cumsum(np.random.randn(78) * 0.1),
...                     index=idx)
>>> pattern = intraday_volatility_pattern(prices, freq='h')
>>> len(pattern) > 0
True

References

Wood, R. A., McInish, T. H. & Ord, J. K. (1985). “An Investigation of Transactions Data for NYSE Stocks.” Journal of Finance, 40(3), 723-739.

Admati, A. R. & Pfleiderer, P. (1988). “A Theory of Intraday Patterns: Volume and Price Variability.” Review of Financial Studies, 1(1), 3-40.

See also

variance_ratio: Variance ratio test for random walk. market_efficiency_ratio: Multi-lag efficiency assessment.

market_efficiency_ratio(prices, lags=None)[source]

Market efficiency ratio based on variance ratio analysis.

Adapts the Lo-MacKinlay variance ratio test for market quality assessment by computing the ratio at multiple lags and summarizing the degree of departure from efficient pricing.

Under an efficient market (random walk), the variance of k-period returns equals k times the variance of 1-period returns (VR = 1). Departures indicate:

  • VR > 1: Positive autocorrelation (momentum, trending).

  • VR < 1: Negative autocorrelation (mean reversion, microstructure noise).

When to use: For assessing how efficiently a market incorporates information. Useful for comparing market quality across instruments, venues, or time periods.

Interpretation: The efficiency_score is the average absolute deviation of variance ratios from 1.0. Lower is more efficient:

  • < 0.05: Highly efficient market.

  • 0.05-0.15: Moderately efficient.

  • > 0.15: Significant inefficiency (either microstructure noise or predictable patterns).
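The score can be sketched as below, assuming log returns and overlapping k-period returns; the library's exact estimator is not shown in this reference, and the function names are hypothetical.

```python
import numpy as np

def variance_ratio_at_lag(log_prices, k):
    one_period = np.diff(log_prices)
    k_period = log_prices[k:] - log_prices[:-k]  # overlapping k-period returns
    return np.var(k_period) / (k * np.var(one_period))

def efficiency_score_sketch(prices, lags=(2, 5, 10, 20)):
    log_prices = np.log(np.asarray(prices, dtype=float))
    ratios = {k: variance_ratio_at_lag(log_prices, k) for k in lags}
    # Average absolute deviation of the variance ratios from 1.
    score = float(np.mean([abs(vr - 1.0) for vr in ratios.values()]))
    return score, ratios

# Extreme mean reversion: the log price flips between 0 and 1 every step.
prices = np.exp(np.tile([0.0, 1.0], 50))
score, ratios = efficiency_score_sketch(prices, lags=(2, 4))
```

Every even-horizon return in this series is zero, so both variance ratios are 0 and the score is 1.0, far beyond the 0.15 threshold. A simulated random walk would instead give ratios scattered around 1.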

Parameters:
  • prices (Series) – Price series (levels, not returns).

  • lags (list[int] | None, default: None) – List of return horizons to test (default [2, 5, 10, 20]).

Returns:

  • 'efficiency_score': Average |VR - 1| across lags (lower is more efficient).

  • 'variance_ratios': Dict mapping each lag to its VR value.

Return type:

dict[str, float | dict[int, float]]

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.market_quality import market_efficiency_ratio
>>> np.random.seed(42)
>>> prices = pd.Series(100 * np.exp(np.cumsum(np.random.randn(500) * 0.01)))
>>> result = market_efficiency_ratio(prices)
>>> result['efficiency_score'] >= 0
True

References

Lo, A. W. & MacKinlay, A. C. (1988). “Stock Market Prices Do Not Follow Random Walks.” Review of Financial Studies, 1(1), 41-66.

See also

variance_ratio: Single-lag variance ratio test with p-value. resiliency: Spread-based market quality measure.

price_impact_regression(price_changes, signed_volume, lags=5)[source]

Price impact regression decomposing permanent and temporary effects.

Regresses price changes on contemporaneous and lagged signed order flow to estimate:

  • Permanent impact: the long-run effect of a unit of signed volume on prices (information content).

  • Temporary impact: the transient effect that reverses over time (liquidity provision revenue).

The regression model is:

dp_t = c + beta_0 * v_t + beta_1 * v_{t-1} + ... + beta_k * v_{t-k} + eps_t

Permanent impact is the sum of all beta coefficients. Temporary impact is beta_0 - permanent_impact.
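The regression and the two summary quantities can be sketched with ordinary least squares. This is an illustration under the definitions given above, using `numpy.linalg.lstsq`; the function name and the noise-free test data are hypothetical.

```python
import numpy as np

def price_impact_sketch(price_changes, signed_volume, lags=5):
    dp = np.asarray(price_changes, dtype=float)
    v = np.asarray(signed_volume, dtype=float)
    n = len(v)

    # Design matrix columns: intercept, v_t, v_{t-1}, ..., v_{t-lags}.
    columns = [np.ones(n - lags)]
    columns += [v[lags - k:n - k] for k in range(lags + 1)]
    X = np.column_stack(columns)
    y = dp[lags:]

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    betas = coef[1:]
    resid = y - X @ coef
    centered = y - y.mean()
    permanent = betas.sum()
    return {
        "permanent_impact": float(permanent),
        "temporary_impact": float(betas[0] - permanent),
        "beta_0": float(betas[0]),
        "r_squared": float(1.0 - (resid @ resid) / (centered @ centered)),
    }

# Noise-free data with known coefficients: 0.5, -0.2, -0.1.
rng = np.random.default_rng(0)
v = rng.normal(size=300)
dp = np.zeros(300)
dp[2:] = 0.5 * v[2:] - 0.2 * v[1:-1] - 0.1 * v[:-2]
fit = price_impact_sketch(dp, v, lags=2)
```

Here the contemporaneous impact is 0.5, of which 0.3 is given back over the next two periods, so the permanent impact is 0.2 and beta_0 minus the permanent impact is 0.3.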

When to use: For analyzing the dynamic price impact of trading activity. Essential for optimal execution and transaction cost analysis.

Interpretation:

  • Positive permanent impact: trades convey information; the market adjusts permanently.

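  • Positive temporary impact (beta_0 exceeds the permanent impact): part of the contemporaneous price move is transient and reverses over the following lags (liquidity provider’s profit). A negative value means lagged order flow keeps moving prices in the same direction.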

  • R-squared: fraction of price variation explained by order flow; higher values indicate more order-flow-driven pricing.

Parameters:
  • price_changes (Series) – Price change (delta-p) series.

  • signed_volume (Series) – Signed order flow (positive = buys, negative = sells).

  • lags (int, default: 5) – Number of lagged order flow terms to include (default 5).

Returns:

  • 'permanent_impact': Sum of all order flow coefficients.

  • 'temporary_impact': beta_0 - permanent_impact.

  • 'beta_0': Contemporaneous impact coefficient.

  • 'r_squared': R-squared of the regression.

Return type:

dict[str, float]

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.market_quality import price_impact_regression
>>> np.random.seed(42)
>>> dp = pd.Series(np.random.randn(200) * 0.01)
>>> sv = pd.Series(np.random.randn(200) * 1000)
>>> result = price_impact_regression(dp, sv, lags=3)
>>> 'permanent_impact' in result and 'r_squared' in result
True

References

Hasbrouck, J. (1991). “Measuring the Information Content of Stock Trades.” Journal of Finance, 46(1), 179-207.

See also

wraquant.microstructure.liquidity.kyle_lambda:

Simpler single-coefficient price impact.

wraquant.microstructure.liquidity.spread_decomposition:

Spread-based adverse selection decomposition.

quoted_spread(bid, ask)[source]

Quoted bid-ask spread: ask - bid.

Parameters:
Return type:

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Absolute quoted spread.

Example

>>> import numpy as np
>>> from wraquant.microstructure.market_quality import quoted_spread
>>> bid = np.array([99.90, 99.85])
>>> ask = np.array([100.10, 100.15])
>>> quoted_spread(bid, ask)
array([0.2, 0.3])

See also

relative_spread: Spread normalized by midpoint.

wraquant.microstructure.liquidity.effective_spread: Execution-weighted spread.

relative_spread(bid, ask)[source]

Relative spread: (ask - bid) / midpoint.

Parameters:
Return type:

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Relative spread as a fraction of the midpoint. Typical values are 0.001-0.01 for liquid large-cap stocks.

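Example

>>> import pandas as pd
>>> from wraquant.microstructure.market_quality import relative_spread
>>> bid = pd.Series([99.90, 99.85])
>>> ask = pd.Series([100.10, 100.15])
>>> rs = relative_spread(bid, ask)
>>> round(float(rs.iloc[0]), 6)  # 0.20 / 100.0 = 0.002 (rounded: floats are inexact)
0.002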

See also

quoted_spread: Absolute spread in price units.

resiliency(spreads, window=20)[source]

Spread resiliency: how quickly the spread recovers after a shock.

Measured as the negative of the rolling autocorrelation of spread changes, so strongly mean-reverting spread changes produce positive values. A higher value indicates a more resilient market (spreads revert faster).
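As a rough sketch of the measure, the code below negates the lag-1 autocorrelation of spread changes within each rolling window. The lag of 1 and the windowing over changes (rather than levels) are assumptions, and `resiliency_sketch` is a hypothetical name.

```python
import numpy as np

def resiliency_sketch(spreads, window=20):
    changes = np.diff(np.asarray(spreads, dtype=float))
    out = np.full(len(changes), np.nan)
    for end in range(window, len(changes) + 1):
        w = changes[end - window:end]
        # Lag-1 autocorrelation of spread changes, sign-flipped.
        out[end - 1] = -np.corrcoef(w[1:], w[:-1])[0, 1]
    return out

# A spread that snaps back after every move: 0.625, 0.375, 0.625, ...
spreads = 0.5 + 0.125 * np.tile([1.0, -1.0], 30)
res = resiliency_sketch(spreads, window=20)
```

Every widening is fully reversed on the next observation, so the autocorrelation of changes is -1 and the measure sits at its maximum of 1.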

Parameters:
  • spreads (Series) – Time series of quoted or effective spreads.

  • window (int, default: 20) – Rolling window for estimating autocorrelation of spread changes.

Return type:

Series

Returns:

Rolling resiliency measure. Higher values indicate faster spread recovery (more resilient market).

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.market_quality import resiliency
>>> np.random.seed(42)
>>> spreads = pd.Series(0.05 + np.random.randn(100) * 0.01)
>>> res = resiliency(spreads, window=20)
>>> res.name
'resiliency'

See also

quoted_spread: Generate the spread input for this function. variance_ratio: Random walk efficiency test.

variance_ratio(prices, short_period=2, long_period=10)[source]

Lo-MacKinlay (1988) variance ratio test.

Tests the random walk hypothesis by comparing the variance of long_period returns to short_period returns, scaled appropriately. Under a random walk, the ratio equals 1.
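A simplified sketch of the test for the special case short_period = 1, using log returns, overlapping q-period returns, and the Lo-MacKinlay asymptotic variance under IID returns, 2(2q - 1)(q - 1) / (3qn). The function name is hypothetical, and the library's small-sample corrections, if any, are not reproduced.

```python
import math

import numpy as np

def variance_ratio_sketch(prices, q):
    log_p = np.log(np.asarray(prices, dtype=float))
    r1 = np.diff(log_p)              # 1-period log returns
    rq = log_p[q:] - log_p[:-q]      # overlapping q-period log returns
    n = len(r1)

    vr = np.var(rq) / (q * np.var(r1))
    # Asymptotic standard error of VR under the IID null.
    se = math.sqrt(2.0 * (2 * q - 1) * (q - 1) / (3.0 * q * n))
    z_stat = (vr - 1.0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return vr, z_stat, p_value

# Strongly mean-reverting prices: the random walk should be rejected.
prices = np.exp(np.tile([0.0, 1.0], 50))
vr, z_stat, p_value = variance_ratio_sketch(prices, q=2)
```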

Parameters:
  • prices (Series) – Price series (levels, not returns).

  • short_period (int, default: 2) – Short return horizon (default 2).

  • long_period (int, default: 10) – Long return horizon. It should be a multiple of short_period for a clean comparison, but this is not enforced.

Returns:

  • 'vr': Variance ratio.

  • 'z_stat': Asymptotic z-statistic under IID assumption.

  • 'p_value': Two-sided p-value.

Return type:

dict[str, float]

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.market_quality import variance_ratio
>>> np.random.seed(42)
>>> prices = pd.Series(100 * np.exp(np.cumsum(np.random.randn(500) * 0.01)))
>>> result = variance_ratio(prices, short_period=2, long_period=10)
>>> 'vr' in result and 'p_value' in result
True

See also

market_efficiency_ratio: Multi-lag efficiency summary. resiliency: Spread recovery speed.

Liquidity

Liquidity analytics for market microstructure.

Liquidity measures how easily an asset can be traded without significantly moving its price. Illiquid assets carry a liquidity risk premium and pose execution challenges. This module provides the standard toolkit for measuring liquidity from trade and quote data.

Measures provided:

Illiquidity / price impact:
  • amihud_illiquidity: the Amihud (2002) ratio – average daily |return| / dollar volume. Higher values indicate less liquid assets. The most widely used cross-sectional liquidity proxy because it only requires daily data.

  • kyle_lambda: Kyle’s lambda – the permanent price impact coefficient estimated via rolling OLS of price changes on signed order flow. Higher lambda = more price impact per unit of volume.

  • price_impact: per-trade permanent price impact.

Spread estimators:
  • roll_spread: Roll (1984) implied spread from serial autocovariance of price changes. Requires only trade prices (no quote data needed).

  • effective_spread: 2 * |trade_price - midpoint|. The standard measure of execution cost.

  • realized_spread: spread earned by the liquidity provider after a delay, capturing adverse selection.

Activity:
  • turnover_ratio: daily volume / shares outstanding. Measures trading activity relative to the total share count.

How to choose:
  • Cross-sectional liquidity ranking (daily data only): use amihud_illiquidity.

  • Execution cost analysis (trade + quote data): use effective_spread and realized_spread.

  • Price impact modeling: use kyle_lambda for permanent impact; price_impact for per-trade measurement.

  • No quote data available: use roll_spread as a proxy for the bid-ask spread.

References

  • Amihud (2002), “Illiquidity and Stock Returns”

  • Kyle (1985), “Continuous Auctions and Insider Trading”

  • Roll (1984), “A Simple Implicit Measure of the Effective Bid-Ask Spread”

amihud_illiquidity(returns, volume, window=None)[source]

Amihud (2002) illiquidity ratio: mean of |return| / dollar volume.

A higher value indicates less liquid (more illiquid) markets.
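The full-sample ratio is simple enough to write out directly. The sketch below is illustrative only and uses made-up inputs; `amihud_sketch` is a hypothetical name.

```python
import numpy as np

def amihud_sketch(returns, dollar_volume):
    returns = np.asarray(returns, dtype=float)
    dollar_volume = np.asarray(dollar_volume, dtype=float)
    # Mean absolute return per unit of dollar volume.
    return float(np.mean(np.abs(returns) / dollar_volume))

# Each day moves 1% per $1m traded, so the ratio is 1e-8.
illiq = amihud_sketch([0.01, -0.02, 0.005], [1e6, 2e6, 5e5])
```

Because raw values are this small, empirical work often rescales the ratio (for example by 1e6) before reporting it; this reference does not say whether the library rescales.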

Parameters:
  • returns (Series) – Asset return series.

  • volume (Series) – Dollar volume series (price * shares traded).

  • window (int | None, default: None) – Rolling window size. If None, returns a single scalar average over the entire sample.

Return type:

Series | float

Returns:

Rolling Amihud illiquidity ratio (or a single float when window is None).

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.liquidity import amihud_illiquidity
>>> np.random.seed(42)
>>> returns = pd.Series(np.random.randn(252) * 0.01)
>>> volume = pd.Series(np.random.uniform(1e6, 5e6, 252))
>>> illiq = amihud_illiquidity(returns, volume)
>>> illiq > 0
True

See also

kyle_lambda: Price impact coefficient (regression-based alternative). amihud_rolling: Rolling version with normalization.

kyle_lambda(prices, volume, window=20)[source]

Kyle’s lambda – price impact coefficient via rolling OLS.

Regresses price changes on signed order flow (volume) to estimate the permanent price impact per unit of volume.
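A rolling-slope sketch of the regression follows, assuming an intercept is included and that signed volume at time t is paired with the price change ending at t. The alignment and the name `kyle_lambda_sketch` are assumptions; the library may align or pad its output differently.

```python
import numpy as np

def kyle_lambda_sketch(prices, signed_volume, window=20):
    dp = np.diff(np.asarray(prices, dtype=float))
    v = np.asarray(signed_volume, dtype=float)[1:]  # align flow with dp
    lam = np.full(len(dp), np.nan)
    for end in range(window, len(dp) + 1):
        x = v[end - window:end]
        y = dp[end - window:end]
        xc = x - x.mean()
        # OLS slope with intercept: cov(x, y) / var(x).
        lam[end - 1] = (xc @ (y - y.mean())) / (xc @ xc)
    return lam

# Prices driven entirely by order flow with a true lambda of 2e-4.
rng = np.random.default_rng(0)
flow = rng.normal(0.0, 1000.0, size=100)
prices = 100.0 + np.cumsum(2e-4 * flow)
lam = kyle_lambda_sketch(prices, flow, window=20)
```

The first window - 1 entries are NaN because the window is incomplete; every later entry recovers the true coefficient since the simulated prices contain no noise.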

Parameters:
  • prices (Series) – Price series.

  • volume (Series) – Signed volume series (positive for buys, negative for sells).

  • window (int, default: 20) – Rolling regression window.

Return type:

Series

Returns:

Rolling Kyle’s lambda series. Higher values indicate more price impact per unit of volume (less liquid).

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.liquidity import kyle_lambda
>>> np.random.seed(42)
>>> prices = pd.Series(100 + np.cumsum(np.random.randn(100) * 0.5))
>>> volume = pd.Series(np.random.randn(100) * 1000)
>>> lam = kyle_lambda(prices, volume, window=20)
>>> len(lam) == 100
True

See also

amihud_illiquidity: Simpler illiquidity proxy (no signed volume needed). lambda_kyle_rolling: Kyle’s lambda with confidence intervals.

roll_spread(prices)[source]

Roll (1984) implied bid-ask spread from serial covariance.

Estimates the effective spread from the negative first-order autocovariance of price changes: spread = 2 * sqrt(-cov).
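The estimator can be sketched in a few lines. The simulation assumes a constant midpoint and independent, equally likely buys and sells, which is the setting in which the Roll formula holds; `roll_spread_sketch` is a hypothetical name.

```python
import numpy as np

def roll_spread_sketch(prices):
    dp = np.diff(np.asarray(prices, dtype=float))
    # First-order serial covariance of price changes.
    cov = np.cov(dp[1:], dp[:-1])[0, 1]
    # The estimator is only defined for negative covariance.
    return 2.0 * np.sqrt(-cov) if cov < 0 else float("nan")

# Trades bounce randomly between bid and ask around a fixed midpoint.
rng = np.random.default_rng(0)
side = rng.choice([-1.0, 1.0], size=20_000)
prices = 100.0 + 0.0625 * side  # half-spread 0.0625, true spread 0.125
estimate = roll_spread_sketch(prices)
```

Note that a strictly alternating buy/sell sequence would overstate the spread by a factor of two, because the formula relies on trade directions being serially independent.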

Parameters:

prices (Series) – Price series.

Return type:

float

Returns:

Estimated implied spread. Returns NaN if the serial covariance is non-negative (model assumption violated).

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.liquidity import roll_spread
>>> np.random.seed(42)
>>> # Simulate trade prices with bid-ask bounce
>>> mid = 100 + np.cumsum(np.random.randn(500) * 0.01)
>>> bounce = np.random.choice([-0.05, 0.05], size=500)
>>> prices = pd.Series(mid + bounce)
>>> spread = roll_spread(prices)
>>> spread > 0 or np.isnan(spread)  # positive spread or NaN
True

See also

effective_spread: Direct spread from trade and quote data. corwin_schultz_spread: High-low spread estimator (OHLC data).

effective_spread(trade_prices, midpoints)[source]

Effective bid-ask spread: 2 * |trade_price - midpoint|.

Parameters:
Return type:

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Per-trade effective spread, same type as the inputs.

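Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.liquidity import effective_spread
>>> trades = pd.Series([100.05, 99.95, 100.03])
>>> mids = pd.Series([100.0, 100.0, 100.0])
>>> spreads = effective_spread(trades, mids)
>>> round(float(spreads.iloc[0]), 6)  # rounded: 2 * |100.05 - 100.0| is inexact in floats
0.1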

See also

realized_spread: Post-trade spread (adverse selection component). roll_spread: Implied spread from price autocovariance.

realized_spread(trade_prices, midpoints, delay=5)[source]

Realized spread incorporating a post-trade midpoint delay.

Measures the revenue to the liquidity provider: 2 * direction * (trade_price - midpoint_{t+delay}), where direction is the trade sign (+1 for buys, -1 for sells).
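A sketch of the calculation follows. Since the signature takes no direction argument, the sketch infers the trade sign from the position of the trade price relative to the contemporaneous midpoint; that inference, the requirement that delay be at least 1, and the name `realized_spread_sketch` are assumptions.

```python
import numpy as np

def realized_spread_sketch(trade_prices, midpoints, delay):
    p = np.asarray(trade_prices, dtype=float)
    m = np.asarray(midpoints, dtype=float)
    direction = np.sign(p - m)  # assumption: sign from the quote test

    # Midpoint `delay` observations after each trade (NaN at the end).
    future_mid = np.full(len(m), np.nan)
    future_mid[:-delay] = m[delay:]
    return 2.0 * direction * (p - future_mid)

trades = [100.125, 99.875, 100.125, 100.0, 100.0]
mids = [100.0, 100.0, 100.0625, 100.0, 100.0]
rs = realized_spread_sketch(trades, mids, delay=2)
```

The first trade is a buy 0.125 above the midpoint, an effective spread of 0.25. Two observations later the midpoint has risen by 0.0625, so the liquidity provider keeps only 0.125; the other 0.125 is lost to adverse selection.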

Parameters:
  • trade_prices (Series) – Executed trade prices.

  • midpoints (Series) – Mid-quote series aligned to trades.

  • delay (int, default: 5) – Number of observations to shift the midpoint forward.

Return type:

Series

Returns:

Per-trade realized spread series (NaN for the last delay rows).

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.liquidity import realized_spread
>>> trades = pd.Series([100.05, 99.95, 100.03, 100.01, 99.98])
>>> mids = pd.Series([100.0, 100.0, 100.0, 100.0, 100.0])
>>> rs = realized_spread(trades, mids, delay=2)
>>> len(rs) == 5
True

See also

effective_spread: Total execution cost (before adverse selection). spread_decomposition: Full Huang-Stoll decomposition.

price_impact(trade_prices, volume, direction)[source]

Permanent price impact per trade.

Computed as direction * (midpoint_{t+1} - midpoint_t) / volume, approximated here via successive trade prices.

Parameters:
  • trade_prices (Series) – Executed trade prices.

  • volume (Series) – Volume for each trade.

  • direction (Series) – Trade direction indicator (+1 buy, -1 sell).

Return type:

Series

Returns:

Per-trade permanent price impact series.

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.liquidity import price_impact
>>> trades = pd.Series([100.0, 100.05, 100.10, 100.08])
>>> vol = pd.Series([1000, 2000, 1500, 1800])
>>> direction = pd.Series([1, 1, -1, 1])
>>> impact = price_impact(trades, vol, direction)
>>> len(impact) == 4
True

See also

kyle_lambda: Aggregate price impact coefficient.

wraquant.microstructure.market_quality.price_impact_regression: Permanent vs. temporary impact decomposition.

turnover_ratio(volume, shares_outstanding)[source]

Turnover ratio: volume / shares outstanding.

Parameters:
  • volume (Series) – Daily trading volume.

  • shares_outstanding (Series | float) – Total shares outstanding (scalar or series).

Return type:

Series

Returns:

Daily turnover ratio. Higher values indicate more active trading.

Example

>>> import pandas as pd
>>> from wraquant.microstructure.liquidity import turnover_ratio
>>> volume = pd.Series([1e6, 1.5e6, 0.8e6])
>>> ratio = turnover_ratio(volume, shares_outstanding=100e6)
>>> float(ratio.iloc[0])
0.01

See also

amihud_illiquidity: Price-impact-based liquidity measure.

corwin_schultz_spread(high, low, window=1)[source]

Corwin & Schultz (2012) high-low spread estimator.

Estimates the effective bid-ask spread from consecutive daily high and low prices. The key insight is that daily high prices are almost always buyer-initiated (at the ask) while daily lows are seller-initiated (at the bid). The ratio of high-to-low therefore captures both volatility and the spread. By comparing single-day and two-day high-low ranges the method disentangles the two components.
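The single-day versus two-day comparison can be sketched with the closed-form estimator from Corwin & Schultz (2012). The sketch returns the spread as a proportion of price and omits the paper's overnight-gap adjustment; whether the library rescales to price units or applies further adjustments is not confirmed here, and `corwin_schultz_sketch` is a hypothetical name.

```python
import numpy as np

def corwin_schultz_sketch(high, low):
    h = np.asarray(high, dtype=float)
    l = np.asarray(low, dtype=float)

    # beta: sum of squared log ranges over two consecutive days.
    log_hl_sq = np.log(h / l) ** 2
    beta = log_hl_sq[1:] + log_hl_sq[:-1]

    # gamma: squared log range of the combined two-day window.
    h2 = np.maximum(h[1:], h[:-1])
    l2 = np.minimum(l[1:], l[:-1])
    gamma = np.log(h2 / l2) ** 2

    denom = 3.0 - 2.0 * np.sqrt(2.0)
    alpha = (np.sqrt(2.0 * beta) - np.sqrt(beta)) / denom - np.sqrt(gamma / denom)
    spread = 2.0 * (np.exp(alpha) - 1.0) / (1.0 + np.exp(alpha))
    return np.maximum(spread, 0.0)  # floor negative estimates at zero

# No volatility: the daily range is pure bid-ask bounce (bid 100, ask 101).
est = corwin_schultz_sketch([101.0, 101.0, 101.0], [100.0, 100.0, 100.0])
```

With identical ranges on both days the two-day range adds no volatility, and the estimator returns (101 - 100) / 100.5, the spread relative to the midpoint.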

When to use: When only daily OHLC data is available and you need a spread estimate. More robust than the Roll (1984) estimator because it does not require negative serial covariance and performs better in the presence of stale prices.

Interpretation: Output is in price units (same scale as the input). Values typically range from 0 (perfectly liquid) to several percent of price for illiquid stocks. Negative estimates are floored at zero (model assumption violated, usually when volatility overwhelms spread).

Parameters:
  • high (Series) – Daily high prices.

  • low (Series) – Daily low prices.

  • window (int, default: 1) – Averaging window for the spread estimate. window=1 returns the raw daily estimate.

Return type:

Series

Returns:

Estimated bid-ask spread series, floored at zero.

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.liquidity import corwin_schultz_spread
>>> np.random.seed(42)
>>> close = pd.Series(100 + np.cumsum(np.random.randn(100) * 0.5))
>>> high = close + np.abs(np.random.randn(100)) * 0.3
>>> low = close - np.abs(np.random.randn(100)) * 0.3
>>> spread = corwin_schultz_spread(high, low)
>>> (spread >= 0).all()
True

References

Corwin, S. A. & Schultz, P. (2012). “A Simple Way to Estimate Bid-Ask Spreads from Daily High and Low Prices.” Journal of Finance, 67(2), 719-760.

See also

roll_spread: Implied spread from trade prices only. effective_spread: Direct spread from trade and quote data.

closing_quoted_spread(bid_close, ask_close)[source]

Quoted bid-ask spread at the market close.

The closing spread is particularly relevant for investors who trade at or near the close (e.g., mutual fund NAV calculations, index rebalancing, MOC orders). It also serves as a simple daily liquidity proxy when intraday data is unavailable.

When to use: When analyzing execution costs for daily-frequency traders, evaluating end-of-day liquidity conditions, or constructing a daily spread time series from closing quote data.

Interpretation: Narrower spreads indicate better end-of-day liquidity. Spread widening at the close often precedes periods of higher volatility or information events (e.g., earnings releases).

Parameters:
  • bid_close (Series) – Best bid price at market close.

  • ask_close (Series) – Best ask price at market close.

Return type:

Series

Returns:

Closing quoted spread series (ask - bid), in price units.

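Example

>>> import pandas as pd
>>> from wraquant.microstructure.liquidity import closing_quoted_spread
>>> bid = pd.Series([99.90, 99.85, 99.95])
>>> ask = pd.Series([100.10, 100.15, 100.05])
>>> spread = closing_quoted_spread(bid, ask)
>>> round(float(spread.iloc[0]), 6)  # rounded: 100.10 - 99.90 is inexact in floats
0.2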

References

Chordia, T., Roll, R. & Subrahmanyam, A. (2001). “Market Liquidity and Trading Activity.” Journal of Finance, 56(2), 501-530.

See also

effective_spread: Execution-weighted spread measure. relative_spread: Spread normalized by midpoint.

depth_imbalance(bid_depth, ask_depth)[source]

Order book depth imbalance.

Computes (bid_depth - ask_depth) / (bid_depth + ask_depth) to measure the directional imbalance in resting limit order volume.

When to use: For real-time assessment of supply-demand imbalance in the limit order book. Commonly used as a short-horizon return predictor in high-frequency strategies.

Interpretation:

  • +1: All depth is on the bid side (strong buying interest, bullish signal).

  • -1: All depth is on the ask side (strong selling interest, bearish signal).

  • 0: Balanced book.

Values persistently above +0.3 or below -0.3 often indicate directional pressure that leads to price movement in the direction of the deeper side.

Parameters:
Return type:

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Depth imbalance in [-1, 1].

Example

>>> import pandas as pd
>>> from wraquant.microstructure.liquidity import depth_imbalance
>>> bid_depth = pd.Series([5000, 3000, 4000])
>>> ask_depth = pd.Series([3000, 5000, 4000])
>>> imb = depth_imbalance(bid_depth, ask_depth)
>>> float(imb.iloc[0])  # more bids than asks -> positive
0.25

References

Cao, C., Hansch, O. & Wang, X. (2009). “The Information Content of an Open Limit-Order Book.” Journal of Futures Markets, 29(1), 16-41.

See also

wraquant.microstructure.toxicity.order_flow_imbalance:

Volume-based imbalance measure.

wraquant.microstructure.market_quality.depth: Total market depth.

lambda_kyle_rolling(prices, volume, window=20)[source]

Rolling Kyle’s lambda with confidence intervals.

Extends kyle_lambda() by computing standard errors from the rolling OLS regression, yielding point estimates along with 95% confidence bounds. This is essential for determining whether the estimated price impact is statistically significant at each point in time.
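For a single window, the point estimate, standard error, and interval can be sketched as below. The 1.96 normal critical value is an assumption (the library may use Student-t quantiles), and the function name and simulated data are hypothetical.

```python
import numpy as np

def ols_slope_with_ci(price_changes, signed_volume, z=1.96):
    dp = np.asarray(price_changes, dtype=float)
    v = np.asarray(signed_volume, dtype=float)
    n = len(v)

    vc = v - v.mean()
    sxx = vc @ vc
    lam = (vc @ (dp - dp.mean())) / sxx      # OLS slope
    intercept = dp.mean() - lam * v.mean()

    # Residual variance with n - 2 degrees of freedom.
    resid = dp - intercept - lam * v
    std_err = np.sqrt((resid @ resid) / (n - 2) / sxx)
    return lam, std_err, lam - z * std_err, lam + z * std_err

# One 20-observation window: true lambda 2e-4 plus small pricing noise.
rng = np.random.default_rng(1)
v = rng.normal(0.0, 1000.0, size=20)
dp = 2e-4 * v + rng.normal(0.0, 0.01, size=20)
lam, std_err, ci_lower, ci_upper = ols_slope_with_ci(dp, v)
```

Applying this to each rolling window gives the four columns listed under Returns. An interval that includes zero means the window's price impact is not distinguishable from zero at the 5% level.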

When to use: When you need not just the level of price impact but also its precision. Useful for detecting regime changes in market liquidity – a significant widening of the confidence interval suggests structural uncertainty about the price impact coefficient.

Interpretation: A positive lambda indicates that buy-initiated volume pushes prices up (and sell-initiated pushes down), consistent with the Kyle (1985) model. Lambda values close to zero (or with confidence intervals spanning zero) suggest limited permanent price impact, i.e., a liquid market.

Parameters:
  • prices (Series) – Price series.

  • volume (Series) – Signed volume series (positive for buys, negative for sells).

  • window (int, default: 20) – Rolling regression window (must be >= 5).

Return type:

DataFrame

Returns:

DataFrame with columns 'lambda', 'std_err', 'ci_lower', 'ci_upper' (95% confidence interval).

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.liquidity import lambda_kyle_rolling
>>> np.random.seed(42)
>>> prices = pd.Series(100 + np.cumsum(np.random.randn(50) * 0.1))
>>> volume = pd.Series(np.random.randn(50) * 1000)
>>> result = lambda_kyle_rolling(prices, volume, window=20)
>>> list(result.columns)
['lambda', 'std_err', 'ci_lower', 'ci_upper']

References

Kyle, A. S. (1985). “Continuous Auctions and Insider Trading.” Econometrica, 53(6), 1315-1335.

See also

kyle_lambda: Simple point estimate without confidence intervals. amihud_rolling: Rolling Amihud illiquidity ratio.

amihud_rolling(returns, volume, window=21, normalize=True)[source]

Rolling Amihud (2002) illiquidity ratio with proper normalization.

Computes the Amihud ratio over a rolling window and optionally normalizes it by its full-sample time-series mean, so that values are comparable across different assets and time periods.

When to use: For tracking how an individual asset’s liquidity evolves over time. The normalization makes the measure comparable across assets with different price levels and trading volumes.

Interpretation: Higher values indicate less liquidity (more price impact per unit of trading volume). Sudden spikes often correspond to liquidity crises or market stress events. Typical values for large-cap US stocks are 1e-11 to 1e-9 (unnormalized).

Parameters:
  • returns (Series) – Asset return series.

  • volume (Series) – Dollar volume series (price * shares traded).

  • window (int, default: 21) – Rolling window size (default 21 for ~1 month of trading days).

  • normalize (bool, default: True) – If True, divide each rolling value by the full-sample mean so the time-series average is 1.0.

Return type:

Series

Returns:

Rolling Amihud illiquidity series.

Example

>>> import pandas as pd, numpy as np
>>> from wraquant.microstructure.liquidity import amihud_rolling
>>> np.random.seed(42)
>>> returns = pd.Series(np.random.randn(100) * 0.01)
>>> volume = pd.Series(np.random.uniform(1e6, 5e6, 100))
>>> illiq = amihud_rolling(returns, volume, window=21)
>>> illiq.name
'amihud_rolling'

References

Amihud, Y. (2002). “Illiquidity and Stock Returns: Cross-Section and Time-Series Effects.” Journal of Financial Markets, 5(1), 31-56.

See also

amihud_illiquidity: Static (full-sample) Amihud ratio. liquidity_commonality: How much liquidity co-moves with the market.

liquidity_commonality(asset_illiquidity, market_illiquidity, window=60)[source]

Commonality in liquidity (Chordia, Roll & Subrahmanyam, 2000).

Measures how much an individual asset’s liquidity co-moves with market-wide liquidity. The commonality coefficient is estimated via rolling regressions of changes in the asset’s illiquidity measure on changes in the market-wide illiquidity measure.
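The rolling regression can be sketched using the fact that the R-squared of a one-regressor OLS with an intercept equals the squared correlation. First differences are used as the "changes"; the library might use percentage changes instead, and `commonality_r2_sketch` is a hypothetical name.

```python
import numpy as np

def commonality_r2_sketch(asset_illiq, market_illiq, window=60):
    da = np.diff(np.asarray(asset_illiq, dtype=float))
    dm = np.diff(np.asarray(market_illiq, dtype=float))
    r2 = np.full(len(da), np.nan)
    for end in range(window, len(da) + 1):
        # Squared correlation equals R-squared for simple regression.
        r = np.corrcoef(da[end - window:end], dm[end - window:end])[0, 1]
        r2[end - 1] = r ** 2
    return r2

# An asset whose illiquidity is an exact linear function of the market's.
rng = np.random.default_rng(0)
market = np.cumsum(rng.normal(size=200))
asset = 3.0 + 2.0 * market
r2 = commonality_r2_sketch(asset, market, window=60)
```

This extreme case yields an R-squared of 1 in every complete window; two independent series would instead give values near zero.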

When to use: For assessing systematic liquidity risk. Assets with high commonality become illiquid precisely when the entire market becomes illiquid – an undesirable property that investors demand a premium for bearing.

Interpretation: The output is the rolling R-squared from the regression. Higher values (closer to 1) indicate stronger co-movement with market liquidity. Values above 0.3 suggest meaningful systematic liquidity risk. Most large-cap stocks show commonality R-squared of 0.05-0.20.

Parameters:
  • asset_illiquidity (Series) – Individual asset’s illiquidity measure (e.g., Amihud ratio, effective spread) as a time series.

  • market_illiquidity (Series) – Market-wide illiquidity aggregate (e.g., equal-weighted average Amihud ratio across all stocks).

  • window (int, default: 60) – Rolling regression window (default 60 for ~3 months).

Return type:

Series

Returns:

Rolling R-squared of the commonality regression.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> asset = pd.Series(np.random.randn(200).cumsum())
>>> market = pd.Series(np.random.randn(200).cumsum())
>>> r2 = liquidity_commonality(asset, market, window=60)
>>> r2.name
'liquidity_commonality'
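
For a regression with one regressor and an intercept, R-squared equals the squared Pearson correlation, so the rolling commonality measure can be approximated without fitting explicit regressions. The sketch below relies on that identity; commonality_r2_sketch is a hypothetical name, and the library may estimate the regression differently.

```python
import numpy as np
import pandas as pd

def commonality_r2_sketch(asset_illiq, market_illiq, window=60):
    """Rolling R^2 of illiquidity changes on market illiquidity changes."""
    d_asset = asset_illiq.diff()
    d_market = market_illiq.diff()
    # One-regressor OLS with intercept: R^2 == corr(x, y) ** 2.
    r2 = d_asset.rolling(window).corr(d_market) ** 2
    return r2.rename("commonality_r2_sketch")

rng = np.random.default_rng(1)
d_market = rng.normal(size=300)
d_asset = 0.8 * d_market + 0.6 * rng.normal(size=300)  # population R^2 = 0.64
market = pd.Series(d_market).cumsum()
asset = pd.Series(d_asset).cumsum()
r2 = commonality_r2_sketch(asset, market, window=60)
```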

References

Chordia, T., Roll, R. & Subrahmanyam, A. (2000). “Commonality in Liquidity.” Journal of Financial Economics, 56(1), 3-28.

See also

amihud_rolling: Generate the illiquidity input for this function.

spread_decomposition(trade_prices, bid, ask, direction, delay=5)[source]

Huang-Stoll (1997) three-way spread decomposition.

Decomposes the effective spread into three economically distinct components:

  1. Adverse selection: compensation for trading against informed traders who possess private information. This portion of the spread is a permanent price impact – the midpoint moves against the liquidity provider after the trade.

  2. Order processing: compensation for the mechanical costs of market-making (exchange fees, technology, labor).

  3. Inventory holding: compensation for the risk of holding an unbalanced inventory.

When to use: For understanding why spreads are wide. If adverse selection dominates, the market has significant information asymmetry. If order processing dominates, the market is structurally costly.

Interpretation:

  • Adverse selection fraction > 0.5 indicates a market dominated by informed trading (e.g., single-stock options, small-cap equities before earnings).

  • Order processing fraction > 0.5 indicates a market where mechanical costs dominate (e.g., bond markets, low-volatility large-cap equities).

  • Inventory fraction is typically the smallest component for equities but can be large for less liquid instruments.

Parameters:
  • trade_prices (Series) – Executed trade prices.

  • bid (Series) – Best bid prices at time of each trade.

  • ask (Series) – Best ask prices at time of each trade.

  • direction (Series) – Trade direction indicator (+1 buy, -1 sell).

  • delay (int, default: 5) – Number of observations to look ahead for measuring the permanent price impact (default 5).

Returns:

  • 'adverse_selection': fraction of the spread due to information asymmetry.

  • 'order_processing': fraction due to order handling costs.

  • 'inventory_holding': fraction due to inventory risk.

  • 'effective_spread_mean': average effective spread.

Return type:

dict[str, float]

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> n = 200
>>> mid = 100 + np.cumsum(np.random.randn(n) * 0.01)
>>> spread_half = 0.05
>>> bid = pd.Series(mid - spread_half)
>>> ask = pd.Series(mid + spread_half)
>>> direction = pd.Series(np.random.choice([1, -1], n))
>>> trades = pd.Series(np.where(direction > 0, ask, bid))
>>> result = spread_decomposition(trades, bid, ask, direction)
>>> 0 <= result['adverse_selection'] <= 1
True
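
To see how the delay parameter separates permanent from transient costs, the following sketch performs a simpler two-way split (price impact versus realized spread). It is not the Huang-Stoll three-way estimator; two_way_spread_sketch and its output keys are hypothetical.

```python
import numpy as np
import pandas as pd

def two_way_spread_sketch(trade_prices, bid, ask, direction, delay=5):
    """Split the effective spread into price impact and realized spread."""
    mid = (bid + ask) / 2
    future_mid = mid.shift(-delay)  # midpoint `delay` observations later
    effective = 2 * direction * (trade_prices - mid)
    impact = 2 * direction * (future_mid - mid)            # permanent part
    realized = 2 * direction * (trade_prices - future_mid)  # kept by the quoter
    valid = future_mid.notna()
    eff_mean = effective[valid].mean()
    return {
        "effective_spread_mean": float(eff_mean),
        "price_impact_share": float(impact[valid].mean() / eff_mean),
        "realized_share": float(realized[valid].mean() / eff_mean),
    }

# Synthetic market: each trade moves the next midpoint by 0.02 in its direction.
rng = np.random.default_rng(0)
direction = pd.Series(rng.choice([1, -1], 500))
mid = 100 + 0.02 * direction.shift(1, fill_value=0).cumsum()
bid, ask = mid - 0.05, mid + 0.05
trades = ask.where(direction > 0, bid)
result = two_way_spread_sketch(trades, bid, ask, direction, delay=1)
```

In this synthetic setup the effective spread is 0.10 and each trade moves the midpoint by 0.02, so the price impact share is 0.4 and the realized share is 0.6.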

References

Huang, R. D. & Stoll, H. R. (1997). “The Components of the Bid-Ask Spread: A General Approach.” Review of Financial Studies, 10(4), 995-1034.

See also

effective_spread: Total execution cost measure. realized_spread: Liquidity provider’s revenue component.

Toxicity

Order flow toxicity metrics.

Provides measures of adverse selection and informed trading probability used in market microstructure analysis.

vpin(volume, buy_volume, n_buckets=50)[source]

Volume-Synchronized Probability of Informed Trading (VPIN).

Aggregates volume into equal-sized buckets and measures the absolute order imbalance in each bucket.

Parameters:
Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

VPIN values, one per bucket. Higher values indicate more informed trading activity. Values above 0.4 are typically considered elevated.

Example

>>> import numpy as np
>>> from wraquant.microstructure.toxicity import vpin
>>> volume = np.array([1000, 1500, 800, 1200, 900])
>>> buy_vol = np.array([600, 400, 500, 800, 300])
>>> result = vpin(volume, buy_vol, n_buckets=2)
>>> len(result) >= 1
True
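
One way to picture the bucketing step is to interpolate cumulative buy volume at equal-volume boundaries. The sketch below assumes buy volume is spread evenly within each bar and that the total volume is split into exactly n_buckets buckets; the library's bucketing rule may differ, and vpin_sketch is a hypothetical name.

```python
import numpy as np

def vpin_sketch(volume, buy_volume, n_buckets=50):
    """Absolute order imbalance per equal-volume bucket (illustrative)."""
    volume = np.asarray(volume, dtype=float)
    buy_volume = np.asarray(buy_volume, dtype=float)
    cum_vol = np.concatenate([[0.0], np.cumsum(volume)])
    cum_buy = np.concatenate([[0.0], np.cumsum(buy_volume)])
    # Bucket boundaries in cumulative-volume space.
    edges = np.linspace(0.0, cum_vol[-1], n_buckets + 1)
    # Cumulative buy volume at each boundary (linear within a bar).
    bucket_buy = np.diff(np.interp(edges, cum_vol, cum_buy))
    bucket_size = cum_vol[-1] / n_buckets
    bucket_sell = bucket_size - bucket_buy
    return np.abs(bucket_buy - bucket_sell) / bucket_size

volume = np.array([1000, 1500, 800, 1200, 900])
buy_vol = np.array([600, 400, 500, 800, 300])
result = vpin_sketch(volume, buy_vol, n_buckets=2)
# Two buckets of 2700 shares each: imbalances 450/2700 and 250/2700.
```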

See also

pin_model: Static PIN estimation from daily trade counts. toxicity_index: Composite toxicity score combining VPIN and OFI. bulk_volume_classification: Estimate buy/sell volume from OHLCV.

pin_model(buy_trades, sell_trades)[source]

Estimate the Probability of Informed Trading (PIN).

Uses maximum likelihood on the Easley-Kiefer-O’Hara-Paperman model. The PIN is defined as alpha * mu / (alpha * mu + eps_b + eps_s).

Parameters:
Return type:

dict[str, float]

Returns:

Dictionary with keys 'pin', 'alpha', 'delta', 'mu', 'eps_b', 'eps_s'. PIN values above 0.20 indicate significant informed trading.

Example

>>> import numpy as np
>>> from wraquant.microstructure.toxicity import pin_model
>>> rng = np.random.default_rng(42)
>>> buys = rng.poisson(50, size=60)
>>> sells = rng.poisson(50, size=60)
>>> result = pin_model(buys, sells)
>>> 0 <= result['pin'] <= 1
True
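
Once the parameters have been estimated, the PIN formula above is simple arithmetic. The helper below only evaluates that formula for given parameter values; it does not perform the maximum likelihood step, and pin_from_params is a hypothetical name.

```python
def pin_from_params(alpha, mu, eps_b, eps_s):
    """PIN = alpha * mu / (alpha * mu + eps_b + eps_s)."""
    informed = alpha * mu  # expected informed arrivals per day
    return informed / (informed + eps_b + eps_s)

# Information events on 40% of days, 50 informed trades on event days,
# and 40 uninformed buys plus 40 uninformed sells per day.
pin = pin_from_params(alpha=0.4, mu=50.0, eps_b=40.0, eps_s=40.0)
# 20 / (20 + 80) = 0.2
```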

See also

adjusted_pin: PIN corrected for symmetric liquidity shocks. vpin: Volume-synchronized alternative (real-time).

order_flow_imbalance(buy_volume, sell_volume, window=20)[source]

Rolling order flow imbalance (OFI).

Defined as (buy_volume - sell_volume) / (buy_volume + sell_volume) averaged over a rolling window.

Parameters:
  • buy_volume (Series) – Buy-initiated volume per period.

  • sell_volume (Series) – Sell-initiated volume per period.

  • window (int, default: 20) – Rolling window size.

Return type:

Series

Returns:

Rolling OFI series in [-1, 1]. Values near +1 indicate strong buying pressure; values near -1 indicate strong selling pressure.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> buys = pd.Series(np.random.uniform(100, 500, 50))
>>> sells = pd.Series(np.random.uniform(100, 500, 50))
>>> ofi = order_flow_imbalance(buys, sells, window=10)
>>> len(ofi) == 50
True
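
A minimal pandas sketch of the definition, assuming the per-period ratio is computed first and then averaged over the window (the alternative, a ratio of rolling sums, gives slightly different values). ofi_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd

def ofi_sketch(buy_volume, sell_volume, window=20):
    """Rolling mean of (buy - sell) / (buy + sell)."""
    total = (buy_volume + sell_volume).replace(0, np.nan)  # avoid 0/0
    imbalance = (buy_volume - sell_volume) / total
    return imbalance.rolling(window).mean().rename("ofi_sketch")

buys = pd.Series([300.0] * 10)
sells = pd.Series([100.0] * 10)
ofi = ofi_sketch(buys, sells, window=5)
# Every period has imbalance (300 - 100) / 400 = 0.5.
```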

See also

vpin: Volume-synchronized informed trading probability. wraquant.microstructure.liquidity.depth_imbalance: Order book depth imbalance.

trade_classification(trade_prices, bid, ask)[source]

Lee-Ready trade classification combining quote test and tick test.

Classifies each trade as buyer-initiated (+1) or seller-initiated (-1).

The quote test assigns direction based on whether the trade price is above or below the midpoint. When the trade is at the midpoint, the tick test (based on successive price changes) is used as a fallback.

Parameters:
  • trade_prices (Series) – Executed trade prices.

  • bid (Series) – Best bid prices at time of each trade.

  • ask (Series) – Best ask prices at time of each trade.

Return type:

Series

Returns:

Classification series with values +1 (buy) or -1 (sell).

Example

>>> import pandas as pd
>>> trades = pd.Series([100.05, 99.95, 100.00, 100.03])
>>> bid = pd.Series([99.98, 99.93, 99.98, 100.00])
>>> ask = pd.Series([100.02, 99.97, 100.02, 100.06])
>>> direction = trade_classification(trades, bid, ask)
>>> int(direction.iloc[0])  # trade above midpoint -> buy
1
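
The quote test with tick-test fallback can be sketched as follows. This is an illustration of the rule as described above, not the library's code; lee_ready_sketch is a hypothetical name, and labeling unclassifiable trades as 0 is an assumption.

```python
import numpy as np
import pandas as pd

def lee_ready_sketch(trade_prices, bid, ask):
    """Quote test first; tick test when the trade is at the midpoint."""
    mid = (bid + ask) / 2
    quote_sign = np.sign(trade_prices - mid)
    # Tick test: sign of the most recent non-zero price change.
    tick_sign = np.sign(trade_prices.diff()).replace(0, np.nan).ffill()
    direction = quote_sign.where(quote_sign != 0, tick_sign)
    # Midpoint trades with no prior price change stay unclassified (0).
    return direction.fillna(0).astype(int)

trades = pd.Series([100.25, 99.75, 100.00, 100.00])
bid = pd.Series([99.75] * 4)
ask = pd.Series([100.25] * 4)
direction = lee_ready_sketch(trades, bid, ask)
# Above mid -> +1, below mid -> -1, at mid after an uptick -> +1,
# at mid with no change -> inherits the previous tick sign (+1).
```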

See also

bulk_volume_classification: Classify from OHLCV data (no quotes). order_flow_imbalance: Aggregate classified volume into a signal.

information_share(prices_list)[source]

Hasbrouck’s information share across multiple venues.

Estimates each venue’s contribution to price discovery. Hasbrouck’s measure is based on the variance decomposition of VECM residuals; this function uses a simplified approach instead: the information share for venue j is proportional to 1 / var(price_j - mean_price).

Parameters:

prices_list (list[Series]) – List of price series from different venues, all sharing the same DatetimeIndex.

Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Array of information shares summing to 1, one per venue. A venue with a higher share contributes more to price discovery.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> base = pd.Series(100 + np.cumsum(np.random.randn(200) * 0.1))
>>> venue_a = base + np.random.randn(200) * 0.01
>>> venue_b = base + np.random.randn(200) * 0.05
>>> shares = information_share([venue_a, venue_b])
>>> abs(shares.sum() - 1.0) < 1e-10
True
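
The simplified proxy can be written out directly. The sketch below reads mean_price as the cross-venue average at each timestamp, which is an assumption, and simple_information_share is a hypothetical name.

```python
import numpy as np
import pandas as pd

def simple_information_share(prices_list):
    """Shares proportional to 1 / var(price_j - cross-venue mean price)."""
    prices = pd.concat(prices_list, axis=1)
    deviations = prices.sub(prices.mean(axis=1), axis=0)
    inv_var = 1.0 / deviations.var()
    return (inv_var / inv_var.sum()).to_numpy()

rng = np.random.default_rng(42)
n = 2000
base = pd.Series(100 + np.cumsum(rng.normal(0, 0.1, n)))
venues = [base + rng.normal(0, s, n) for s in (0.01, 0.05, 0.10)]
shares = simple_information_share(venues)
```

Under this reading, the least noisy venue receives the largest share. With exactly two venues the deviations from the cross-venue mean are mirror images of each other, so both shares would equal 0.5; three or more venues are needed for the proxy to discriminate.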

See also

wraquant.microstructure.market_quality.hasbrouck_information_share: Cholesky-based information share with bounds. wraquant.microstructure.market_quality.gonzalo_granger_component: GG permanent-transitory decomposition.

bulk_volume_classification(close, high, low, volume)[source]

Bulk Volume Classification (BVC).

Classifies aggregate volume into buy- and sell-initiated components using the position of the close price relative to the high-low range. This avoids the need for tick-by-tick data or quote data, making it practical for daily-frequency analysis.

The buy fraction is estimated as:

Z = (close - low) / (high - low)
buy_fraction = Z  (linear interpolation)

Easley et al. instead apply a CDF (standard normal or Student's t) to the standardized price change between bars, but the linear version performs comparably and is more transparent.

When to use: When you need to estimate buy/sell volume from daily OHLCV data without tick-by-tick trade records. Suitable as an input to VPIN calculations, order flow imbalance metrics, or any model requiring buy/sell volume decomposition.

Interpretation: When the close is near the high, most volume is classified as buys. When the close is near the low, most volume is classified as sells. The BVC estimate is noisier than Lee-Ready for individual trades but aggregates well across bars.

Parameters:
  • close (Series) – Closing prices.

  • high (Series) – High prices.

  • low (Series) – Low prices.

  • volume (Series) – Total volume per bar.

Return type:

DataFrame

Returns:

DataFrame with columns 'buy_volume', 'sell_volume', 'buy_fraction'.

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> n = 50
>>> close = pd.Series(100 + np.cumsum(np.random.randn(n) * 0.5))
>>> high = close + np.abs(np.random.randn(n))
>>> low = close - np.abs(np.random.randn(n))
>>> volume = pd.Series(np.random.uniform(1e4, 5e4, n))
>>> result = bulk_volume_classification(close, high, low, volume)
>>> list(result.columns)
['buy_volume', 'sell_volume', 'buy_fraction']
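
The linear rule above amounts to a few lines of pandas. In this sketch, bars with high equal to low are assigned a 50/50 split; that choice and the name bvc_sketch are assumptions, not documented library behavior.

```python
import numpy as np
import pandas as pd

def bvc_sketch(close, high, low, volume):
    """Linear bulk volume classification: buy_fraction = (C - L) / (H - L)."""
    price_range = (high - low).replace(0, np.nan)
    # Zero-range bars carry no signal, so split them evenly.
    buy_fraction = ((close - low) / price_range).fillna(0.5).clip(0, 1)
    return pd.DataFrame({
        "buy_volume": buy_fraction * volume,
        "sell_volume": (1 - buy_fraction) * volume,
        "buy_fraction": buy_fraction,
    })

close = pd.Series([10.0, 9.0, 9.5])
high = pd.Series([10.0, 10.0, 9.5])
low = pd.Series([9.0, 9.0, 9.5])
volume = pd.Series([100.0, 200.0, 300.0])
result = bvc_sketch(close, high, low, volume)
# Close at high -> all buys; close at low -> all sells; flat bar -> 50/50.
```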

References

Easley, D., Lopez de Prado, M. M. & O’Hara, M. (2012). “Bulk Classification of Trading Activity.” Working Paper, Johnson School Research Paper Series.

See also

trade_classification: Lee-Ready classification from tick data. vpin: Uses buy/sell volume to compute informed trading probability.

adjusted_pin(buy_trades, sell_trades)[source]

Adjusted Probability of Informed Trading (AdjPIN).

Extends the standard PIN model (Easley et al., 1996) by separating information-driven order flow from symmetric order-flow shocks that arise from liquidity effects. The Duarte & Young (2009) adjustment adds a regime where both buy and sell arrival rates increase simultaneously (a liquidity event), preventing the model from misattributing liquidity shocks as informed trading.

The standard PIN is known to overstate the probability of informed trading during periods of high overall activity (e.g., index rebalancing, option expiration). AdjPIN corrects for this bias.

When to use: When you need a cleaner measure of information asymmetry that is not contaminated by correlated liquidity events. Preferred over standard PIN for cross-sectional comparisons where stocks have heterogeneous trading activity.

Interpretation:

  • adj_pin < 0.10: Low information asymmetry, suitable for uninformed market-making.

  • adj_pin 0.10-0.30: Moderate; some informed activity detected.

  • adj_pin > 0.30: High information asymmetry; adverse selection risk is elevated.

Parameters:
Return type:

dict[str, float]

Returns:

Dictionary with keys 'adj_pin', 'pin_unadjusted', 'alpha', 'delta', 'mu', 'eps_b', 'eps_s', 'theta' (probability of symmetric activity shock).

Example

>>> import numpy as np
>>> from wraquant.microstructure.toxicity import adjusted_pin
>>> rng = np.random.default_rng(42)
>>> buys = rng.poisson(50, size=60)
>>> sells = rng.poisson(50, size=60)
>>> result = adjusted_pin(buys, sells)
>>> 0 <= result['adj_pin'] <= 1
True

References

Duarte, J. & Young, L. (2009). “Why is PIN Priced?” Journal of Financial Economics, 91(2), 119-138.

See also

pin_model: Standard (unadjusted) PIN estimation. vpin: Real-time alternative using volume buckets.

toxicity_index(vpin_values, ofi_values, spread_values, weights=(0.5, 0.3, 0.2))[source]

Composite toxicity index combining VPIN, OFI, and spread dynamics.

Produces a single 0-100 score summarizing the overall level of adverse selection / order flow toxicity. Each input component is independently normalized to [0, 1] via min-max scaling, then combined with the specified weights and rescaled to 0-100.

When to use: For a single-number dashboard indicator of market toxicity that synthesizes multiple underlying signals. Useful for setting risk limits (e.g., “pause quoting when toxicity > 70”) or for cross-sectional comparison across instruments.

Interpretation:

  • 0-20: Low toxicity; market is safe for passive market-making.

  • 20-50: Moderate toxicity; monitor order flow closely.

  • 50-80: Elevated toxicity; consider widening quotes or reducing size.

  • 80-100: Extreme toxicity; likely informed trading event in progress.

Parameters:
Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Composite toxicity index in [0, 100].

Example

>>> import numpy as np
>>> from wraquant.microstructure.toxicity import toxicity_index
>>> vpin_vals = np.array([0.2, 0.3, 0.5, 0.4])
>>> ofi_vals = np.array([0.1, -0.3, 0.5, -0.2])
>>> spread_vals = np.array([0.01, 0.02, 0.03, 0.015])
>>> idx = toxicity_index(vpin_vals, ofi_vals, spread_vals)
>>> (idx >= 0).all() and (idx <= 100).all()
True
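
The scale-then-weight procedure can be traced by hand on the same inputs. The sketch follows the description literally (min-max scale each component, apply weights, rescale to 0-100); whether the library takes the absolute value of OFI before scaling is not stated, so this version does not. The function names are hypothetical.

```python
import numpy as np

def min_max(x):
    """Scale to [0, 1]; a constant series maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def toxicity_index_sketch(vpin_values, ofi_values, spread_values,
                          weights=(0.5, 0.3, 0.2)):
    components = [min_max(vpin_values), min_max(ofi_values),
                  min_max(spread_values)]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # guard against weights that do not sum to 1
    return 100.0 * sum(wi * c for wi, c in zip(w, components))

idx = toxicity_index_sketch(
    [0.2, 0.3, 0.5, 0.4],
    [0.1, -0.3, 0.5, -0.2],
    [0.01, 0.02, 0.03, 0.015],
)
# The third observation is the maximum of every component, so it scores 100.
```

Because min-max scaling is relative to the sample supplied, scores from different samples or instruments are only comparable if the components are scaled over a common history.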

References

Easley, D., Lopez de Prado, M. M. & O’Hara, M. (2011). “The Microstructure of the ‘Flash Crash’.” Journal of Portfolio Management, 37(2), 118-128.

See also

vpin: One of the component inputs. order_flow_imbalance: Another component input.

informed_trading_intensity(buy_volume, sell_volume, window=20)[source]

Time-varying probability of informed trading using a sequential model.

Estimates the instantaneous probability that the marginal trade is information-driven, based on the sequential trade framework of Glosten & Milgrom (1985) and Easley & O’Hara (1987).

The approach uses a rolling Bayesian update. In each window, the fraction of trades on the “aggressive” side (whichever of buy or sell dominates) is used to infer the conditional probability that an information event is occurring, under the assumption that informed traders cluster on one side while uninformed traders arrive symmetrically.

When to use: For real-time monitoring of informed trading intensity. Unlike the static PIN model, this provides a time-varying signal that can be used to dynamically adjust quotes or execution strategies.

Interpretation: Values near 0.5 indicate balanced (uninformed) flow. Values approaching 1.0 indicate that nearly all marginal volume is on one side, consistent with an active informed trader.

Parameters:
  • buy_volume (Series) – Buy-initiated volume per observation.

  • sell_volume (Series) – Sell-initiated volume per observation.

  • window (int, default: 20) – Rolling window for the Bayesian update.

Return type:

Series

Returns:

Rolling informed trading intensity in [0, 1].

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> buys = pd.Series(np.random.uniform(100, 500, 50))
>>> sells = pd.Series(np.random.uniform(100, 500, 50))
>>> intensity = informed_trading_intensity(buys, sells, window=10)
>>> intensity.name
'informed_trading_intensity'

References

Easley, D. & O’Hara, M. (1987). “Price, Trade Size, and Information in Securities Markets.” Journal of Financial Economics, 19(1), 69-90.

See also

pin_model: Static probability of informed trading. order_flow_imbalance: Simpler directional flow measure.

Market Quality

Market quality metrics.

Provides bid-ask spread measures, market depth indicators, resilience metrics, and variance ratio tests for assessing overall market quality.

quoted_spread(bid, ask)[source]

Quoted bid-ask spread: ask - bid.

Parameters:
Return type:

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Absolute quoted spread.

Example

>>> import numpy as np
>>> bid = np.array([99.90, 99.85])
>>> ask = np.array([100.10, 100.15])
>>> quoted_spread(bid, ask)
array([0.2, 0.3])

See also

relative_spread: Spread normalized by midpoint. wraquant.microstructure.liquidity.effective_spread: Execution-weighted spread.

relative_spread(bid, ask)[source]

Relative spread: (ask - bid) / midpoint.

Parameters:
Return type:

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Relative spread as a fraction of the midpoint. Typical values are 0.001-0.01 for liquid large-cap stocks.

Example

>>> import pandas as pd
>>> bid = pd.Series([99.90, 99.85])
>>> ask = pd.Series([100.10, 100.15])
>>> rs = relative_spread(bid, ask)
>>> round(float(rs.iloc[0]), 6)  # 0.20 / 100.0 = 0.002
0.002

See also

quoted_spread: Absolute spread in price units.

depth(bid_volume, ask_volume, levels=5)[source]

Market depth: total volume available at the top N price levels.

Parameters:
Return type:

Series | ndarray[tuple[Any, ...], dtype[floating]]

Returns:

Total depth (bid + ask) summed across the requested levels.

Example

>>> import numpy as np
>>> bid_vol = np.array([1000, 800, 500, 300, 200])
>>> ask_vol = np.array([900, 700, 600, 400, 100])
>>> depth(bid_vol, ask_vol, levels=3)
4500.0

See also

wraquant.microstructure.liquidity.depth_imbalance: Directional imbalance between bid and ask depth.

resiliency(spreads, window=20)[source]

Spread resiliency: how quickly the spread recovers after a shock.

Measured as the negative autocorrelation of spread changes. A higher value indicates a more resilient market (spreads revert faster).

Parameters:
  • spreads (Series) – Time series of quoted or effective spreads.

  • window (int, default: 20) – Rolling window for estimating autocorrelation of spread changes.

Return type:

Series

Returns:

Rolling resiliency measure. Higher values indicate faster spread recovery (more resilient market).

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> spreads = pd.Series(0.05 + np.random.randn(100) * 0.01)
>>> res = resiliency(spreads, window=20)
>>> res.name
'resiliency'
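
A sketch of the measure as described, taking "autocorrelation" to mean the lag-1 autocorrelation of spread changes within each window (an assumption). resiliency_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd

def resiliency_sketch(spreads, window=20):
    """Negative rolling lag-1 autocorrelation of spread changes."""
    changes = spreads.diff()
    autocorr = changes.rolling(window).corr(changes.shift(1))
    return (-autocorr).rename("resiliency_sketch")

rng = np.random.default_rng(42)
# Spreads that snap back to 0.05 each period: changes are strongly
# negatively autocorrelated, so resiliency is positive (about 0.5).
spreads = pd.Series(0.05 + rng.normal(0, 0.01, 500))
res = resiliency_sketch(spreads, window=50)
```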

See also

quoted_spread: Generate the spread input for this function. variance_ratio: Random walk efficiency test.

variance_ratio(prices, short_period=2, long_period=10)[source]

Lo-MacKinlay (1988) variance ratio test.

Tests the random walk hypothesis by comparing the variance of long_period returns to short_period returns, scaled appropriately. Under a random walk, the ratio equals 1.

Parameters:
  • prices (Series) – Price series (levels, not returns).

  • short_period (int, default: 2) – Short return horizon (default 2).

  • long_period (int, default: 10) – Long return horizon. It should be a multiple of short_period for a clean comparison; this is not enforced.

Returns:

  • 'vr': Variance ratio.

  • 'z_stat': Asymptotic z-statistic under IID assumption.

  • 'p_value': Two-sided p-value.

Return type:

dict[str, float]

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> prices = pd.Series(100 * np.exp(np.cumsum(np.random.randn(500) * 0.01)))
>>> result = variance_ratio(prices, short_period=2, long_period=10)
>>> 'vr' in result and 'p_value' in result
True
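
The statistic can be sketched for the special case where the short horizon is one period. The code uses overlapping q-period log returns and the homoskedastic (IID) asymptotic variance from Lo and MacKinlay (1988), and omits their small-sample bias corrections. variance_ratio_sketch is a hypothetical name, and the library's handling of short_period may differ.

```python
import math
import numpy as np

def variance_ratio_sketch(prices, q=10):
    """VR(q) = Var(q-period return) / (q * Var(1-period return))."""
    log_p = np.log(np.asarray(prices, dtype=float))
    r1 = np.diff(log_p)          # 1-period log returns
    rq = log_p[q:] - log_p[:-q]  # overlapping q-period log returns
    vr = rq.var(ddof=1) / (q * r1.var(ddof=1))
    # Asymptotic standard error of VR under IID returns.
    se = math.sqrt(2 * (2 * q - 1) * (q - 1) / (3 * q * r1.size))
    z = (vr - 1) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"vr": float(vr), "z_stat": float(z), "p_value": float(p)}

rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 5000)))
result = variance_ratio_sketch(prices, q=10)
```

For a simulated random walk like this one, vr should be close to 1.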

See also

market_efficiency_ratio: Multi-lag efficiency summary. resiliency: Spread recovery speed.

hasbrouck_information_share(prices_list)[source]

Hasbrouck (1995) information share for price discovery analysis.

Measures each venue’s (or instrument’s) contribution to the efficient price innovation. The information share for venue j is the fraction of the total variance of the efficient price innovation attributable to that venue.

The method is based on a Vector Error Correction Model (VECM). When the innovation covariance matrix is non-diagonal, upper and lower bounds are computed via Cholesky factorization with different orderings. The midpoint of these bounds is reported as the point estimate.

When to use: For analyzing where price discovery occurs across multiple venues (e.g., NYSE vs NASDAQ, futures vs spot, ADR vs local listing). Essential for regulatory analysis and optimal execution venue selection.

Interpretation:

  • Information shares sum to 1.0 across venues.

  • A venue with information share > 0.5 in a two-venue system is the dominant price discovery venue.

  • If the upper and lower bounds are far apart, the venues have highly correlated innovations and the attribution is ambiguous.

Parameters:

prices_list (list[Series]) – List of price series from different venues, all sharing the same DatetimeIndex. Must contain at least 2 venues.

Returns:

  • 'midpoint': Midpoint information shares (best point estimate).

  • 'upper': Upper-bound information shares.

  • 'lower': Lower-bound information shares.

Return type:

dict[str, ndarray[tuple[Any, ...], dtype[floating]]]

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> base = pd.Series(100 + np.cumsum(np.random.randn(200) * 0.1))
>>> venue_a = base + np.random.randn(200) * 0.01
>>> venue_b = base + np.random.randn(200) * 0.05
>>> result = hasbrouck_information_share([venue_a, venue_b])
>>> abs(result['midpoint'].sum() - 1.0) < 0.01
True

References

Hasbrouck, J. (1995). “One Security, Many Markets: Determining the Contributions to Price Discovery.” Journal of Finance, 50(4), 1175-1199.

See also

gonzalo_granger_component: Unique (non-bounded) price discovery measure. wraquant.microstructure.toxicity.information_share: Simplified variance-based information share.

gonzalo_granger_component(prices_list)[source]

Gonzalo-Granger (1995) permanent-transitory decomposition.

Decomposes cointegrated price series into a permanent (efficient price) component and transitory (pricing error) component. The permanent component weights reveal each venue’s contribution to the long-run efficient price.

Unlike Hasbrouck information shares, the GG component shares are unique (not dependent on Cholesky ordering).

When to use: As a complement to Hasbrouck information shares for price discovery analysis. Particularly useful when you need a unique (not bounded) measure of each venue’s price discovery contribution.

Interpretation:

  • GG weights sum to 1.0 across venues.

  • A venue with a large GG weight drives the permanent price – its price innovations are absorbed by the market as a whole.

  • A venue with a small GG weight primarily reflects transitory noise.

Parameters:

prices_list (list[Series]) – List of cointegrated price series from different venues, sharing the same DatetimeIndex.

Returns:

  • 'gg_weights': Gonzalo-Granger permanent component weights (one per venue, summing to 1).

  • 'alpha': Error-correction coefficients for each venue.

Return type:

dict[str, ndarray[tuple[Any, ...], dtype[floating]]]

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> base = pd.Series(100 + np.cumsum(np.random.randn(200) * 0.1))
>>> venue_a = base + np.random.randn(200) * 0.01
>>> venue_b = base + np.random.randn(200) * 0.05
>>> result = gonzalo_granger_component([venue_a, venue_b])
>>> abs(result['gg_weights'].sum() - 1.0) < 1e-10
True
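
For two venues the link between the error-correction coefficients and the GG weights has a closed form: the weights are proportional to the vector orthogonal to alpha. The sketch below assumes a cointegrating vector of (1, -1) and estimates each alpha by OLS of price changes on the lagged price difference, with no lagged-difference terms. Both are simplifications, and gg_weights_two_venue is a hypothetical name.

```python
import numpy as np

def gg_weights_two_venue(p1, p2):
    """GG weights = (alpha2, -alpha1) / (alpha2 - alpha1)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    z_lag = (p1 - p2)[:-1]        # lagged pricing error
    z_lag = z_lag - z_lag.mean()  # demeaning absorbs the intercept
    d1, d2 = np.diff(p1), np.diff(p2)
    # OLS slope of each venue's price change on the lagged error.
    alpha1 = np.dot(z_lag, d1) / np.dot(z_lag, z_lag)
    alpha2 = np.dot(z_lag, d2) / np.dot(z_lag, z_lag)
    weights = np.array([alpha2, -alpha1]) / (alpha2 - alpha1)
    return {"gg_weights": weights, "alpha": np.array([alpha1, alpha2])}

rng = np.random.default_rng(42)
n = 5000
base = 100 + np.cumsum(rng.normal(0, 0.05, n))
venue_a = base + rng.normal(0, 0.01, n)  # tracks the efficient price closely
venue_b = base + rng.normal(0, 0.05, n)  # noisier venue
result = gg_weights_two_venue(venue_a, venue_b)
```

The venue that adjusts least to the pricing error (alpha near zero) gets the larger weight, which here is the less noisy venue_a.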

References

Gonzalo, J. & Granger, C. (1995). “Estimation of Common Long-Memory Components in Cointegrated Systems.” Journal of Business & Economic Statistics, 13(1), 27-35.

See also

hasbrouck_information_share: Cholesky-based information share.

market_efficiency_ratio(prices, lags=None)[source]

Market efficiency ratio based on variance ratio analysis.

Adapts the Lo-MacKinlay variance ratio test for market quality assessment by computing the ratio at multiple lags and summarizing the degree of departure from efficient pricing.

Under an efficient market (random walk), the variance of k-period returns equals k times the variance of 1-period returns (VR = 1). Departures indicate:

  • VR > 1: Positive autocorrelation (momentum, trending).

  • VR < 1: Negative autocorrelation (mean reversion, microstructure noise).

When to use: For assessing how efficiently a market incorporates information. Useful for comparing market quality across instruments, venues, or time periods.

Interpretation: The efficiency_score is the average absolute deviation of variance ratios from 1.0. Lower is more efficient:

  • < 0.05: Highly efficient market.

  • 0.05-0.15: Moderately efficient.

  • > 0.15: Significant inefficiency (either microstructure noise or predictable patterns).

Parameters:
  • prices (Series) – Price series (levels, not returns).

  • lags (list[int] | None, default: None) – List of return horizons to test (default [2, 5, 10, 20]).

Returns:

  • 'efficiency_score': Average |VR - 1| across lags (lower is more efficient).

  • 'variance_ratios': Dict mapping each lag to its VR value.

Return type:

dict[str, float | dict[int, float]]

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> prices = pd.Series(100 * np.exp(np.cumsum(np.random.randn(500) * 0.01)))
>>> result = market_efficiency_ratio(prices)
>>> result['efficiency_score'] >= 0
True

References

Lo, A. W. & MacKinlay, A. C. (1988). “Stock Market Prices Do Not Follow Random Walks.” Review of Financial Studies, 1(1), 41-66.

See also

variance_ratio: Single-lag variance ratio test with p-value. resiliency: Spread-based market quality measure.

price_impact_regression(price_changes, signed_volume, lags=5)[source]

Price impact regression decomposing permanent and temporary effects.

Regresses price changes on contemporaneous and lagged signed order flow to estimate:

  • Permanent impact: the long-run effect of a unit of signed volume on prices (information content).

  • Temporary impact: the transient effect that reverses over time (liquidity provision revenue).

The regression model is:

dp_t = c + beta_0 * v_t + beta_1 * v_{t-1} + ... + beta_k * v_{t-k} + eps_t

Permanent impact is the sum of all beta coefficients. Temporary impact is beta_0 - permanent_impact.

When to use: For analyzing the dynamic price impact of trading activity. Essential for optimal execution and transaction cost analysis.

Interpretation:

  • Positive permanent impact: trades convey information; the market adjusts permanently.

  • Positive temporary impact: part of the contemporaneous move reverses over the following periods (the lagged coefficients sum to a negative number); this reversal is the liquidity provider’s profit.

  • R-squared: fraction of price variation explained by order flow; higher values indicate more order-flow-driven pricing.

Parameters:
  • price_changes (Series) – Price change (delta-p) series.

  • signed_volume (Series) – Signed order flow (positive = buys, negative = sells).

  • lags (int, default: 5) – Number of lagged order flow terms to include (default 5).

Returns:

  • 'permanent_impact': Sum of all order flow coefficients.

  • 'temporary_impact': beta_0 - permanent_impact.

  • 'beta_0': Contemporaneous impact coefficient.

  • 'r_squared': R-squared of the regression.

Return type:

dict[str, float]

Example

>>> import pandas as pd, numpy as np
>>> np.random.seed(42)
>>> dp = pd.Series(np.random.randn(200) * 0.01)
>>> sv = pd.Series(np.random.randn(200) * 1000)
>>> result = price_impact_regression(dp, sv, lags=3)
>>> 'permanent_impact' in result and 'r_squared' in result
True
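
The regression and the two derived quantities can be reproduced with ordinary least squares in NumPy. This sketch follows the model and definitions stated above; the function name and the choice to drop the first lags observations are assumptions.

```python
import numpy as np

def price_impact_sketch(price_changes, signed_volume, lags=5):
    """OLS of dp_t on v_t, v_{t-1}, ..., v_{t-lags} with an intercept."""
    dp = np.asarray(price_changes, dtype=float)
    v = np.asarray(signed_volume, dtype=float)
    n = dp.size
    # Columns: intercept, then v lagged by k = 0..lags (first rows dropped).
    cols = [np.ones(n - lags)] + [v[lags - k : n - k] for k in range(lags + 1)]
    X = np.column_stack(cols)
    y = dp[lags:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    betas = coef[1:]
    resid = y - X @ coef
    permanent = betas.sum()
    return {
        "permanent_impact": float(permanent),
        "temporary_impact": float(betas[0] - permanent),
        "beta_0": float(betas[0]),
        "r_squared": float(1 - resid.var() / y.var()),
    }

# Synthetic flow: a 0.5 immediate impact, of which 0.2 reverses next period.
rng = np.random.default_rng(42)
v = rng.normal(size=2000)
v_lag = np.concatenate([[0.0], v[:-1]])
dp = 0.5 * v - 0.2 * v_lag + rng.normal(0, 0.01, 2000)
result = price_impact_sketch(dp, v, lags=2)
```

Here beta_0 is about 0.5, the permanent impact about 0.3, and the temporary impact about 0.2, which is the portion of the initial move that reverses.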

References

Hasbrouck, J. (1991). “Measuring the Information Content of Stock Trades.” Journal of Finance, 46(1), 179-207.

See also

wraquant.microstructure.liquidity.kyle_lambda: Simpler single-coefficient price impact. wraquant.microstructure.liquidity.spread_decomposition: Spread-based adverse selection decomposition.

intraday_volatility_pattern(prices, freq='h')[source]

Estimate the intraday volatility pattern (U-shape or J-shape).

Computes the average absolute return at each intraday time bucket (hourly by default), revealing the well-documented U-shaped pattern where volatility is highest at the open and close and lowest at midday.

When to use: For understanding the diurnal volatility cycle of a market. Essential for:

  • Optimal execution: schedule trades during low-volatility periods.

  • Risk management: adjust intraday VaR for time-of-day effects.

  • Market-making: widen quotes during high-volatility open/close.

Interpretation: The output is indexed by time-of-day (e.g., hour). Peaks at the open and close indicate information-driven volatility (overnight information absorption and closing auctions). A flat profile suggests a market dominated by algorithmic flow with little information asymmetry.

Parameters:
  • prices (Series) – Intraday price series with a DatetimeIndex.

  • freq (str, default: 'h') – Resampling frequency for the volatility buckets. Use 'h' for hourly, '30min' for half-hourly, '15min' for 15-minute buckets.

Return type:

Series

Returns:

Average absolute return by time-of-day bucket, indexed by the bucket label (e.g., hour of day).

Example

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range('2024-01-02 09:30', periods=78, freq='5min')
>>> prices = pd.Series(100 + np.cumsum(np.random.randn(78) * 0.1),
...                     index=idx)
>>> pattern = intraday_volatility_pattern(prices, freq='h')
>>> len(pattern) > 0
True
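
The grouping step behind the pattern can be sketched for hourly buckets by averaging absolute returns by clock hour. The sketch does not implement the freq parameter; bucketing by clock hour and the name intraday_pattern_sketch are assumptions.

```python
import numpy as np
import pandas as pd

def intraday_pattern_sketch(prices):
    """Mean absolute return per clock hour (hourly buckets only)."""
    abs_ret = prices.pct_change().abs().dropna()
    # Group by the hour of each timestamp, pooling all days together.
    return abs_ret.groupby(abs_ret.index.hour).mean()

rng = np.random.default_rng(42)
idx = pd.date_range("2024-01-02 09:30", periods=78, freq="5min")
prices = pd.Series(100 + np.cumsum(rng.normal(0, 0.1, 78)), index=idx)
pattern = intraday_pattern_sketch(prices)
# One session from 09:30 to 15:55 yields buckets for hours 9 through 15.
```

With several days of data the same grouping pools each hour across sessions, which is what makes the U-shape visible.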

References

Wood, R. A., McInish, T. H. & Ord, J. K. (1985). “An Investigation of Transactions Data for NYSE Stocks.” Journal of Finance, 40(3), 723-739.

Admati, A. R. & Pfleiderer, P. (1988). “A Theory of Intraday Patterns: Volume and Price Variability.” Review of Financial Studies, 1(1), 3-40.

See also

variance_ratio: Variance ratio test for random walk. market_efficiency_ratio: Multi-lag efficiency assessment.