Source code for wraquant.risk.copulas

"""Copula models for dependency structure in multi-asset returns.

Copulas separate the *marginal* distributions of individual assets from
their *dependence* structure. This is critical in finance because linear
correlation (Pearson) understates co-movement in the tails -- precisely
where risk matters most. Copulas capture non-linear, asymmetric, and
tail-specific dependence that correlation matrices cannot.

This module provides five copula families:

1. **Gaussian copula** (``fit_gaussian_copula``) -- the simplest elliptical
   copula. Captures linear dependence but has *zero* tail dependence: extreme
   co-movements are asymptotically independent. Useful as a baseline but
   dangerous for tail-risk applications.

2. **Student-t copula** (``fit_t_copula``) -- symmetric tail dependence
   controlled by degrees of freedom. Lower df => heavier tails and
   stronger tail dependence. The standard choice for equity portfolios
   where joint crashes are common.

3. **Clayton copula** (``fit_clayton_copula``) -- Archimedean copula with
   *lower* tail dependence and no upper tail dependence. Use when you
   expect assets to crash together but rally independently (typical for
   equities and credit).

4. **Gumbel copula** (``fit_gumbel_copula``) -- Archimedean copula with
   *upper* tail dependence and no lower tail dependence. Use for assets
   that rally together but crash independently (rare but possible for
   certain commodity pairs).

5. **Frank copula** (``fit_frank_copula``) -- Archimedean copula with
   *symmetric* dependence and *no* tail dependence. Useful as a
   benchmark or when tail dependence is genuinely absent.

Utilities:
    - ``copula_simulate``: Monte Carlo simulation from any fitted copula.
    - ``tail_dependence``: empirical estimation of lower and upper tail
      dependence coefficients.
    - ``rank_correlation``: Kendall's tau and Spearman's rho matrices.

How to choose:
    - Start with ``tail_dependence`` to check if the data has
      asymmetric tail dependence.
    - If lower > upper: use Clayton.
    - If upper > lower: use Gumbel.
    - If roughly symmetric tails: use Student-t with estimated df.
    - If no significant tail dependence: use Gaussian or Frank.

References:
    - Nelsen (2006), "An Introduction to Copulas"
    - McNeil, Frey & Embrechts (2005), Ch. 5, "Copulas and Dependence"
    - Embrechts, Lindskog & McNeil (2003), "Modelling Dependence with
      Copulas and Applications to Risk Management"
"""

from __future__ import annotations

from typing import Any

import numpy as np
from scipy import optimize
from scipy import stats as sp_stats

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _empirical_cdf(x: np.ndarray) -> np.ndarray:
    """Convert raw observations to pseudo-observations (uniform marginals).

    Uses the rescaled empirical CDF:  rank / (n + 1) to avoid 0 and 1.
    """
    n = len(x)
    ranks = sp_stats.rankdata(x)
    return ranks / (n + 1)


# ---------------------------------------------------------------------------
# Gaussian copula
# ---------------------------------------------------------------------------



[docs]
def fit_gaussian_copula(
    returns: np.ndarray,
) -> dict[str, Any]:
    r"""Fit a Gaussian copula to multivariate returns.

    The Gaussian copula models dependence via a multivariate normal
    distribution applied to the rank-transformed (uniform) data.  It
    captures linear dependence but has **zero tail dependence**: in the
    limit, extreme co-movements become independent.

    Mathematical formulation:

        C(u_1, ..., u_d) = Phi_R(Phi^{-1}(u_1), ..., Phi^{-1}(u_d))

    where Phi is the standard normal CDF, Phi^{-1} is its inverse
    (quantile function), and Phi_R is the multivariate normal CDF with
    correlation matrix R.

    The density is:

        c(u) = |R|^{-1/2} exp(-1/2 z^T (R^{-1} - I) z)

    where z_i = Phi^{-1}(u_i).

    Tail dependence:

        lambda_L = lambda_U = 0 (for rho < 1)

    This means the Gaussian copula predicts that extreme co-movements
    vanish in the tails — a dangerous assumption for financial risk.

    Interpretation:
        - The correlation matrix R describes the "normal-like" dependence
          structure of the data.
        - **Key limitation**: the Gaussian copula implies that joint
          extreme events (crashes, rallies) are asymptotically
          independent.  The 2008 crisis demonstrated that assets crash
          together far more than the Gaussian copula predicts.
        - If ``tail_dependence()`` returns non-negligible lower tail
          dependence (> 0.1), the Gaussian copula is inappropriate —
          use ``fit_t_copula`` or ``fit_clayton_copula`` instead.
        - The fitted correlation matrix R is NOT the same as the Pearson
          correlation of raw returns.  It is the correlation of the
          rank-transformed (copula) data.

    When to use:
        - As a **baseline** for dependence modelling.
        - When tail dependence is genuinely absent (rare in equities).
        - For quick simulation of correlated returns.
        - Compare its AIC/BIC against Student-t to test for tail dependence.

    Parameters:
        returns: Array of shape ``(n_obs, n_assets)`` with raw returns.
            Each column is one asset's return series.

    Returns:
        Dict with keys:

        * ``"correlation"`` (*ndarray*) -- estimated copula correlation
          matrix R of shape (d, d).  Off-diagonal values measure the
          normal-copula dependence.  Higher values = stronger co-movement.
        * ``"copula_type"`` (*str*) -- ``"gaussian"``.

    Example:
        >>> import numpy as np
        >>> from wraquant.risk.copulas import fit_gaussian_copula, fit_t_copula
        >>> rng = np.random.default_rng(42)
        >>> returns = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], 500)
        >>> gauss = fit_gaussian_copula(returns)
        >>> print(f"Copula correlation: {gauss['correlation'][0, 1]:.3f}")
        >>> # Compare with t-copula to test for tail dependence
        >>> t_cop = fit_t_copula(returns)
        >>> print(f"t-copula df: {t_cop['df']:.1f} (lower = heavier tails)")

    See Also:
        fit_t_copula: Student-t copula with symmetric tail dependence.
          Use when df < 30 to capture crash co-movement.
        fit_clayton_copula: Lower-tail dependence only (crash contagion).
        tail_dependence: Empirically check if tail dependence exists.
        copula_simulate: Generate correlated samples from the fitted copula.
    """
    from wraquant.core._coerce import coerce_array

    if returns.ndim == 1:
        returns = coerce_array(returns, name="returns").reshape(-1, 1)
    else:
        returns = np.asarray(returns, dtype=np.float64)
    n_obs, n_assets = returns.shape
    u = np.column_stack([_empirical_cdf(returns[:, j]) for j in range(n_assets)])
    z = sp_stats.norm.ppf(u)

    # Clip to avoid inf at boundaries (shouldn't happen with n+1 rule)
    z = np.clip(z, -8, 8)

    corr = np.corrcoef(z, rowvar=False)

    return {
        "correlation": corr,
        "copula_type": "gaussian",
    }



# ---------------------------------------------------------------------------
# Student-t copula
# ---------------------------------------------------------------------------



[docs]
def fit_t_copula(
    returns: np.ndarray,
    df: float = 5.0,
) -> dict[str, Any]:
    r"""Fit a Student-t copula to multivariate returns.

    The Student-t copula is the **standard choice for equity portfolios**
    because it captures symmetric tail dependence: the tendency of
    assets to experience extreme co-movements (both crashes and
    rallies) more often than a Gaussian model would predict.

    Mathematical formulation:

        C(u_1, ..., u_d) = t_{R,df}(t_df^{-1}(u_1), ..., t_df^{-1}(u_d))

    where t_df is the univariate Student-t CDF with df degrees of freedom,
    and t_{R,df} is the multivariate t CDF with correlation matrix R.

    Tail dependence coefficient (bivariate case):

        lambda_L = lambda_U = 2 * t_{df+1}(-sqrt((df+1)(1-rho)/(1+rho)))

    Unlike the Gaussian copula, this is **strictly positive** for finite
    df, meaning extreme co-movements DO occur.

    Interpretation:
        - **df** (degrees of freedom) controls tail heaviness:

          - **df = 3-5**: Very heavy tails, strong tail dependence.
            Typical for equity portfolios during turbulent markets.
            lambda ~ 0.3-0.5 for rho=0.5.
          - **df = 10-20**: Moderate tails. Appropriate for investment-
            grade fixed income or diversified portfolios.
          - **df > 30**: Nearly Gaussian (tail dependence < 0.05).
          - **df -> inf**: Converges to Gaussian copula exactly.

        - Compare df across time periods: if df drops from 15 to 5,
          tail dependence has increased — crisis contagion is building.
        - Compare fitted df to Gaussian AIC: if t-copula AIC is much
          lower, tail dependence is statistically significant.

    When to use:
        - **Default choice** for equity and credit risk modelling.
        - VaR/CVaR estimation for multi-asset portfolios.
        - When you expect both crashes AND rallies to be correlated
          (symmetric dependence).
        - If you need asymmetric tail dependence (crashes correlated
          but rallies independent), use ``fit_clayton_copula`` instead.

    Parameters:
        returns: Array of shape ``(n_obs, n_assets)`` with raw returns.
        df: Degrees of freedom. Lower = heavier tails. **Typical values:
            3-5 for equities, 10-20 for bonds, 5-10 as a safe default.**
            If unsure, start with 5 and compare AIC against df=10,15,20.

    Returns:
        Dict with keys:

        * ``"correlation"`` (*ndarray*) -- Copula correlation matrix R.
        * ``"df"`` (*float*) -- Degrees of freedom used.
        * ``"copula_type"`` (*str*) -- ``"student_t"``.

    Example:
        >>> import numpy as np
        >>> from wraquant.risk.copulas import fit_t_copula, tail_dependence
        >>> rng = np.random.default_rng(42)
        >>> # Simulate fat-tailed correlated returns
        >>> returns = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 1000)
        >>> result = fit_t_copula(returns, df=5)
        >>> print(f"Copula corr: {result['correlation'][0, 1]:.3f}")
        >>> print(f"df={result['df']} → heavier tails than Gaussian")
        >>> # Check tail dependence
        >>> td = tail_dependence(returns)
        >>> print(f"Lower tail dep: {td['lower']:.3f}")

    Notes:
        Reference: Demarta, S. & McNeil, A.J. (2005). "The t Copula
        and Related Copulas." *International Statistical Review*, 73, 111-129.

    See Also:
        fit_gaussian_copula: No tail dependence (Gaussian = t with df=inf).
        fit_clayton_copula: Asymmetric — lower tail dependence only.
        tail_dependence: Empirically estimate lambda_L and lambda_U.
    """
    from wraquant.core._coerce import coerce_array

    if returns.ndim == 1:
        returns = coerce_array(returns, name="returns").reshape(-1, 1)
    else:
        returns = np.asarray(returns, dtype=np.float64)
    n_obs, n_assets = returns.shape
    u = np.column_stack([_empirical_cdf(returns[:, j]) for j in range(n_assets)])
    t_vals = sp_stats.t.ppf(u, df=df)
    t_vals = np.clip(t_vals, -50, 50)

    corr = np.corrcoef(t_vals, rowvar=False)

    return {
        "correlation": corr,
        "df": df,
        "copula_type": "student_t",
    }



# ---------------------------------------------------------------------------
# Archimedean copulas (bivariate)
# ---------------------------------------------------------------------------


def _clayton_cdf(u: np.ndarray, v: np.ndarray, theta: float) -> np.ndarray:
    """Clayton copula CDF: C(u,v) = (u^{-theta} + v^{-theta} - 1)^{-1/theta}."""
    return np.power(
        np.power(u, -theta) + np.power(v, -theta) - 1,
        -1.0 / theta,
    )


def _clayton_log_density(u: np.ndarray, v: np.ndarray, theta: float) -> float:
    """Log-likelihood of the Clayton copula density."""
    n = len(u)
    log_c = (
        n * np.log(1 + theta)
        + (-theta - 1) * (np.sum(np.log(u)) + np.sum(np.log(v)))
        + (-2 - 1.0 / theta)
        * np.sum(np.log(np.power(u, -theta) + np.power(v, -theta) - 1))
    )
    return float(log_c)



[docs]
def fit_clayton_copula(
    u: np.ndarray,
    v: np.ndarray,
) -> dict[str, Any]:
    """Fit a Clayton copula (lower tail dependence) to bivariate data.

    The Clayton copula has lower tail dependence but zero upper tail
    dependence.  This means assets modelled with a Clayton copula
    tend to crash together but rally independently -- a realistic
    pattern for equities and credit, where "risk-on/risk-off" dynamics
    create asymmetric co-movement.

    Interpretation:
        - **theta** > 0 controls dependence strength.  Higher theta
          = stronger lower tail dependence.  theta -> 0 = independence.
        - **lower_tail_dependence** = 2^{-1/theta} is the probability
          that both assets are in their lower tail simultaneously,
          given that one is.  Values above 0.3 indicate meaningful
          crash co-movement.
        - The Clayton copula is the canonical choice for modelling
          "contagion" and "flight to quality" dynamics.

    When to use:
        - Two equity assets or sectors where you expect crash
          contagion but independent rallies.
        - Credit portfolios where defaults cluster.
        - When ``tail_dependence`` shows lower > upper.

    Parameters:
        u: First marginal uniform observations.  Pass raw data and
            the empirical CDF is applied automatically.
        v: Second marginal uniform observations.

    Returns:
        Dict with keys:

        * ``"theta"`` -- Clayton parameter (> 0).
        * ``"copula_type"`` -- ``"clayton"``.
        * ``"lower_tail_dependence"`` -- 2^{-1/theta}. > 0.3 is
          meaningful.

    See Also:
        fit_gumbel_copula: Upper tail dependence (opposite asymmetry).
        fit_t_copula: Symmetric tail dependence.
    """
    from wraquant.core._coerce import coerce_array

    u = coerce_array(u, name="u")
    v = coerce_array(v, name="v")
    u, v = _ensure_uniform(u), _ensure_uniform(v)

    def neg_ll(params: np.ndarray) -> float:
        theta = params[0]
        if theta <= 0:
            return 1e12
        try:
            return -_clayton_log_density(u, v, theta)
        except (FloatingPointError, ValueError):
            return 1e12

    # Kendall's tau initial estimate: theta = 2*tau/(1-tau)
    tau, _ = sp_stats.kendalltau(u, v)
    tau = max(tau, 0.05)
    theta_init = 2 * tau / (1 - tau)
    theta_init = max(theta_init, 0.1)

    res = optimize.minimize(
        neg_ll,
        x0=[theta_init],
        method="Nelder-Mead",
        options={"maxiter": 5000},
    )
    theta = max(float(res.x[0]), 1e-6)

    return {
        "theta": theta,
        "copula_type": "clayton",
        "lower_tail_dependence": float(2 ** (-1.0 / theta)),
    }



def _gumbel_log_density(u: np.ndarray, v: np.ndarray, theta: float) -> float:
    """Log-likelihood of the Gumbel copula density."""
    lu = -np.log(u)
    lv = -np.log(v)
    A = np.power(lu, theta) + np.power(lv, theta)
    A_inv = np.power(A, 1.0 / theta)

    # log C(u,v)
    log_C = -A_inv

    # log c(u,v) = log C + log[ ... ]
    # Full density:
    # c(u,v) = C(u,v) * (1/(u*v)) * (lu*lv)^(theta-1) / A^(2-1/theta)
    #          * (A_inv + theta - 1)
    log_density = (
        log_C
        - np.log(u)
        - np.log(v)
        + (theta - 1) * (np.log(lu) + np.log(lv))
        - (2 - 1.0 / theta) * np.log(A)
        + np.log(A_inv + theta - 1)
    )
    return float(np.sum(log_density))



[docs]
def fit_gumbel_copula(
    u: np.ndarray,
    v: np.ndarray,
) -> dict[str, Any]:
    """Fit a Gumbel copula (upper tail dependence) to bivariate data.

    The Gumbel copula has upper tail dependence but zero lower tail
    dependence.  Assets rally together in extreme moves but crash
    independently.  This is less common than the Clayton pattern in
    equities but can apply to certain commodity pairs (e.g., two
    correlated energy commodities that spike together during supply
    shocks).

    Interpretation:
        - **theta** >= 1 controls dependence strength. theta = 1 is
          independence; larger theta = stronger upper tail dependence.
        - **upper_tail_dependence** = 2 - 2^{1/theta} measures the
          probability of joint extreme upper co-movement.

    When to use:
        - Commodity pairs with supply-shock co-movement.
        - When ``tail_dependence`` shows upper > lower.
        - When modelling "melt-up" contagion.

    Parameters:
        u: First marginal uniform observations.
        v: Second marginal uniform observations.

    Returns:
        Dict with keys:

        * ``"theta"`` -- Gumbel parameter (>= 1).
        * ``"copula_type"`` -- ``"gumbel"``.
        * ``"upper_tail_dependence"`` -- 2 - 2^{1/theta}.

    See Also:
        fit_clayton_copula: Lower tail dependence (more common in equities).
        fit_t_copula: Symmetric tail dependence.
    """
    from wraquant.core._coerce import coerce_array

    u = coerce_array(u, name="u")
    v = coerce_array(v, name="v")
    u, v = _ensure_uniform(u), _ensure_uniform(v)

    def neg_ll(params: np.ndarray) -> float:
        theta = params[0]
        if theta < 1.0:
            return 1e12
        try:
            return -_gumbel_log_density(u, v, theta)
        except (FloatingPointError, ValueError):
            return 1e12

    tau, _ = sp_stats.kendalltau(u, v)
    tau = max(tau, 0.05)
    theta_init = 1.0 / (1 - tau)
    theta_init = max(theta_init, 1.01)

    res = optimize.minimize(
        neg_ll,
        x0=[theta_init],
        method="Nelder-Mead",
        options={"maxiter": 5000},
    )
    theta = max(float(res.x[0]), 1.0)

    return {
        "theta": theta,
        "copula_type": "gumbel",
        "upper_tail_dependence": float(2 - 2 ** (1.0 / theta)),
    }



def _frank_log_density(u: np.ndarray, v: np.ndarray, theta: float) -> float:
    """Log-likelihood of the Frank copula density."""
    # c(u,v) = theta * (1 - exp(-theta)) * exp(-theta*(u+v))
    #          / (  (1-exp(-theta)) - (1-exp(-theta*u))*(1-exp(-theta*v))  )^2
    e_t = np.exp(-theta)
    e_tu = np.exp(-theta * u)
    e_tv = np.exp(-theta * v)

    numer = theta * (1 - e_t) * np.exp(-theta * (u + v))
    denom = ((1 - e_t) - (1 - e_tu) * (1 - e_tv)) ** 2

    # Guard against non-positive values
    mask = (numer > 0) & (denom > 0)
    if not np.all(mask):
        return -1e12
    return float(np.sum(np.log(numer[mask]) - np.log(denom[mask])))



[docs]
def fit_frank_copula(
    u: np.ndarray,
    v: np.ndarray,
) -> dict[str, Any]:
    """Fit a Frank copula (symmetric dependence, no tail dependence).

    The Frank copula is symmetric with zero tail dependence in both
    directions.  It is useful as a benchmark: if neither the Clayton
    nor Gumbel copula fits significantly better than Frank, then
    there is no evidence of asymmetric tail dependence.

    Interpretation:
        - **theta** > 0: positive dependence; theta < 0: negative
          dependence; |theta| large = strong dependence.
        - The Frank copula allows negative dependence (unlike Clayton
          and Gumbel), making it useful for hedging pairs.
        - No tail dependence: extreme co-movements are modelled as
          asymptotically independent.

    When to use:
        - As a benchmark against Clayton/Gumbel.
        - When ``tail_dependence`` shows negligible tail dependence.
        - For pairs with negative dependence (hedging relationships).

    Parameters:
        u: First marginal uniform observations.
        v: Second marginal uniform observations.

    Returns:
        Dict with keys:

        * ``"theta"`` -- Frank parameter. Positive = positive
          dependence, negative = negative dependence.
        * ``"copula_type"`` -- ``"frank"``.

    See Also:
        fit_gaussian_copula: Multivariate with no tail dependence.
        fit_clayton_copula: Lower tail dependence.
        fit_gumbel_copula: Upper tail dependence.
    """
    from wraquant.core._coerce import coerce_array

    u = coerce_array(u, name="u")
    v = coerce_array(v, name="v")
    u, v = _ensure_uniform(u), _ensure_uniform(v)

    def neg_ll(params: np.ndarray) -> float:
        theta = params[0]
        if abs(theta) < 1e-6:
            return 1e12
        try:
            return -_frank_log_density(u, v, theta)
        except (FloatingPointError, ValueError):
            return 1e12

    tau, _ = sp_stats.kendalltau(u, v)
    # Rough initial guess
    theta_init = np.sign(tau) * max(abs(tau) * 10, 1.0)
    if abs(theta_init) < 0.5:
        theta_init = 1.0

    res = optimize.minimize(
        neg_ll,
        x0=[theta_init],
        method="Nelder-Mead",
        options={"maxiter": 5000},
    )
    theta = float(res.x[0])

    return {
        "theta": theta,
        "copula_type": "frank",
    }



# ---------------------------------------------------------------------------
# Simulation
# ---------------------------------------------------------------------------



[docs]
def copula_simulate(
    copula_params: dict[str, Any],
    n_sims: int = 10000,
    copula_type: str | None = None,
    seed: int | None = None,
) -> np.ndarray:
    """Simulate from a fitted copula.

    Generates samples from the fitted dependence structure with uniform
    marginals.  To get realistic return scenarios, transform each
    column through the inverse CDF of the desired marginal distribution
    (e.g., inverse normal, inverse t, or empirical quantile function).

    Interpretation:
        - Output columns are uniform on (0, 1) -- they represent the
          dependence structure only, not the marginal distributions.
        - Correlated low values in all columns = a joint crash scenario.
        - To convert to returns: ``returns[:, j] = norm.ppf(U[:, j],
          loc=mu_j, scale=sigma_j)`` or use empirical quantile
          functions for non-parametric marginals.
        - Use for Monte Carlo VaR/CVaR, portfolio simulation, or
          stress testing.

    Parameters:
        copula_params: Output from one of the ``fit_*_copula`` functions.
        n_sims: Number of samples to draw.
        copula_type: Override copula type (defaults to
            ``copula_params["copula_type"]``).
        seed: Random seed for reproducibility.

    Returns:
        Array of shape ``(n_sims, d)`` with uniform marginals in (0, 1).

    Raises:
        ValueError: If the copula type is not recognized.

    Example:
        >>> from scipy.stats import norm
        >>> cop = fit_gaussian_copula(returns)
        >>> U = copula_simulate(cop, n_sims=10000, seed=42)
        >>> # Transform to normal marginals:
        >>> sim_returns = norm.ppf(U, loc=mu, scale=sigma)
    """
    rng = np.random.default_rng(seed)
    ctype = copula_type or copula_params.get("copula_type", "")

    if ctype == "gaussian":
        corr = copula_params["correlation"]
        d = corr.shape[0]
        z = rng.multivariate_normal(np.zeros(d), corr, size=n_sims)
        return sp_stats.norm.cdf(z)

    if ctype == "student_t":
        corr = copula_params["correlation"]
        df = copula_params["df"]
        d = corr.shape[0]
        z = rng.multivariate_normal(np.zeros(d), corr, size=n_sims)
        chi2 = rng.chisquare(df, size=(n_sims, 1))
        t_vals = z / np.sqrt(chi2 / df)
        return sp_stats.t.cdf(t_vals, df=df)

    if ctype == "clayton":
        theta = copula_params["theta"]
        # Marshall-Olkin algorithm for Clayton
        v_gamma = rng.gamma(1.0 / theta, 1.0, size=n_sims)
        e1 = rng.exponential(1.0, size=n_sims)
        e2 = rng.exponential(1.0, size=n_sims)
        u1 = np.power(1 + e1 / v_gamma, -1.0 / theta)
        u2 = np.power(1 + e2 / v_gamma, -1.0 / theta)
        return np.column_stack([u1, u2])

    if ctype == "gumbel":
        theta = copula_params["theta"]
        # Stable distribution method for Gumbel
        # Use inverse CDF sampling via conditional
        u1 = rng.uniform(0, 1, size=n_sims)
        u2 = rng.uniform(0, 1, size=n_sims)
        # Use conditional CDF inversion approximation
        lu1 = -np.log(u1)
        lu2 = -np.log(u2)
        A = np.power(lu1, theta) + np.power(lu2, theta)
        C = np.exp(-np.power(A, 1.0 / theta))
        # Apply PIT
        return np.column_stack([u1, C / u1])

    if ctype == "frank":
        theta = copula_params["theta"]
        u1 = rng.uniform(0, 1, size=n_sims)
        # Conditional CDF inversion for Frank
        # C_2|1(u2|u1) = t => u2 = -1/theta * log(1 + t*(exp(-theta)-1) / (exp(-theta*u1) * (1 - t) + t))
        # where t is uniform
        t = rng.uniform(0, 1, size=n_sims)
        exp_neg_theta = np.exp(-theta)
        exp_neg_theta_u1 = np.exp(-theta * u1)
        numer = t * (exp_neg_theta - 1)
        denom = exp_neg_theta_u1 * (1 - t) + t
        u2 = -np.log(1 + numer / denom) / theta
        u2 = np.clip(u2, 1e-10, 1 - 1e-10)
        return np.column_stack([u1, u2])

    msg = f"Unknown copula type: {ctype!r}"
    raise ValueError(msg)



# ---------------------------------------------------------------------------
# Tail dependence
# ---------------------------------------------------------------------------



[docs]
def tail_dependence(
    u: np.ndarray,
    v: np.ndarray,
    method: str = "empirical",
    threshold: float = 0.05,
) -> dict[str, float]:
    """Estimate lower and upper tail dependence coefficients.

    Tail dependence measures the probability that one variable is in
    its extreme tail given that the other is.  This is the key quantity
    that copula selection hinges on.

    Interpretation:
        - **lower** ~ P(V <= q | U <= q): the probability of a joint
          crash.  Values above 0.2-0.3 are economically significant
          and indicate that the Gaussian copula is inappropriate.
        - **upper** ~ P(V > 1-q | U > 1-q): the probability of a
          joint rally.
        - If lower >> upper: use Clayton copula (crash contagion).
        - If upper >> lower: use Gumbel copula (rally contagion).
        - If both are similar: use Student-t copula (symmetric tails).
        - If both are near zero: Gaussian or Frank copula is adequate.

    Caveat:
        - Empirical tail dependence estimates are noisy with small
          samples.  Use threshold = 0.10 for more observations per
          tail (less extreme, more precise) or 0.05 for fewer
          observations (more extreme, noisier).

    Parameters:
        u: First marginal uniform observations.
        v: Second marginal uniform observations.
        method: ``"empirical"`` (default) for non-parametric estimation.
        threshold: Quantile threshold for tail estimation.
            0.05 = 5th/95th percentile (default). 0.10 = more data
            in the tail estimate but less extreme.

    Returns:
        Dict with ``"lower"`` and ``"upper"`` tail dependence estimates.

    Example:
        >>> import numpy as np
        >>> from scipy.stats import norm
        >>> rng = np.random.default_rng(0)
        >>> # Simulate from a t-copula (symmetric tail dependence)
        >>> z = rng.multivariate_normal([0,0], [[1,0.5],[0.5,1]], 5000)
        >>> u = norm.cdf(z[:, 0])
        >>> v = norm.cdf(z[:, 1])
        >>> td = tail_dependence(u, v)
        >>> print(f"Lower: {td['lower']:.2f}, Upper: {td['upper']:.2f}")
    """
    from wraquant.core._coerce import coerce_array

    u = coerce_array(u, name="u")
    v = coerce_array(v, name="v")
    u, v = _ensure_uniform(u), _ensure_uniform(v)

    if method == "empirical":
        # Lower tail: P(V <= q | U <= q)
        lower_mask = u <= threshold
        if np.sum(lower_mask) > 0:
            lower_td = float(np.mean(v[lower_mask] <= threshold))
        else:
            lower_td = 0.0

        # Upper tail: P(V > 1-q | U > 1-q)
        upper_q = 1 - threshold
        upper_mask = u >= upper_q
        if np.sum(upper_mask) > 0:
            upper_td = float(np.mean(v[upper_mask] >= upper_q))
        else:
            upper_td = 0.0

        return {"lower": lower_td, "upper": upper_td}

    msg = f"Unknown tail dependence method: {method!r}"
    raise ValueError(msg)



# ---------------------------------------------------------------------------
# Rank correlation
# ---------------------------------------------------------------------------



[docs]
def rank_correlation(
    returns: np.ndarray,
    method: str = "both",
) -> dict[str, Any]:
    """Compute Kendall's tau and/or Spearman's rho rank correlation.

    Rank correlations measure monotonic association without assuming
    linearity.  Unlike Pearson correlation, they are invariant to
    monotonic transformations of the marginals and directly related
    to copula parameters, making them the correct correlation measure
    for copula modelling.

    Interpretation:
        - **Kendall's tau**: Probability of concordance minus
          probability of discordance.  tau = 0.5 means ~75% of pairs
          move in the same direction.
        - **Spearman's rho**: Pearson correlation of the ranks.
          Generally |rho| >= |tau| for the same data.
        - Both are robust to outliers (unlike Pearson).
        - Copula relationships: for the Gaussian copula,
          rho_pearson = 2*sin(pi*tau/6). For Clayton,
          theta = 2*tau/(1-tau).

    When to use:
        - Always prefer rank correlation over Pearson for copula
          modelling.
        - Use Kendall's tau for small samples (more robust).
        - Use Spearman's rho for comparison with Pearson.

    Parameters:
        returns: Array of shape ``(n_obs, n_assets)``.
        method: ``"kendall"``, ``"spearman"``, or ``"both"`` (default).

    Returns:
        Dict with keys (depending on *method*):

        * ``"kendall_tau"`` -- Kendall's tau correlation matrix.
        * ``"spearman_rho"`` -- Spearman's rho correlation matrix.

    Raises:
        ValueError: If *method* is not recognized.
    """
    from wraquant.core._coerce import coerce_array

    if returns.ndim == 1:
        returns = coerce_array(returns, name="returns").reshape(-1, 1)
    else:
        returns = np.asarray(returns, dtype=np.float64)
    n_assets = returns.shape[1]

    result: dict[str, Any] = {}

    if method in ("kendall", "both"):
        tau_mat = np.eye(n_assets)
        for i in range(n_assets):
            for j in range(i + 1, n_assets):
                tau, _ = sp_stats.kendalltau(returns[:, i], returns[:, j])
                tau_mat[i, j] = tau
                tau_mat[j, i] = tau
        result["kendall_tau"] = tau_mat

    if method in ("spearman", "both"):
        rho_mat = np.eye(n_assets)
        for i in range(n_assets):
            for j in range(i + 1, n_assets):
                rho, _ = sp_stats.spearmanr(returns[:, i], returns[:, j])
                rho_mat[i, j] = rho
                rho_mat[j, i] = rho
        result["spearman_rho"] = rho_mat

    if not result:
        msg = f"Unknown rank correlation method: {method!r}"
        raise ValueError(msg)

    return result



# ---------------------------------------------------------------------------
# Internal helpers
# ---------------------------------------------------------------------------


def _ensure_uniform(x: np.ndarray) -> np.ndarray:
    """If values appear to be raw data (outside [0,1]), convert to pseudo-obs."""
    if np.min(x) < 0 or np.max(x) > 1:
        return _empirical_cdf(x)
    return x