Copula Primer

John Dodson, Minnesota Center for Financial and Actuarial Mathematics, 2023-02-13

Expectation

Motivation: extracting a real value from a random variable usually involves evaluating an expectation.

definition: $\boxed{\operatorname E[X]=\int_\mathbb R x\,f_X(x)\,dx}$

Let $(X,Y)$ be a random variable in $\mathbb R^2$. The components are real random variables with marginal densities $x\mapsto f_X(x)$ and $y\mapsto f_Y(y)$ and joint density $(x,y)\mapsto f_{(X,Y)}(x,y)$.

Note that the marginal densities are related to the joint density through $\begin{align}x\mapsto f_X(x)&=\int_\mathbb R f_{(X,Y)}(x,y)\,dy\\ y\mapsto f_Y(y)&=\int_\mathbb R f_{(X,Y)}(x,y)\,dx\end{align}$

so we also have $\operatorname E[X]=\int_{\mathbb R^2}x\,f_{(X,Y)}(x,y)\,dx\,dy$

Applications of expectation

Expectation of a function of a random variable

Algebra with random variables is hard. For an arbitrary function $h$, $h(X)$ is also a random variable. But it is typically difficult or impossible to determine a formula for the density of $h(X)$, even if you know $f_X(x)$, unless $h$ takes one of a limited number of special forms and $X$ is from a well-known family.

In contrast, evaluating the expected value of a transformed random variable is typically relatively easy. You just need to do an integral, which is usually easy enough to accomplish using calculus (manual or computer-assisted), numerical quadrature, or Monte Carlo.

$$\operatorname E[h(X)]=\int_\mathbb Rh(x)\,f_X(x)\,dx$$
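For instance, here is a minimal sketch (assuming NumPy and SciPy are available) that evaluates $\operatorname E[h(X)]$ for a standard normal $X$ and $h(x)=x^2$, by both quadrature and Monte Carlo:

```python
import numpy as np
from scipy import integrate, stats

h = lambda x: x**2          # the transformation of interest
f = stats.norm.pdf          # density of X ~ N(0,1)

# quadrature: integrate h(x) f_X(x) over the real line
quad_value, _ = integrate.quad(lambda x: h(x) * f(x), -np.inf, np.inf)

# Monte Carlo: average h over a large number of draws of X
rng = np.random.default_rng(0)
mc_value = h(rng.standard_normal(1_000_000)).mean()

print(quad_value, mc_value)  # both are close to E[X^2] = 1
```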

Variance of a random variable

$$\operatorname{var}[X]=\operatorname E[X^2]-\operatorname E[X]^2$$

This is straightforward to evaluate with $h(x)=x^2$ above.

Note that not all interesting real random variables have finite variances.

Laplace transform

Working with the Laplace transform of a random variable, $t\mapsto \phi_X(t)=\operatorname E\left[e^{-t\,X}\right]$, can be useful in certain circumstances.

For example, in evaluating moments:

$$\begin{align}\operatorname E[X]&=-\phi'_X(0)\\ \operatorname{var}[X]&=\phi''_X(0)-\phi'_X(0)^2\end{align}$$
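As a quick numerical check (a sketch, using an exponential random variable with rate $\lambda$, whose Laplace transform is $t\mapsto\lambda/(\lambda+t)$), the moments can be recovered by finite differences at zero:

```python
import numpy as np

lam = 2.0
phi = lambda t: lam / (lam + t)   # Laplace transform of an Exp(lambda) random variable

eps = 1e-5
d1 = (phi(eps) - phi(-eps)) / (2 * eps)               # central-difference estimate of phi'(0)
d2 = (phi(eps) - 2 * phi(0.0) + phi(-eps)) / eps**2   # estimate of phi''(0)

print(-d1)         # ~ E[X]   = 1/lam   = 0.5
print(d2 - d1**2)  # ~ var[X] = 1/lam^2 = 0.25
```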

Another circumstance is the analysis of a sum of independent copies of a random variable $X$. For example, consider the mean of a sample

$$\bar X=\operatorname{mean}(X_1,X_2,\ldots,X_n)=\frac{1}{n}\left(X_1+X_2+\cdots+X_n\right)$$

The Laplace transform of the sample mean is

$$\phi_{\bar X}(t)=\operatorname E\left[e^{-\frac{t}{n}\left(X_1+X_2+\cdots+X_n\right)}\right]=\phi_X\left(\frac{t}{n}\right)^n$$ by independence.

Say you can approximate $\phi_X$ near zero by a first-order Taylor expansion, $\phi_X(\epsilon)\approx \phi_X(0)+\epsilon\,\phi'_X(0)=1-\epsilon\operatorname E[X]$. Then

$$\phi_{\bar X}(t)\approx\left(1-\frac{t\operatorname E[X]}{n}\right)^n\approx e^{-t\operatorname E[X]}$$

for large sample size $n$.

Do we recognize this Laplace transform? When a random variable is effectively not random we call it degenerate. The Laplace transform of a degenerate random variable whose value is almost surely $m$ is $t\mapsto e^{-t\,m}$. We have outlined a demonstration of the (weak) law of large numbers:

$\lim_{n\to\infty}\operatorname{mean}\left(X_1,X_2,\ldots,X_n\right)\overset{*}{=}\operatorname E[X]$. That is, the mean estimator degenerates to the expected value in the sample size limit. The asterisk indicates that the convergence is "in probability".
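A quick simulation (a minimal sketch, assuming NumPy, with exponential draws so that $\operatorname E[X]=2$) illustrates the concentration of the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    x = rng.exponential(scale=2.0, size=n)  # draws with E[X] = 2
    print(n, x.mean())                      # the sample mean settles near 2 as n grows
```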

Probability mass

$\operatorname P\left\{X\in\mathscr X\right\}=\operatorname E\left[I_{\mathscr X}(X)\right]$ where $\mathscr X\subset\mathbb R$ is a set of reals and $x\mapsto I_{\mathscr X}(x)$ denotes the corresponding indicator function for set membership.

The indicator function is a mechanism by which the limits of integration can be specified within the integrand.
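A minimal sketch (standard normal $X$ and the set $\mathscr X=[1,2]$, assuming NumPy and SciPy): estimate the probability mass as the Monte Carlo average of the indicator and compare to the exact value from the distribution function.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

indicator = (x >= 1.0) & (x <= 2.0)   # I_[1,2](X), a 0/1 array
print(indicator.mean())               # E[I(X)] estimates P{1 <= X <= 2}
print(stats.norm.cdf(2.0) - stats.norm.cdf(1.0))  # exact value for comparison
```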

Conditional expectation

$$\operatorname E[X|Y=y_0]=\frac{\int_\mathbb R x\,f_{(X,Y)}(x,y_0)\,dx}{\int_\mathbb R f_{(X,Y)}(x',y_0)\,dx'}$$

Note that this is a special case of change of measure. $X|Y=y_0$ can be thought of as a new random variable with density $x\mapsto f_{X|Y=y_0}(x)=\frac{f_{(X,Y)}(x,y_0)}{\int_\mathbb Rf_{(X,Y)}(x',y_0)\,dx'}$.

Copula

The density of a joint random variable, such as $(X,Y)$, contains the descriptions of the marginal densities of each univariate component. The copula is what remains if we factor out these marginal densities. It describes the dependence between the components.

Remember, probability density is not an ordinary function. It only lives inside of an integral. Changing variables inside of an integral involves a pull-back. This requirement translates to the following definition:

$(x,y)\mapsto f_{(X,Y)}(x,y)=c(F_X(x),F_Y(y))\,F'_X(x)\,F'_Y(y)$ where $F_X$ and $F_Y$ are the marginal distribution functions (so $F'_X=f_X$ and $F'_Y=f_Y$)

Per Sklar's Theorem, this defines implicitly a new random variable $(U,V)$ on $[0,1]^2$ whose density is $(u,v)\mapsto c(u,v)$. This new random variable is the copula.

You can extract the (e.g. bivariate) copula from any joint random variable by applying the following recipe:

$$\boxed{(u,v)\mapsto c(u,v)=\frac{f_{(X,Y)}(F_X^{-1}(u),F_Y^{-1}(v))}{F'_X\circ F_X^{-1}(u)\,F'_Y\circ F_Y^{-1}(v)}}$$
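In code the recipe translates almost literally. A sketch, taking the joint to be bivariate normal with correlation $\rho=0.6$ (so the result is the gaussian copula discussed below) and letting SciPy supply the joint density, marginal densities, and marginal quantile functions:

```python
import numpy as np
from scipy import stats

rho = 0.6
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def copula_density(u, v):
    """c(u,v) = f_{(X,Y)}(F_X^{-1}(u), F_Y^{-1}(v)) / (f_X(F_X^{-1}(u)) f_Y(F_Y^{-1}(v)))."""
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)   # marginal inverse distribution functions
    return joint.pdf([x, y]) / (stats.norm.pdf(x) * stats.norm.pdf(y))

print(copula_density(0.9, 0.9))   # > 1: positive dependence concentrates mass near the diagonal
```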

As with any bivariate random variable, the copula distribution function is

$$(u,v)\mapsto C(u,v)=\operatorname P\left\{U\le u\land V\le v\right\}=\int_0^v\int_0^uc(u',v')\,du'dv'$$

and the converse is, of course,

$$c(u,v)=\left.\frac{\partial^2C}{\partial u\partial v}\right|_{(u,v)}$$

Note that $C(1,v)=v$, $C(u,1)=u$, and $C(0,v)=C(u,0)=0$.

Flipped copula

The flipped copula, aka the survival copula, is the copula with both arguments reflected; it is the copula of $(1-U,1-V)$.

For the copula distribution,

$$(u,v)\mapsto \hat C(u,v)=\operatorname P\left\{U>1-u\land V>1-v\right\}$$

We can evaluate this by carefully integrating the copula density.

$$\begin{align} \hat C(u,v)&=\int_{1-v}^1\int_{1-u}^1\left.\frac{\partial^2 C}{\partial u\partial v}\right|_{(u',v')}du'dv'\\ &=\int_{1-v}^1\left(\left.\frac{\partial C}{\partial v}\right|_{(1,v')}-\left.\frac{\partial C}{\partial v}\right|_{(1-u,v')}\right)dv'\\ &=\int_{1-v}^1\left(1-\left.\frac{\partial C}{\partial v}\right|_{(1-u,v')}\right)dv'\\ &=v-C(1-u,1)+C(1-u,1-v) \end{align}$$

but $C(1-u,1)=1-u$, so

flipped copula distribution: $\boxed{(u,v)\mapsto \hat C(u,v)=u+v-1+C(1-u,1-v)}$

The density of the flipped copula is a little more transparent:

$$\begin{align} (u,v)\mapsto \hat c(u,v)&=\left.\frac{\partial^2\hat C}{\partial u\partial v}\right|_{(u,v)}\\ &=\left.\frac{\partial^2}{\partial u\partial v}\left(u+v-1+C(1-u,1-v)\right)\right|_{(u,v)}\\ &=\left.\frac{\partial^2 C}{\partial u\partial v}\right|_{(1-u,1-v)}\\ &=c(1-u,1-v) \end{align}$$

A copula for which $C=\hat C$ or equivalently $\hat c=c$ is said to be axisymmetric.

Conditional copula

The conditional copula is $u\mapsto C(u|v_0)=\operatorname P\left\{U\le u\,|\,V=v_0\right\}$

Evaluating this requires a little delicacy, because the conditioning event is technically of measure zero. So let's temporarily introduce an infinitesimal and say that the conditioning event is actually $v_0 < V\le v_0+dv$ for some small positive $dv$, which we will then send to zero.

$$\begin{align} C(u|v_0)&=\frac{\operatorname P\left\{U\le u\land v_0< V\le v_0+dv\right\}}{\operatorname P\left\{v_0< V\le v_0+dv\right\}}\\ &=\frac{\operatorname P\left\{U\le u\land V\le v_0+dv\right\}-\operatorname P\left\{U\le u\land V\le v_0\right\}}{\operatorname P\left\{V\le v_0+dv\right\}-\operatorname P\left\{V\le v_0\right\}}\\ &=\frac{C(u,v_0+dv)-C(u,v_0)}{dv} \end{align}$$

In the last step we used the fact that $V$ is uniform on $[0,1]$, so $\operatorname P\{V\le v\}=v$. Taking the limit $dv\to 0$ yields the partial derivative, of course.

conditional copula distribution: $\boxed{u\mapsto C(u|v_0)=\left.\frac{\partial C}{\partial v}\right|_{(u,v_0)}}$
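A sketch of this in code, taking $C$ to be the gaussian copula (introduced below) with $\rho=0.6$: approximate $\left.\partial C/\partial v\right|_{(u,v_0)}$ by a central difference and compare it with the known closed form for this particular copula, $\Phi\!\left((\Phi^{-1}(u)-\rho\,\Phi^{-1}(v_0))/\sqrt{1-\rho^2}\right)$, a standard result quoted here rather than derived in these notes.

```python
import numpy as np
from scipy import stats

rho = 0.6
mvn = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def C(u, v):
    """Gaussian copula distribution function C(u,v) = P{U <= u and V <= v}."""
    return mvn.cdf([stats.norm.ppf(u), stats.norm.ppf(v)])

def C_cond(u, v0, dv=1e-3):
    """Conditional copula C(u|v0), estimated as a central difference of dC/dv."""
    return (C(u, v0 + dv) - C(u, v0 - dv)) / (2 * dv)

u, v0 = 0.7, 0.4
closed_form = stats.norm.cdf((stats.norm.ppf(u) - rho * stats.norm.ppf(v0)) / np.sqrt(1 - rho**2))
print(C_cond(u, v0), closed_form)   # agree up to the accuracy of the numerical cdf and the difference quotient
```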

Basic copulas

Independence copula

For independent random variables, probability masses are multiplicative: $\operatorname P\left\{X\in\mathscr X\land Y\in\mathscr Y\right\}=\operatorname P\left\{X\in\mathscr X\right\}\operatorname P\left\{Y\in\mathscr Y\right\}$. Since this is true for arbitrary sets in the sample space, in particular this means that the joint density factors: $(x,y)\mapsto f_{(X,Y)}(x,y)=f_X(x)f_Y(y)$; hence,

independence copula density: $(u,v)\mapsto c(u,v)=1$. The distribution function is $(u,v)\mapsto C(u,v)=uv$.

Comonotone copula

If $X$ and $Y$ are random variables, but in fact $Y=h(X)$ where $h$ is strictly increasing (hence invertible), then the conditional random variables $Y|X$ and $X|Y$ are degenerate. The copula in this case can be represented by a Dirac delta:

comonotone copula density: $(u,v)\mapsto c(u,v)=\delta(v-u)$. The distribution function is $(u,v)\mapsto C(u,v)=\min(u,v)$.

Gaussian copula

The standard normal density is $x\mapsto f_X(x)=\frac{1}{\sqrt{2\pi}}e^{-\tfrac12 x^2}$ with distribution function $x\mapsto F_X(x)=\Phi(x)=\tfrac12+\tfrac12\operatorname{erf}\left(\frac{x}{\sqrt 2}\right)$ (The standard normal distribution function appears so frequently in mathematical probability that it has earned its own symbol, $\Phi$. $\operatorname{erf}$ is a closely-related special function available in scientific computing environments.)

If $X$ and $Z$ are independent standard normal random variables, their joint density is just the product: $f_{(X,Z)}(x,z)=\frac{1}{2\pi}e^{-\tfrac12\left(x^2+z^2\right)}$.

Introduce a new random variable $Y=\sqrt{1-\rho^2}Z+\rho X$. This is a linear combination of normals, so it is normal. It is designed so that $\operatorname E[Y]=0$ and $\operatorname E[Y^2]=1$, so in fact it is standard normal. Furthermore, $\operatorname E[XY]=\rho$, so $\rho$ can be interpreted as the linear (Pearson's) correlation between $X$ and $Y$.

Pull back the inverse transformation $Z=\frac{Y-\rho X}{\sqrt{1-\rho^2}}$ through the joint density to get the

bivariate normal density: $(x,y)\mapsto f_{(X,Y)}(x,y)=\frac{1}{2\pi\sqrt{1-\rho^2}}e^{-\tfrac12\frac{x^2-2\rho x y+y^2}{1-\rho^2}}$

Substitute in the normal inverse distribution function to get an expression for the (bivariate)

gaussian copula density: $\boxed{(u,v)\mapsto c(u,v)=\frac{1}{\sqrt{1-\rho^2}}e^{-\tfrac12\frac{\rho^2\Phi^{-1}(u)^2-2\rho\Phi^{-1}(u)\Phi^{-1}(v)+\rho^2\Phi^{-1}(v)^2}{1-\rho^2}}}$

for $-1<\rho<1$.

Note that the gaussian copula reduces to the independence copula for $\rho=0$ and approaches the comonotone copula in the limit $\rho\to 1$.

Since $\Phi^{-1}(1-q)=-\Phi^{-1}(q)$ for $0 < q < 1$ we see that the gaussian copula is axisymmetric.
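A sketch of the boxed density in code, cross-checked against the general extraction recipe (joint density over the product of marginal densities) with SciPy, at $\rho=0.6$:

```python
import numpy as np
from scipy import stats

rho = 0.6

def gaussian_copula_density(u, v):
    """The boxed gaussian copula density, written in terms of the normal quantiles."""
    a, b = stats.norm.ppf(u), stats.norm.ppf(v)
    q = (rho**2 * a**2 - 2 * rho * a * b + rho**2 * b**2) / (1 - rho**2)
    return np.exp(-0.5 * q) / np.sqrt(1 - rho**2)

# cross-check against c = f_{(X,Y)} / (f_X f_Y) evaluated at the same quantiles
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
a, b = stats.norm.ppf(0.9), stats.norm.ppf(0.25)
print(gaussian_copula_density(0.9, 0.25),
      joint.pdf([a, b]) / (stats.norm.pdf(a) * stats.norm.pdf(b)))   # should match
```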

Dependence metrics

Linear correlation as introduced above can be generalized to any pair of nondegenerate random variables (with finite variances):

$$\rho_\text{Pearson}^{(X,Y)}=\frac{\operatorname E[XY]-\operatorname E[X]\operatorname E[Y]}{\sqrt{\operatorname{var}[X]\operatorname{var}[Y]}}$$

But this recipe depends on both the margins and the copula of the joint random variable. We would prefer a dependence metric that only depends on the copula. Luckily, such metrics exist, e.g. Kendall's correlation and Spearman's correlation.

Spearman's correlation

Another name for the distribution function is the grade function, particularly where it is used to transform the random variable itself. Spearman's correlation is simply the linear correlation of the marginal grades.

$$\rho_\text{Spearman}^{(X,Y)}=\rho_\text{Pearson}^{(F_X(X),F_Y(Y))}$$

In terms of the copula, it can be proved that

$$\boxed{\rho_\text{Spearman}^{(X,Y)}=12\int_0^1\int_0^1\left(C(u,v)-u\,v\right)\,du\,dv}$$

where $C$ is the copula distribution function.

If $(X,Y)$ are bivariate normal, it turns out that $$\rho_\text{Spearman}^{(X,Y)}=\frac{6}{\pi}\sin^{-1}\tfrac12\rho\approx\frac{3}{\pi}\rho$$
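A quick check by simulation (a sketch, drawing from a bivariate normal with $\rho=0.6$ and using SciPy's rank-based estimator):

```python
import numpy as np
from scipy import stats

rho = 0.6
rng = np.random.default_rng(0)

# draw from a bivariate normal and estimate Spearman's correlation from the sample
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=200_000)
estimate, _ = stats.spearmanr(z[:, 0], z[:, 1])

print(estimate, 6 / np.pi * np.arcsin(rho / 2))   # sample estimate vs. the closed form
```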

Kendall's correlation

Kendall's correlation (often denoted by $\tau$) is the expected value of the relative ranks of independent pairs. Let $(\tilde X,\tilde Y)$ be an independent copy of $(X,Y)$. That is, $(\tilde X,\tilde Y)$ has the same joint distribution (and hence the same copula) as $(X,Y)$, but is independent of it.

$$\rho_\text{Kendall}^{(X,Y)}=\operatorname E\left[\operatorname{sgn}(X-\tilde X)\,\operatorname{sgn}(Y-\tilde Y)\right]$$

It is the covariance of the concordance signs; here $\operatorname{sgn}$ returns the sign of its argument: $+1$, $-1$, or $0$.

Note that in working with data, Kendall's correlation has an advantage over Spearman's correlation because the standard rank-order correlation estimator does not rely on any assumptions about the distribution of the margins, only that the samples are independent and identically distributed (iid).

In terms of the copula, it can be proved that

$$\boxed{\rho_\text{Kendall}^{(X,Y)}=4\operatorname E[C(U,V)]-1}$$

If $(X,Y)$ are bivariate normal, it turns out that

$$\rho_\text{Kendall}^{(X,Y)}=\frac{2}{\pi}\sin^{-1}\rho\approx \frac{2}{\pi}\rho$$
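The analogous check for Kendall's correlation (again a sketch with $\rho=0.6$; a smaller sample keeps the rank-pair computation quick):

```python
import numpy as np
from scipy import stats

rho = 0.6
rng = np.random.default_rng(0)

z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=20_000)
estimate, _ = stats.kendalltau(z[:, 0], z[:, 1])

print(estimate, 2 / np.pi * np.arcsin(rho))   # sample estimate vs. the closed form
```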

Tail dependence

Informally, tail dependence describes the extent to which, if an extreme outcome is realized in one component of a bivariate random variable, the other component will also be extreme. The limiting case, where the tail dependence is zero, is termed asymptotic independence.

It turns out that the gaussian copula exhibits asymptotic independence for all pairs of components. This has been considered problematic in finance applications, and much of the industry's interest in copula models stems from developing alternative multivariate models that are not necessarily asymptotically independent.

One successful approach is to introduce a common frailty factor, particularly in the context of elliptical copulas, which generalize the gaussian copula. Another approach is to construct copulas from Laplace transforms of common random variables, which is the basis of Archimedean copulas.

Formally, lower and upper tail dependence are defined for a pair of real random variables $(X,Y)$ as

$$\begin{align} \lambda_\text L&=\lim_{q\downarrow 0}\operatorname P\left\{Y\le F^{-1}_Y(q)|X\le F^{-1}_X(q)\right\}\\ \lambda_\text U&=\lim_{q\uparrow 1}\operatorname P\left\{Y>F^{-1}_Y(q)|X>F^{-1}_X(q)\right\} \end{align}$$

that is, the asymptotic probability of a quantile exceedance in one component conditional on a quantile exceedance in the other component.

Let's demonstrate that, even though the definition references the margins, this is strictly a copula metric. Let's start with the lower tail. Say the copula distribution function is $C$.

$$\begin{align} \lambda_\text L &=\lim_{q\downarrow 0}\frac{\operatorname P\left\{X\le F^{-1}_X(q)\land Y\le F^{-1}_Y(q)\right\}}{\operatorname P\left\{X\le F^{-1}_X(q)\right\}}\\ &=\lim_{q\downarrow 0}\frac{\operatorname P\left\{F_X(X)\le q\land F_Y(Y)\le q\right\}}{\operatorname P\left\{F_X(X)\le q\right\}}\\ &=\lim_{q\downarrow 0}\frac{C(q,q)}{q} \end{align}$$

Evaluate this for the flipped copula to get the upper tail,

$$\lambda_\text U=2-\lim_{q\uparrow 1}\frac{1-C(q,q)}{1-q}$$

L'Hôpital's rule is typically employed to evaluate these limits.
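For instance, a sketch evaluating the ratio $C(q,q)/q$ for a gaussian copula with $\rho=0.6$ (using SciPy's numerical bivariate normal distribution function) shows it falling toward zero as $q$ shrinks, illustrating asymptotic independence; pushing $q$ much smaller than shown here runs into the accuracy of the numerical cdf.

```python
import numpy as np
from scipy import stats

rho = 0.6
mvn = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def C(u, v):
    """Gaussian copula distribution function."""
    return mvn.cdf([stats.norm.ppf(u), stats.norm.ppf(v)])

# the ratio C(q,q)/q keeps decreasing as q -> 0; its limit, lambda_L, is zero
for q in (0.2, 0.1, 0.05, 0.01):
    print(q, C(q, q) / q)
```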

Archimedean copulas

Gaussian copulas, and elliptical copulas more generally, are useful when your model has a complex dependence structure near the mean, but simpler behavior far away from the mean. Archimedean copulas are useful in the complementary situation, when the dependence near the mean is relatively simple but the dependence at the extremes has more structure. Think credit portfolios or aggregate insurance liabilities in a finance context.

Archimedean copulas are defined in terms of a generator function, $\psi:[0,\infty)\to[0,1]$, continuous and strictly decreasing, with $\psi(0)=1$ and $\lim_{t\to\infty}\psi(t)=0$. The (bivariate) copula distribution function is

Archimedean copula: $\boxed{(u,v)\mapsto C(u,v)=\psi\left(\psi^{-1}(u)+\psi^{-1}(v)\right)}$

Note that the independence copula fits into this class, with generator $\psi(t)=e^{-t}$ (the Laplace transform of the degenerate random variable whose value is almost surely $1$), but the general gaussian copula does not.

Generators typically have a parameter whose value can be calibrated to represent the strength of the dependence. The table below has some popular examples.

$$\begin{array}{| c | c | c | c | c |} \hline &\text{Generator }\psi(t)&\text{correl }\rho_\text{Kendall}&\text{lower tail }\lambda_\text L&\text{upper tail }\lambda_\text U\\ \hline \text{Gumbel}&e^{-t^{1/\theta}}&1-1/\theta&0&2-2^{1/\theta}\\ \text{Clayton}&\max\left\{0,(1+\theta t)^{-1/\theta}\right\}&\frac{1}{1+2/\theta}&\begin{cases}2^{-1/\theta}&\theta>0\\0&\theta\le 0\end{cases}&0\\ \text{Frank}&-\frac{1}{\theta}\log\left(1+\left(e^{-\theta}-1\right)e^{-t}\right)&1-\frac{4}{\theta}\left(1-D_1(\theta)\right)&0&0\\ \hline \end{array}$$

where $D_1(\theta)=\frac{1}{\theta}\int_0^\theta\frac{x}{e^x-1}dx$ is a Debye function.

Note that the (bivariate) Frank copula is axisymmetric.
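A sketch implementing the Clayton entry of the table directly from its generator ($\theta=2$ chosen here as an illustrative value), with a numerical check of the lower tail dependence:

```python
import numpy as np

theta = 2.0

# Clayton generator and its inverse, per the table above (theta > 0, so the max is inactive)
psi = lambda t: (1.0 + theta * t) ** (-1.0 / theta)
psi_inv = lambda u: (u ** (-theta) - 1.0) / theta

def C(u, v):
    """Archimedean copula distribution C(u,v) = psi(psi^{-1}(u) + psi^{-1}(v))."""
    return psi(psi_inv(u) + psi_inv(v))

# lower tail dependence: C(q,q)/q should approach 2^(-1/theta) ~= 0.707 for theta = 2
for q in (1e-2, 1e-4, 1e-6):
    print(q, C(q, q) / q)
```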

Elliptical copulas

Any real random variable $Z$ whose density $z\mapsto f_Z(z)$ involves the argument only through its square can be the basis for an elliptical copula.

What is special about this form is that one can derive the joint density for a similar random variable in $\mathbb R^N$ by simply replacing $z^2$ by the quadratic form $\mathbf{x}^{\operatorname T}\Sigma^{-1}\mathbf{x}$ ($\Sigma$ is a positive-definite, symmetric matrix with unit diagonal) and renormalizing. This introduces $\binom{N}{2}$ new parameters, one for each pair of components, that can be used to specify pair-wise dependence.

In implementing this dimension lifting procedure, we cannot assume that the margins of the new random variable in $\mathbb R^N$ necessarily have the same density as $Z$. Often the values of the parameters are altered (as we see in the example).

Student's-$t$ copula

One popular example of a random variable subject to this procedure is the standard

Student's-$t$: $z\mapsto f_Z(z)=\frac{1}{\operatorname B\left(\frac{\nu}{2},\frac12\right)\sqrt{\nu-2}}\left(1+\frac{z^2}{\nu-2}\right)^{-\left(\frac{\nu}{2}+\frac12\right)}$ (unit variance version)

with parameter $\nu>2$, where $\operatorname B(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$ is the Beta function.

Replacing $z^2$ by a quadratic form in $\mathbb R^2$, we get a bivariate joint density,

lifted bivariate Student's-$t$ density: $(x,y)\mapsto f_{(X,Y)}(x,y)=\frac{1}{2\pi}\frac{1}{\sqrt{1-\rho^2}}\frac{\nu-1}{\nu-2}\left(1+\frac{x^2-2\rho xy+y^2}{(\nu-2)(1-\rho^2)}\right)^{-\left(\frac{\nu}{2}+\frac12\right)}$

We can integrate out all but one of the components to get the margin.

marginal elliptical Student's-$t$ density: $x\mapsto f_X(x)=\frac{1}{\operatorname B\left(\frac{\nu-1}{2},\frac12\right)\sqrt{\nu-2}}\left(1+\frac{x^2}{\nu-2}\right)^{-\frac{\nu}{2}}$

with the same result for $f_Y$.
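A numerical sanity check of this marginalization (a sketch with $\nu=5$ and $\rho=0.6$, assuming NumPy and SciPy): integrate the lifted bivariate density over one argument and compare against the stated margin.

```python
import numpy as np
from scipy import integrate
from scipy.special import beta

nu, rho = 5.0, 0.6

def f_joint(x, y):
    """Lifted bivariate Student's-t density (unit-variance convention), as stated above."""
    q = (x**2 - 2 * rho * x * y + y**2) / ((nu - 2) * (1 - rho**2))
    return (nu - 1) / (nu - 2) / (2 * np.pi * np.sqrt(1 - rho**2)) * (1 + q) ** (-(nu + 1) / 2)

def f_margin(x):
    """Stated marginal density, a member of the Student's-t_{nu-1} family."""
    return (1 + x**2 / (nu - 2)) ** (-nu / 2) / (beta((nu - 1) / 2, 0.5) * np.sqrt(nu - 2))

x0 = 1.3
numeric, _ = integrate.quad(lambda y: f_joint(x0, y), -np.inf, np.inf)
print(numeric, f_margin(x0))   # integrating out y recovers the stated margin
```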

Notice that the margins are actually in the Student's-$t_{\nu-1}$ family. This could lead to ambiguity in the interpretation of the $\nu$ parameter (historically called the degrees of freedom) in the copula setting. The convention is to specify the parameter value(s) in the context of the margins. So the bivariate joint is actually lifted from the Student's-$t_{\nu+1}$ family.

Note that the Student's-$t$ copula is an example of an incomplete copula: one cannot recover the independence copula through any choice of parameters. Consequently, the Student's-$t$ copula is probably not appropriate for modeling components whose dependence is weak.

Example: Student's-$t_4$ copula

A relatively tractable case is the copula of the multivariate Student's-$t$ that has margins in the $\nu=4$ family (i.e. the bivariate joint lifted from $\nu=5$).

The marginal distribution function in this case is $x\mapsto F_X(x)=\int_{-\infty}^x \frac{\sqrt 3}{4}\left(1+\frac{x'^2}{3}\right)^{-5/2}\,dx'=\frac{1}{2}+\frac{x\left(2x^2+9\right)}{4\left(x^2+3\right)^{3/2}}$

The square of the inverse of the distribution function is a root of a cubic polynomial, so there is a formula for it:

$$q\mapsto F^{-1}_X(q)=F^{-1}_Y(q)=Q_4(q)=\sqrt 3\operatorname{sgn}\left(q-\tfrac12\right)\sqrt{\frac{\frac{1}{2}\cos\left(\frac{1}{3}\tan^{-1}\left(\frac{1}{2}\left(\frac{1}{\sqrt{1/q-1}}-\sqrt{1/q-1}\right)\right)\right)}{\sqrt{q(1-q)}}-1}$$

As with all quantile functions, one must be careful implementing this formula in code. As stated, it involves subtractions of nearly equal quantities, which can lose precision, and of course it diverges as its argument approaches zero from above or one from below. As with all symmetric random variables, note that $Q_4(1-q)=-Q_4(q)$.

Dividing the joint density by the marginal densities and substituting in the marginal inverse distributions yields the copula density.

bivariate $t_4$ copula density: $(u,v)\mapsto c_4(u,v)=\frac{32}{9\pi}\frac{1}{\sqrt{1-\rho^2}}\left(1+\frac{Q_4(u)^2-2\rho Q_4(u)Q_4(v)+Q_4(v)^2}{3(1-\rho^2)}\right)^{-3}\left(1+\frac{Q_4(u)^2}{3}\right)^{5/2}\left(1+\frac{Q_4(v)^2}{3}\right)^{5/2}$
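To close, a sketch implementing the $t_4$ quantile and copula density above (with $\rho=0.6$ as an illustrative value), including a round-trip check of $Q_4$ against the stated marginal distribution function:

```python
import numpy as np

def F4(x):
    """Marginal distribution function stated above."""
    return 0.5 + x * (2 * x**2 + 9) / (4 * (x**2 + 3) ** 1.5)

def Q4(q):
    """Closed-form quantile function Q_4 stated above (care is needed very near q = 0, 1/2, 1)."""
    q = np.asarray(q, dtype=float)
    r = np.sqrt(1.0 / q - 1.0)
    inner = 0.5 * np.cos(np.arctan(0.5 * (1.0 / r - r)) / 3.0) / np.sqrt(q * (1.0 - q))
    return np.sqrt(3.0) * np.sign(q - 0.5) * np.sqrt(inner - 1.0)

def c4(u, v, rho=0.6):
    """Bivariate t_4 copula density stated above."""
    x, y = Q4(u), Q4(v)
    core = (1 + (x**2 - 2 * rho * x * y + y**2) / (3 * (1 - rho**2))) ** (-3)
    return 32 / (9 * np.pi * np.sqrt(1 - rho**2)) * core * (1 + x**2 / 3) ** 2.5 * (1 + y**2 / 3) ** 2.5

q = np.array([0.01, 0.25, 0.75, 0.99])
print(F4(Q4(q)))        # round trip recovers q (up to floating-point error)
print(c4(0.95, 0.95))   # noticeably above 1: positive dependence loads the joint tail
```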