The chain rule

One-variable example

Imagine that the function f(x) gives the height through a mountain range at position x. Since this is a one-dimensional example, we are thinking of some cross section through the mountain, as illustrated by this graph of f(x).

[Figure: graph of the height f(x) along a cross section of the mountain range.]

Now imagine that you are crossing the mountain range so that your x-position at time t is given by x = g(t).

First of all, what is your height at time t? It is simply the height of the mountain at the position x = g(t), i.e., f(x) evaluated at g(t), which is f(g(t)). We define a function h(t) to give your height at time t. It is

h(t) = f(g(t)).

The function h(t) is an example of a composition of functions, meaning it is the result of applying the function g and then applying the function f to the output. We often write $h = f \circ g$ or $h(t) = (f \circ g)(t)$.
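To make the idea of composition concrete, here is a minimal Python sketch. The particular terrain f and walking schedule g are hypothetical choices made only for illustration, and the helper name compose is likewise an assumption, not something defined in this reading.

```python
import math


def compose(f, g):
    """Return the composition of f and g, i.e. the function t -> f(g(t))."""
    return lambda t: f(g(t))


# Hypothetical terrain and walking schedule, chosen only for illustration.
def f(x):
    """Height of the mountain at position x."""
    return math.sin(x) + 2.0


def g(t):
    """Position x reached at time t (walking at constant speed 0.5)."""
    return 0.5 * t


h = compose(f, g)  # h(t) = f(g(t)): your height at time t

# Both expressions print the same number, since h(3.0) is f evaluated at g(3.0).
print(h(3.0), f(g(3.0)))
```

Note that g is applied first and f second, matching the order described above.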

The chain rule is the rule we use if we want to take the derivative of a composition of functions. In this example, how fast is your height changing as you walk along the path given by g(t)? It is simply the derivative of h with respect to t: $\frac{dh}{dt}(t)$. The chain rule gives the derivative of h in terms of the derivatives of g and f. You may remember from one-variable calculus that

$$\frac{dh}{dt}(t) = \frac{df}{dx}(g(t))\,\frac{dg}{dt}(t). \tag{1}$$

The one-variable chain rule states that the derivative of h is the product of the derivative of f and the derivative of g. The only trick to remember is that the derivative of f is evaluated at g(t) (not at t). This makes sense since f is a function of position x and x = g(t).
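The point about where the derivative of f is evaluated can be illustrated with a small Python sketch. The functions f(x) = x^2 and g(t) = 3t + 1 are hypothetical choices (not from this reading) picked so that h can be expanded by hand and every value is an exact integer.

```python
# Hypothetical example functions, chosen so that everything is exact.
def f(x):
    return x ** 2


def df(x):
    """Derivative of f with respect to x."""
    return 2 * x


def g(t):
    return 3 * t + 1


def dg(t):
    """Derivative of g with respect to t."""
    return 3


def dh(t):
    """Chain rule: the derivative of f is evaluated at g(t)."""
    return df(g(t)) * dg(t)


def dh_wrong(t):
    """Common mistake: the derivative of f is evaluated at t instead of g(t)."""
    return df(t) * dg(t)


def dh_expanded(t):
    """Check by hand: h(t) = (3t + 1)^2 = 9t^2 + 6t + 1, so h'(t) = 18t + 6."""
    return 18 * t + 6


t = 1
print(dh(t), dh_expanded(t), dh_wrong(t))  # the first two agree, the third does not
```

Only the version that evaluates the derivative of f at g(t) matches the derivative obtained by expanding h directly.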

The chain rule makes it a lot easier to compute derivatives. For example, if $g(t) = t^2$ and $f(x) = \sin x$, then $h(t) = \sin(t^2)$. We can easily calculate that

$$\frac{dg}{dt}(t) = g'(t) = 2t, \qquad \frac{df}{dx}(x) = f'(x) = \cos x,$$

so that

$$\frac{df}{dx}(g(t)) = f'(g(t)) = \cos(t^2).$$

Using the chain rule of equation (1), we compute that the derivative of h(t) is

$$\frac{dh}{dt}(t) = h'(t) = \cos(t^2)\,(2t).$$

We don’t have to separately learn a rule for the derivative of $\sin(t^2)$; we just need to know the derivatives of $\sin x$ and $t^2$.
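As a sanity check on this example, the following Python sketch compares the chain rule result cos(t^2)(2t) with a numerical estimate of the slope of h(t) = sin(t^2). The central-difference helper and the sample times are assumptions made for illustration; the comparison is approximate, not a proof.

```python
import math


def h(t):
    """h(t) = sin(t^2), the composition from the example above."""
    return math.sin(t ** 2)


def h_prime(t):
    """Derivative given by the chain rule: cos(t^2) * 2t."""
    return math.cos(t ** 2) * (2.0 * t)


def central_difference(func, t, step=1e-6):
    """Numerical slope estimate: (func(t + step) - func(t - step)) / (2 * step)."""
    return (func(t + step) - func(t - step)) / (2.0 * step)


# Arbitrary sample times; the two columns should agree to several decimal places.
for t in (0.0, 0.5, 1.3):
    print(t, h_prime(t), central_difference(h, t))
```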

The general form of the chain rule

Even though f, g, and h are one-variable functions, we could use the notation for the derivative of multivariable functions. Remember that the derivative of a multivariable function is its matrix of partial derivatives. Well, we can view the derivatives of f, g, and h as 1 × 1 matrices,

$$Df(x) = \left[\frac{df}{dx}(x)\right], \qquad Dg(t) = \left[\frac{dg}{dt}(t)\right], \qquad Dh(t) = \left[\frac{dh}{dt}(t)\right].$$

Using the notation of matrices of partial derivatives, we can rewrite the one-variable chain rule of equation (1) as

$$Dh(t) = Df(g(t))\,Dg(t). \tag{2}$$

Since matrix multiplication of 1 × 1 matrices is the same as scalar multiplication, this new equation is just equation (1) in disguised form. Equation (2) is written exactly as the chain rule for higher dimensions. So if you understand what equation (2) means when f, g, and h are the following multivariable functions,

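$$f : \mathbb{R}^n \to \mathbb{R}^p, \qquad g : \mathbb{R}^m \to \mathbb{R}^n, \qquad h : \mathbb{R}^m \to \mathbb{R}^p, \tag{3}$$

and $h = f \circ g$, then you don’t need to read on.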
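For readers who want to see what equation (2) says when the derivatives are genuine matrices, here is a rough Python sketch with m = 2, n = 3, and p = 2. The functions f and g, the sample point, and the finite-difference helper jacobian are all hypothetical choices for illustration. The matrices are only approximated numerically, so the two sides agree up to small rounding error.

```python
def jacobian(func, point, step=1e-6):
    """Approximate the matrix of partial derivatives of func at point.

    func maps a list of n numbers to a list of p numbers.
    The result has p rows and n columns.
    """
    n = len(point)
    p = len(func(point))
    columns = []
    for j in range(n):
        forward = list(point)
        backward = list(point)
        forward[j] += step
        backward[j] -= step
        f_plus, f_minus = func(forward), func(backward)
        # Column j holds the partial derivatives with respect to variable j.
        columns.append([(a - b) / (2.0 * step) for a, b in zip(f_plus, f_minus)])
    return [[columns[j][i] for j in range(n)] for i in range(p)]


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def g(u):
    """Hypothetical g : R^2 -> R^3."""
    s, t = u
    return [s * t, s + t, s * s]


def f(x):
    """Hypothetical f : R^3 -> R^2."""
    a, b, c = x
    return [a * b, b + c * c]


def h(u):
    """h is the composition of f and g, so h : R^2 -> R^2."""
    return f(g(u))


point = [0.7, -1.2]                    # arbitrary sample point
lhs = jacobian(h, point)               # Dh, a 2 x 2 matrix
rhs = matmul(jacobian(f, g(point)),    # Df evaluated at g(point), 2 x 3
             jacobian(g, point))       # Dg, 3 x 2
print(lhs)
print(rhs)
```

The shapes fit together the way equation (3) suggests: a p × n matrix times an n × m matrix gives the p × m matrix Dh.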

The chain rule in two dimensions

Let’s redefine our mountain range function to be a more realistic, two-variable function. Define f(x,y) to be the height of a mountain range at the point (x,y), such as in the graph below. As before, you cross through the mountain range. This time, to specify how you cross the mountain range, you need to specify a path, such as the one illustrated by the thick blue curve through the mountains below.

Of course, when you walk through the mountains, you start at one end of the path and, as time progresses, you walk along the path to the other end. You could describe your position during this walk by giving your x-position and your y-position as functions of time, say $x = g_1(t)$ and $y = g_2(t)$. We could write your position more succinctly, using bold letters for vectors, if we let $\mathbf{x} = (x,y)$ and $\mathbf{g}(t) = (g_1(t), g_2(t))$. Then, your position at time t would be $\mathbf{x} = \mathbf{g}(t)$. If you left a trail of (blue) bread crumbs as you walked along the path, let’s say that the trail would look like the graph below (which, if plotted on top of the mountain, would look like the blue curve on the mountain above).

[Figure: the path g(t) in the xy-plane, drawn as a trail of blue bread crumbs.]

As before, we are interested in your height as a function of time. (After all, you want to know how much you’ll have to climb.) We know that the height of the mountain at position $\mathbf{x}$ is $f(\mathbf{x}) = f(x,y)$. We define a function h(t) to give your height at time t as the composition of f and $\mathbf{g}$: $h(t) = f(\mathbf{g}(t))$, which we can also write as $h(t) = (f \circ \mathbf{g})(t)$.

How fast is your height changing as you walk along the path given by the function $\mathbf{g}(t)$? It is, of course, the derivative of h(t): $\frac{dh}{dt}$. Since h is a composition of functions, we can use the chain rule to compute its derivative.

Just as in the one-variable case (equation (2)), the chain rule is

$$Dh(t) = Df(\mathbf{g}(t))\,D\mathbf{g}(t). \tag{4}$$

Again, one important point to remember is that the matrix of partial derivatives of f is evaluated at the point $\mathbf{x} = \mathbf{g}(t)$.

We can also write this in terms of components. The matrices of partial derivatives are

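$$Dh(t) = \left[\frac{dh}{dt}(t)\right], \qquad Df(\mathbf{x}) = \left[\frac{\partial f}{\partial x}(\mathbf{x}) \quad \frac{\partial f}{\partial y}(\mathbf{x})\right], \qquad D\mathbf{g}(t) = \begin{bmatrix} \dfrac{dg_1}{dt}(t) \\[1ex] \dfrac{dg_2}{dt}(t) \end{bmatrix}.$$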

By multiplying out equation (4), we find that

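$$\frac{dh}{dt}(t) = \frac{\partial f}{\partial x}(\mathbf{g}(t))\,\frac{dg_1}{dt}(t) + \frac{\partial f}{\partial y}(\mathbf{g}(t))\,\frac{dg_2}{dt}(t). \tag{5}$$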

Equation (5) shows that the chain rule in our two-variable case is just like the one-variable chain rule (equation (1)) applied twice, once for each component of the path, with the two results added together.
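Equation (5) can be tried out on a concrete mountain and path. In the Python sketch below, the height f(x,y) = exp(-(x^2 + 2y^2)) and the path (t, t^2) are hypothetical choices made for illustration, with the partial derivatives worked out by hand. The result of equation (5) is compared against a numerical slope estimate of h, so the agreement is approximate.

```python
import math


def f(x, y):
    """Hypothetical mountain height at the point (x, y)."""
    return math.exp(-(x ** 2 + 2.0 * y ** 2))


def f_x(x, y):
    """Partial derivative of f with respect to x."""
    return -2.0 * x * f(x, y)


def f_y(x, y):
    """Partial derivative of f with respect to y."""
    return -4.0 * y * f(x, y)


def g(t):
    """Hypothetical path: position (g1(t), g2(t)) = (t, t^2) at time t."""
    return (t, t ** 2)


def dg(t):
    """Derivatives of the path components: (dg1/dt, dg2/dt) = (1, 2t)."""
    return (1.0, 2.0 * t)


def h(t):
    """Height at time t: h(t) = f(g(t))."""
    x, y = g(t)
    return f(x, y)


def dh_dt(t):
    """Equation (5): both partial derivatives of f are evaluated at g(t)."""
    x, y = g(t)
    dg1, dg2 = dg(t)
    return f_x(x, y) * dg1 + f_y(x, y) * dg2


def central_difference(func, t, step=1e-6):
    """Numerical slope estimate used only as a cross-check."""
    return (func(t + step) - func(t - step)) / (2.0 * step)


t = 0.4  # arbitrary sample time
print(dh_dt(t), central_difference(h, t))
```

Each term in dh_dt pairs one partial derivative of f with the rate of change of the matching path component, which is the matrix product of equation (4) written out.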

The nice thing about equation (4) is that it applies when you take the derivative of any composition of functions. So if you remember equation (4) (and how to multiply matrices), then you’ll be all set. You can then even compute the derivative of $h = f \circ g$ for the functions written in equation (3), but we won’t deal with the general case in this reading.
