One-variable example
Imagine that the function f (x) gives the height through a mountain range at position x. Since this is a one-dimensional example, we are thinking of some cross section through the mountain, as illustrated by this graph of f (x).
Now imagine that you are crossing the mountain range so that your x-position at time t is given by x = g(t).
First of all, what is your height at time t? It is simply the height of the mountain at the position x = g(t), i.e., f (x) evaluated at g(t), which is f (g(t)). We define a function h(t) to give your height at time t. It is
| h(t) = f (g(t)). |
The chain rule is the rule we use if we want to take the derivative of
a composition of functions. In this example, how fast is your height
changing as you walk along the path given by g(t)? It is simply the
derivative of h with respect to t, $\frac{dh}{dt}(t)$.
The chain rule gives the derivative of h in terms of the derivatives
of g and f. You may remember from one-variable calculus that
| $\frac{dh}{dt}(t) = \frac{df}{dx}(g(t))\,\frac{dg}{dt}(t).$ | (1) |
The chain rule makes it a lot easier to compute derivatives. For example, if $g(t) = t^2$ and $f(x) = \sin x$, then $h(t) = \sin(t^2)$. We can easily calculate that
| $\frac{dg}{dt}(t) = g'(t) = 2t,$ |
| $\frac{df}{dx}(x) = f'(x) = \cos x,$ |
so that
| $\frac{df}{dx}(g(t)) = f'(g(t)) = \cos(t^2).$ |
Equation (1) then gives $\frac{dh}{dt}(t) = 2t\cos(t^2)$.
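As a quick sanity check on this example, the following Python sketch compares the chain-rule product cos(t^2) · 2t with a finite-difference estimate of the derivative of h. The helper names, the sample time `t0`, and the step size are illustrative assumptions, not part of the original example.

```python
import math

def g(t):
    """Position along the cross section at time t: g(t) = t^2."""
    return t ** 2

def f(x):
    """Height of the mountain at position x: f(x) = sin x."""
    return math.sin(x)

def h(t):
    """Height at time t, the composition h(t) = f(g(t))."""
    return f(g(t))

def h_prime_chain_rule(t):
    """Chain rule for this example: f'(g(t)) * g'(t) = cos(t^2) * 2t."""
    return math.cos(g(t)) * 2 * t

def central_difference(func, t, step=1e-6):
    """Numerical estimate of func'(t); the step size is an arbitrary choice."""
    return (func(t + step) - func(t - step)) / (2 * step)

t0 = 1.3  # arbitrary sample time (assumption)
exact = h_prime_chain_rule(t0)
approx = central_difference(h, t0)
print(f"chain rule: {exact:.6f}, finite difference: {approx:.6f}")
```

The two printed values should agree to several decimal places, which is what we expect if the chain rule has been applied correctly.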
The general form of the chain rule
Even though f, g, and h are one-variable functions, we could use the notation for the derivative of multivariable functions. Remember that the derivative of a multivariable function is its matrix of partial derivatives. Well, we can view the derivatives of f, g, and h as 1 × 1 matrices,
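| $Df(x) = \begin{bmatrix} \frac{df}{dx}(x) \end{bmatrix},$ |
| $Dg(t) = \begin{bmatrix} \frac{dg}{dt}(t) \end{bmatrix},$ |
| $Dh(t) = \begin{bmatrix} \frac{dh}{dt}(t) \end{bmatrix}.$ |
In this notation, equation (1) becomes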
| Dh(t) = Df (g(t))Dg(t). | (2) |
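Equation (2) has the same form for compositions h = f∘g of multivariable functions, where
| $f : \mathbf{R}^n \to \mathbf{R}^p,$ |
| $g : \mathbf{R}^m \to \mathbf{R}^n,$ | (3) |
| $h : \mathbf{R}^m \to \mathbf{R}^p.$ |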
The chain rule in two dimensions
Let's redefine our mountain range function to be a more realistic, two-variable function. Define f (x, y) to be the height of a mountain range at the point (x, y), such as in the graph below. As before, you cross through the mountain range. This time, to specify how you cross the mountain range, you need to specify a path, such as the one illustrated by the thick blue curve through the mountains below.
Of course, when you walk through the mountains, you start at one end of the path and, as time progresses, you walk along the path to the other end. You could describe your position during this walk by giving your x-position and your y-position as functions of time, say $x = g_1(t)$ and $y = g_2(t)$. We could write your position more succinctly if we let the vector $\mathbf{x} = (x, y)$ and $g(t) = (g_1(t), g_2(t))$. Then, your position at time t would be $\mathbf{x} = g(t)$. If you left a trail of (blue) bread crumbs as you walked along the path, let's say that the trail would look like the graph below (which, if plotted on top of the mountain, would look like the blue curve on the mountain above).
As before, we are interested in your height as a function of time. (After all, you want to know how much you'll have to climb.) We know that the height of the mountain at position $\mathbf{x}$ is $f(\mathbf{x}) = f(x, y)$. We define a function h(t) to give your height at time t as the composition of f and g: h(t) = f (g(t)), which we can also write as h(t) = (f∘g)(t).
How fast is your height changing as you walk along the path given by
the function g(t)? It is, of course, the derivative of h(t),
$\frac{dh}{dt}(t)$. Since h is a composition of functions,
we can use the chain rule to compute its derivative.
Just as in the one-variable case (equation (2)), the chain rule is
| Dh(t) = Df (g(t))Dg(t). | (4) |
We can also write this in terms of components. The matrices of partial derivatives are
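| $Dh(t) = \begin{bmatrix} \frac{dh}{dt}(t) \end{bmatrix},$ |
| $Df(\mathbf{x}) = \begin{bmatrix} \frac{\partial f}{\partial x}(x, y) & \frac{\partial f}{\partial y}(x, y) \end{bmatrix},$ |
| $Dg(t) = \begin{bmatrix} \frac{dg_1}{dt}(t) \\ \frac{dg_2}{dt}(t) \end{bmatrix},$ |
so that equation (4) becomes
| $\frac{dh}{dt}(t) = \frac{\partial f}{\partial x}(g(t))\,\frac{dg_1}{dt}(t) + \frac{\partial f}{\partial y}(g(t))\,\frac{dg_2}{dt}(t).$ | (5) |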
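To make the component form concrete, here is a minimal Python sketch that multiplies the 1 × 2 matrix Df(g(t)) by the 2 × 1 matrix Dg(t) and compares the result with a finite-difference estimate of the derivative of h. The height function (a single smooth peak), the path, and the sample time are hypothetical choices made only for illustration.

```python
import math

def f(x, y):
    """Hypothetical mountain height: a single smooth peak at the origin."""
    return math.exp(-(x ** 2 + y ** 2))

def g(t):
    """Hypothetical path through the mountains: position (x, y) at time t."""
    return (t, 0.5 * t ** 2)

def h(t):
    """Height at time t, h(t) = f(g(t))."""
    return f(*g(t))

def Df(x, y):
    """1 x 2 matrix of partial derivatives [df/dx, df/dy] at (x, y)."""
    height = f(x, y)
    return [-2 * x * height, -2 * y * height]

def Dg(t):
    """2 x 1 matrix of derivatives [dg1/dt, dg2/dt], stored as a flat list."""
    return [1.0, t]

def dh_dt(t):
    """Chain rule in components: the row Df(g(t)) times the column Dg(t)."""
    row = Df(*g(t))
    column = Dg(t)
    return row[0] * column[0] + row[1] * column[1]

def central_difference(func, t, step=1e-6):
    """Numerical estimate of func'(t); the step size is an arbitrary choice."""
    return (func(t + step) - func(t - step)) / (2 * step)

t0 = 0.8  # arbitrary sample time (assumption)
chain_value = dh_dt(t0)
numeric_value = central_difference(h, t0)
print(f"chain rule: {chain_value:.6f}, finite difference: {numeric_value:.6f}")
```

With this particular path you start at the peak at t = 0, so `dh_dt(0.0)` comes out as zero: at that instant your height is not changing.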
The nice thing about equation (4) is that it applies when you take the derivative of any composition of functions. So if you remember equation (4) (and how to multiply matrices), then you'll be all set. You can then even compute the derivative of h = f∘g for the functions written in equation (3), but we won't deal with the general case in this reading.
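Although the general case is outside this reading, a small sketch can show what "remember equation (4) and how to multiply matrices" looks like in practice. The functions `f` and `g` below are hypothetical maps from R^2 to R^2 chosen for illustration. The code forms the product Df(g(p)) Dg(p) and compares it with a finite-difference estimate of Dh(p).

```python
def g(s, t):
    """Hypothetical inner function g: R^2 -> R^2."""
    return (s * t, s + t)

def f(x, y):
    """Hypothetical outer function f: R^2 -> R^2."""
    return (x ** 2 + y, x * y)

def h(s, t):
    """Composition h = f o g."""
    return f(*g(s, t))

def Dg(s, t):
    """Matrix of partial derivatives of g (rows: components, columns: variables)."""
    return [[t, s],
            [1.0, 1.0]]

def Df(x, y):
    """Matrix of partial derivatives of f (rows: components, columns: variables)."""
    return [[2 * x, 1.0],
            [y, x]]

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def numerical_jacobian(func, point, step=1e-6):
    """Central-difference estimate of the matrix of partial derivatives."""
    columns = []
    for j in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[j] += step
        backward[j] -= step
        plus, minus = func(*forward), func(*backward)
        columns.append([(a - b) / (2 * step) for a, b in zip(plus, minus)])
    # columns[j][i] holds d(func_i)/d(var_j); transpose so rows are components
    return [list(row) for row in zip(*columns)]

point = (0.7, -1.2)  # arbitrary sample point (assumption)
Dh_chain = matmul(Df(*g(*point)), Dg(*point))
Dh_numeric = numerical_jacobian(h, point)
print(Dh_chain)
print(Dh_numeric)
```

The order of the product matters here: Df is evaluated at g(p), not at p, and it multiplies Dg from the left, just as in equation (4).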