The chain rule
One-variable example
Imagine that the function $f(x)$ gives the height through a mountain range at position $x$. Since this is a one-dimensional example, we are thinking of some cross section through the mountain, as illustrated by this graph of $f$.

Now imagine that you are crossing the mountain range so that your $x$-position at time $t$ is given by $x = g(t)$.
First of all, what is your height at time $t$? It is simply the height of the mountain at the position $g(t)$, i.e., $f(x)$ evaluated at $x = g(t)$, which is $f(g(t))$. We define a function $h(t)$ to give your height at time $t$. It is
$$h(t) = f(g(t)).$$
The function $h$ is an example of a composition of functions, meaning it is the result of using the function $g$ and then using the function $f$. We often write $h = f \circ g$ or $h(t) = (f \circ g)(t)$.
The chain rule is the rule we use if we want to take the derivative of a composition of functions. In this example, how fast is your height changing as you walk along the path given by $g(t)$? It is simply the derivative of $h$ with respect to $t$: $h'(t)$. The chain rule gives the derivative of $h$ in terms of the derivatives of $f$ and $g$. You may remember from one-variable calculus that
$$h'(t) = f'(g(t))\,g'(t). \tag{1}$$
The one-variable chain rule states that the derivative of $h$ is the product of the derivative of $f$ and the derivative of $g$. The only trick to remember is that the derivative of $f$ is evaluated at $g(t)$ (not at $t$). This makes sense since $f$ is a function of position $x$ and $x = g(t)$.
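The chain rule makes it a lot easier to compute derivatives. For example, if $f(x) = \sin x$ and $g(t) = t^3$, then $h(t) = \sin(t^3)$. We can easily calculate that
$$f'(x) = \cos x, \qquad g'(t) = 3t^2.$$
Using the chain rule of equation (1), we compute that the derivative of $h$ is
$$h'(t) = f'(g(t))\,g'(t) = \cos(t^3) \cdot 3t^2 = 3t^2 \cos(t^3).$$
We don’t have to separately learn a rule for the derivative of $\sin(t^3)$; we just need to know the derivatives of $\sin x$ and $t^3$.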
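As a quick sanity check of equation (1), the following Python sketch compares the chain-rule formula with a finite-difference estimate of the derivative. The particular functions ($f(x) = \sin x$ and $g(t) = t^3$), the helper names, and the sample time `t0` are illustrative assumptions chosen for this sketch; any differentiable pair would do.

```python
import math

# Illustrative choices (assumptions for this sketch).
def f(x):
    return math.sin(x)

def f_prime(x):
    return math.cos(x)

def g(t):
    return t ** 3

def g_prime(t):
    return 3 * t ** 2

def h(t):
    # Composition h = f o g: your height at time t.
    return f(g(t))

def h_prime_chain(t):
    # Equation (1): note that f' is evaluated at g(t), not at t.
    return f_prime(g(t)) * g_prime(t)

def central_difference(func, t, eps=1e-6):
    # Numerical estimate of the derivative, used only as a cross-check.
    return (func(t + eps) - func(t - eps)) / (2 * eps)

t0 = 0.7  # arbitrary sample time
chain_value = h_prime_chain(t0)
numeric_value = central_difference(h, t0)
print(chain_value, numeric_value)
```

The two printed numbers should agree to several decimal places. Replacing `f_prime(g(t))` with `f_prime(t)` in `h_prime_chain` breaks the agreement, which is a handy way to see why the evaluation point matters.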
The general form of the chain rule
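Even though $f$, $g$, and $h$ are one-variable functions, we could use the notation for the derivative of multivariable functions. Remember that the derivative of a multivariable function is its matrix of partial derivatives. Well, we can view the derivatives of $f$, $g$, and $h$ as $1 \times 1$ matrices,
$$Df(x) = \left[ f'(x) \right], \qquad Dg(t) = \left[ g'(t) \right], \qquad Dh(t) = \left[ h'(t) \right].$$
Using the notation of matrices of partial derivatives, we can rewrite the one-variable chain rule of equation (1) as
$$Dh(t) = Df(g(t))\,Dg(t). \tag{2}$$
Since matrix multiplication of $1 \times 1$ matrices is the same as scalar multiplication, this new equation is just equation (1) in disguised form. Equation (2) is written exactly as the chain rule for higher dimensions. So if you understand what equation (2) means when $\mathbf{f}$, $\mathbf{g}$, and $\mathbf{h}$ are the following multivariable functions,
$$\mathbf{g}: \mathbb{R}^n \to \mathbb{R}^m, \qquad \mathbf{f}: \mathbb{R}^m \to \mathbb{R}^p, \tag{3}$$
and $\mathbf{h}(\mathbf{x}) = \mathbf{f}(\mathbf{g}(\mathbf{x}))$, then you don’t need to read on.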
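To make the matrix form of equation (2) concrete, here is a small Python sketch in which the derivative of a composition is computed as a product of matrices of partial derivatives and then checked against finite differences. The functions, their dimensions (a map from two variables to three, followed by a map from three variables to two), and all helper names are hypothetical choices made for this sketch only.

```python
# Inner function g: R^2 -> R^3 (illustrative assumption).
def g(x):
    u, v = x
    return [u * v, u + v, u ** 2]

def Dg(x):
    # 3 x 2 matrix of partial derivatives of g.
    u, v = x
    return [[v, u], [1.0, 1.0], [2 * u, 0.0]]

# Outer function f: R^3 -> R^2 (illustrative assumption).
def f(y):
    a, b, c = y
    return [a + b * c, a * c]

def Df(y):
    # 2 x 3 matrix of partial derivatives of f.
    a, b, c = y
    return [[1.0, c, b], [c, 0.0, a]]

def h(x):
    # Composition h = f o g: R^2 -> R^2.
    return f(g(x))

def matmul(A, B):
    # Plain matrix product of nested lists.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def Dh_chain(x):
    # Chain rule: Df is evaluated at g(x), then multiplied by Dg(x).
    return matmul(Df(g(x)), Dg(x))

def numeric_jacobian(func, x, eps=1e-6):
    # Central-difference estimate of the matrix of partial derivatives.
    rows = len(func(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(rows):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

x0 = [1.5, -0.5]  # arbitrary sample point
J_chain = Dh_chain(x0)
J_numeric = numeric_jacobian(h, x0)
max_error = max(abs(J_chain[i][j] - J_numeric[i][j])
                for i in range(2) for j in range(2))
print(J_chain)
print(max_error)
```

The shapes line up the way matrix multiplication requires: a $2 \times 3$ matrix times a $3 \times 2$ matrix gives the $2 \times 2$ matrix of partial derivatives of the composition.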
The chain rule in two dimensions
Let’s redefine our mountain range function to be a more realistic, two-variable function. Define $f(x,y)$ to be the height of a mountain range at the point $(x,y)$, such as in the graph below. As before, you cross through the mountain range. This time, to specify how you cross the mountain range, you need to specify a path, such as the one illustrated by the thick blue curve through the mountains below.
Of course, when you walk through the mountains, you start at one end of the path and, as time progresses, you walk along the path to the other end. You could describe your position during this walk by giving your $x$-position and your $y$-position as functions of time, say $x = g_1(t)$ and $y = g_2(t)$. We could write your position more succinctly if we let $\mathbf{x} = (x,y)$ and $\mathbf{g}(t) = (g_1(t), g_2(t))$. Then, your position at time $t$ would be $\mathbf{x} = \mathbf{g}(t)$. If you left a trail of (blue) bread crumbs as you walked along the path, let’s say that the trail would look like the graph below (which, if plotted on top of the mountain, would look like the blue curve on the mountain above).

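As before, we are interested in your height as a function of time. (After all, you want to know how much you’ll have to climb.) We know that the height of the mountain at position $\mathbf{x}$ is $f(\mathbf{x})$. We define a function $h(t)$ to give your height at time $t$ as the composition of $f$ and $\mathbf{g}$: $h(t) = (f \circ \mathbf{g})(t)$, which we can also write as $h(t) = f(\mathbf{g}(t))$.
How fast is your height changing as you walk along the path given by the function $\mathbf{g}(t)$? It is, of course, the derivative of $h$: $h'(t) = Dh(t)$. Since $h$ is a composition of functions, we can use the chain rule to compute its derivative.
Just as in the one-variable case (equation (2)), the chain rule is
$$Dh(t) = Df(\mathbf{g}(t))\,D\mathbf{g}(t). \tag{4}$$
Again, one important point to remember is that the matrix of partial derivatives of $f$ is evaluated at the point $\mathbf{g}(t)$.
We can also write this in terms of components. The matrices of partial derivatives are
$$Dh(t) = \left[ \frac{dh}{dt}(t) \right], \qquad Df(x,y) = \left[ \frac{\partial f}{\partial x}(x,y) \;\; \frac{\partial f}{\partial y}(x,y) \right], \qquad D\mathbf{g}(t) = \begin{bmatrix} \dfrac{dg_1}{dt}(t) \\ \dfrac{dg_2}{dt}(t) \end{bmatrix}.$$
By multiplying out equation (4), we find that
$$\frac{dh}{dt}(t) = \frac{\partial f}{\partial x}(\mathbf{g}(t))\,\frac{dg_1}{dt}(t) + \frac{\partial f}{\partial y}(\mathbf{g}(t))\,\frac{dg_2}{dt}(t). \tag{5}$$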
Equation (5) shows that the chain rule in our two-variable case is just like the one-variable chain rule (equation (1)) applied twice.
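The component form of the two-variable chain rule can be tried out numerically. In the sketch below, the mountain is a single smooth peak and the path is a simple curve through it; both functions, the helper names, and the sample time `t0` are assumptions made up for illustration, not the mountain or path pictured above.

```python
import math

# Hypothetical mountain height: one smooth peak centered at the origin.
def f(x, y):
    return math.exp(-(x ** 2 + y ** 2))

def f_x(x, y):
    # Partial derivative of f with respect to x.
    return -2 * x * f(x, y)

def f_y(x, y):
    # Partial derivative of f with respect to y.
    return -2 * y * f(x, y)

# Hypothetical path: x = g1(t), y = g2(t).
def g1(t):
    return t - 1.0

def g2(t):
    return 0.5 * t ** 2

def g1_prime(t):
    return 1.0

def g2_prime(t):
    return t

def h(t):
    # Height at time t: the mountain evaluated at your position.
    return f(g1(t), g2(t))

def h_prime_chain(t):
    # Two-variable chain rule: one product per coordinate, with both
    # partial derivatives evaluated at the point (g1(t), g2(t)).
    x, y = g1(t), g2(t)
    return f_x(x, y) * g1_prime(t) + f_y(x, y) * g2_prime(t)

def central_difference(func, t, eps=1e-6):
    # Numerical estimate of the derivative, used only as a cross-check.
    return (func(t + eps) - func(t - eps)) / (2 * eps)

t0 = 0.8  # arbitrary sample time
chain_value = h_prime_chain(t0)
numeric_value = central_difference(h, t0)
print(chain_value, numeric_value)
```

Each term in `h_prime_chain` has the same shape as the one-variable rule: a derivative of the outer function evaluated along the path, times the rate at which that coordinate changes in time.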
The nice thing about equation (4) is that it applies when you take the derivative of any composition of functions. So if you remember equation (4) (and how to multiply matrices), then you’ll be all set. You can then even compute the derivative of the composition $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$ for the functions written in equation (3), but we won’t deal with the general case in this reading.