The directional derivative and the gradient

The directional derivative

Let’s revisit our two-variable mountain range function $f$, where $f(x,y)$ is the height of a mountain range at the point $\mathbf{x} = (x,y)$. Imagine you are standing at some point $\mathbf{x} = \mathbf{a}$. The slope of the ground in front of you will depend on the direction you are facing. It might slope steeply up in one direction, be relatively flat in another direction, and slope steeply down in yet another direction.

You already learned about the slope in the positive x direction and the slope in the positive y direction. These slopes are the partial derivatives with respect to x and with respect to y. We can compute the slope in any direction with something called the directional derivative.

To take a directional derivative, we first need to specify the direction. We do that by specifying a vector $\mathbf{u} = (u_1, u_2)$ that points in the direction in which we want to compute the slope. (Since we don’t care about the length of $\mathbf{u}$, we can assume it is a unit vector with length $\|\mathbf{u}\| = 1$.) We write the directional derivative of $f$ in the direction $\mathbf{u}$ at the point $\mathbf{a}$ as $D_{\mathbf{u}}f(\mathbf{a})$. The directional derivative $D_{\mathbf{u}}f(\mathbf{a})$ is simply the slope of $f(x,y)$ when standing at the point $\mathbf{a}$ and facing the direction given by $\mathbf{u}$. If $x$ and $y$ were given in meters, then $D_{\mathbf{u}}f(\mathbf{a})$ would be the change in height per meter as you moved in the direction given by $\mathbf{u}$ when you are at the point $\mathbf{a}$.

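Note that $D_{\mathbf{u}}f(\mathbf{a})$ is a number, not a matrix. In fact, the directional derivative is the same as a partial derivative if $\mathbf{u}$ points in the positive $x$ or positive $y$ direction. For example, if $\mathbf{u} = (1, 0)$, then $D_{\mathbf{u}}f(\mathbf{a}) = \frac{\partial f}{\partial x}(\mathbf{a})$. Similarly, if $\mathbf{u} = (0, 1)$, then $D_{\mathbf{u}}f(\mathbf{a}) = \frac{\partial f}{\partial y}(\mathbf{a})$.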
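To make the idea of "slope in a direction" concrete, here is a minimal numerical sketch. The function `f` below is a hypothetical single-peak mountain chosen only for illustration (it is not the function behind the figures), and `directional_derivative` is an assumed helper name. It estimates the slope by comparing heights a tiny step ahead and a tiny step behind along $\mathbf{u}$, then checks that the directions $(1, 0)$ and $(0, 1)$ reproduce the two partial derivatives.

```python
import math

def f(x, y):
    # Hypothetical single-peak "mountain" (an assumption for illustration only).
    return math.exp(-(x**2 + y**2))

def directional_derivative(f, a, u, h=1e-5):
    """Central-difference estimate of the slope of f at a in the direction u."""
    length = math.hypot(u[0], u[1])
    ux, uy = u[0] / length, u[1] / length  # rescale u to unit length
    ax, ay = a
    ahead = f(ax + h * ux, ay + h * uy)
    behind = f(ax - h * ux, ay - h * uy)
    return (ahead - behind) / (2 * h)

a = (0.5, -0.25)

# Exact partial derivatives of this particular f, worked out by hand.
fx_exact = -2 * a[0] * f(*a)
fy_exact = -2 * a[1] * f(*a)

slope_east = directional_derivative(f, a, (1, 0))   # u = (1, 0)
slope_north = directional_derivative(f, a, (0, 1))  # u = (0, 1)
slope_diag = directional_derivative(f, a, (1, 1))   # rescaled to unit length inside

print("u = (1, 0):", slope_east, "vs df/dx:", fx_exact)
print("u = (0, 1):", slope_north, "vs df/dy:", fy_exact)
print("u along (1, 1):", slope_diag)
```

Because the helper rescales `u` before stepping, the result is a slope per unit distance travelled, which matches the "change in height per meter" reading of the directional derivative.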

In the following CVT, the two-variable mountain range described by f(x,y) is shown as a level curve plot. The point a is shown in dark red (which you can move by dragging). Although I didn’t label the level curves with their height, the height at point a (i.e., f(a)) is shown on the bottom (cyan) slider labeled by “f”. (You can recognize the two steep mountain peaks by the closely spaced circular level curves.)

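The direction vector $\mathbf{u}$ is shown by the light green vector emanating from $\mathbf{a}$. You can change the direction of $\mathbf{u}$ by changing $\theta$ to anything between $0$ and $2\pi$ using the top slider. The value of $D_{\mathbf{u}}f(\mathbf{a})$ is shown by the middle (light green) slider labeled by “Duf”. Notice that for this first CVT, $\theta$ is the angle from the positive $x$ direction, so when $\theta = 0$, $\mathbf{u}$ points in the positive $x$ direction ($\mathbf{u} = (1, 0)$) and $D_{\mathbf{u}}f(\mathbf{a}) = \frac{\partial f}{\partial x}(\mathbf{a})$. Similarly, when $\theta = \pi/2$, $\mathbf{u}$ points in the positive $y$ direction ($\mathbf{u} = (0, 1)$) and $D_{\mathbf{u}}f(\mathbf{a}) = \frac{\partial f}{\partial y}(\mathbf{a})$.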

If you make $\mathbf{u}$ point in a direction parallel to the level curve, what happens to $D_{\mathbf{u}}f(\mathbf{a})$? (Since the height is constant along a level curve, you should be able to infer what the slope in that direction should be.) What happens to $D_{\mathbf{u}}f(\mathbf{a})$ when you turn $\mathbf{u}$ to point in the opposite direction (i.e., add or subtract $\pi$ from $\theta$)?
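If you would like to check your answers to these two questions numerically, the following sketch is one way to do it. It assumes a hypothetical mountain `f` whose level curves are circles around the origin (so the tangent direction is easy to write down), and `slope_at_angle` is an illustrative helper that takes $\theta$ measured from the positive $x$ direction, as in the CVT above.

```python
import math

def f(x, y):
    # Hypothetical mountain whose level curves are circles around the origin.
    return math.exp(-(x**2 + y**2))

def slope_at_angle(f, a, theta, h=1e-5):
    """Central-difference slope of f at a, facing angle theta from the positive x direction."""
    ux, uy = math.cos(theta), math.sin(theta)
    ax, ay = a
    return (f(ax + h * ux, ay + h * uy) - f(ax - h * ux, ay - h * uy)) / (2 * h)

a = (0.6, 0.8)  # lies on the circular level curve of radius 1

# The tangent to a circle is perpendicular to the radius through a.
theta_tangent = math.atan2(a[1], a[0]) + math.pi / 2
along_level_curve = slope_at_angle(f, a, theta_tangent)

theta = 0.3  # an arbitrary direction
forward = slope_at_angle(f, a, theta)
backward = slope_at_angle(f, a, theta + math.pi)  # turn around

print("slope along the level curve:", along_level_curve)
print("slope facing theta:", forward)
print("slope facing theta + pi:", backward)
```

Compare the first printed value with zero, and compare the last two values with each other.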

For fun, I’ve duplicated this CVT using a plot of $z = f(x,y)$, below. In this view, the steepness may be easier to see. However, this view is a little misleading for two reasons. First, the dark red dot now floats on the surface of the mountain. Hence, the dark red dot is no longer $\mathbf{a}$, which for this example is really a point in two dimensions. Second, the light green vector is now a three-dimensional vector that points up or down the mountain. The light green vector is no longer exactly the direction vector $\mathbf{u}$, which for this example is really a two-dimensional vector. Nonetheless, this second view further illustrates the concepts of the directional derivative. You can manipulate it in the same way as above.

The gradient

In most cases, there is one direction $\mathbf{u}$ in which the directional derivative $D_{\mathbf{u}}f(\mathbf{a})$ is the largest. This is the “uphill” direction. (In some cases, such as when you are at the top of a mountain peak or at the lowest point in a valley, this might not be true.) Let’s call this direction of maximal slope $\mathbf{m}$. Both the direction $\mathbf{m}$ and the maximal directional derivative $D_{\mathbf{m}}f(\mathbf{a})$ are captured by something called the gradient of $f$, denoted by $\nabla f(\mathbf{a})$. The gradient is a vector that points in the direction of $\mathbf{m}$ and whose magnitude is $D_{\mathbf{m}}f(\mathbf{a})$. In math, we can write this as $\frac{\nabla f(\mathbf{a})}{\|\nabla f(\mathbf{a})\|} = \mathbf{m}$ and $\|\nabla f(\mathbf{a})\| = D_{\mathbf{m}}f(\mathbf{a})$.
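The two defining properties of the gradient can be checked by brute force. The sketch below is an illustration under stated assumptions: `f` is a hypothetical mountain, and the gradient is estimated as the pair of partial derivatives (a standard fact not spelled out above). It scans many directions around the circle, finds the one with the largest slope, and compares that direction and slope with the normalized gradient and its magnitude.

```python
import math

def f(x, y):
    # Hypothetical single-peak "mountain" (an assumption for illustration only).
    return math.exp(-(x**2 + y**2))

def slope(f, a, u, h=1e-5):
    """Central-difference slope of f at a along the unit vector u."""
    ax, ay = a
    return (f(ax + h * u[0], ay + h * u[1]) - f(ax - h * u[0], ay - h * u[1])) / (2 * h)

def gradient(f, a):
    """Estimate the gradient as the slopes along the x and y directions."""
    return (slope(f, a, (1.0, 0.0)), slope(f, a, (0.0, 1.0)))

def unit(theta):
    return (math.cos(theta), math.sin(theta))

a = (0.5, -0.25)
gx, gy = gradient(f, a)
magnitude = math.hypot(gx, gy)
m = (gx / magnitude, gy / magnitude)  # direction of the gradient

# Scan 3600 evenly spaced directions and keep the steepest one.
n = 3600
best_theta = max((2 * math.pi * k / n for k in range(n)),
                 key=lambda t: slope(f, a, unit(t)))
best_u = unit(best_theta)
best_slope = slope(f, a, best_u)

print("steepest scanned direction:", best_u)
print("normalized gradient m:     ", m)
print("steepest scanned slope:", best_slope)
print("gradient magnitude:    ", magnitude)
```

The scan only samples a finite set of angles, so the two directions agree up to the angular resolution rather than exactly.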

To illustrate, the CVT below shows the gradient as a dark blue vector emanating from the point $\mathbf{a}$. (Actually, I made the vector ten times longer than it should be so you can see it better.) The actual length (or magnitude) of the gradient, $\|\nabla f(\mathbf{a})\|$, is shown by the dark blue line on the middle (light green) slider.

The CVT also includes the vector $\mathbf{u}$ in light green, as above, and shows the value of the directional derivative $D_{\mathbf{u}}f(\mathbf{a})$ by the light green line on the middle (light green) slider. As before, by changing $\theta$, you can change the vector $\mathbf{u}$. This time, however, I’ve changed the definition of $\theta$. Above, $\theta$ was the angle from the positive $x$ direction. Now, $\theta$ is the angle between the direction specified by $\mathbf{u}$ and the direction specified by the gradient. (So, for example, when $\theta = 0$, $\mathbf{u}$ points in the same direction as the gradient.)

Notice how the dark blue gradient vector always points up the mountains (in fact, the gradient is always perpendicular to the level curves). When the level curves are close together, the gradient is large. What happens to the gradient at the tops of the mountains?
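These observations can also be probed numerically. The sketch below assumes a hypothetical mountain `f` with a single peak at the origin and circular level curves. It estimates the gradient at the peak and at a point on the slope, tests whether the gradient on the slope is perpendicular to the level curve through that point, and checks that a short step along the gradient gains height.

```python
import math

def f(x, y):
    # Hypothetical mountain with its peak at the origin (illustration only).
    return math.exp(-(x**2 + y**2))

def gradient(f, a, h=1e-5):
    """Central-difference estimate of the gradient (df/dx, df/dy) at a."""
    ax, ay = a
    gx = (f(ax + h, ay) - f(ax - h, ay)) / (2 * h)
    gy = (f(ax, ay + h) - f(ax, ay - h)) / (2 * h)
    return (gx, gy)

peak = (0.0, 0.0)
on_slope = (0.6, 0.8)

g_peak = gradient(f, peak)
g_slope = gradient(f, on_slope)

# For circular level curves, the tangent at (x, y) points along (-y, x).
tangent = (-on_slope[1], on_slope[0])
dot_with_tangent = g_slope[0] * tangent[0] + g_slope[1] * tangent[1]

# Take a short step in the direction of the gradient and compare heights.
step = 0.01
uphill = (on_slope[0] + step * g_slope[0], on_slope[1] + step * g_slope[1])
height_gain = f(*uphill) - f(*on_slope)

print("gradient at the peak:", g_peak)
print("gradient on the slope:", g_slope)
print("gradient . tangent:", dot_with_tangent)
print("height gained by stepping along the gradient:", height_gain)
```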

Note that when $\theta = 0$ (or $\theta = 2\pi$), the directional derivative $D_{\mathbf{u}}f(\mathbf{a})$ (shown by the light green line on the middle slider) and the magnitude of the gradient $\|\nabla f(\mathbf{a})\|$ (shown by the dark blue line on the middle slider) are identical, i.e., $D_{\mathbf{u}}f(\mathbf{a}) = \|\nabla f(\mathbf{a})\|$. When $\theta = \pi$, $\mathbf{u}$ points in the opposite direction of the gradient, and $D_{\mathbf{u}}f(\mathbf{a}) = -\|\nabla f(\mathbf{a})\|$. For what values of $\theta$ is $D_{\mathbf{u}}f(\mathbf{a}) = 0$?

By moving $\mathbf{a}$ (the dark red point) around and changing $\theta$, I hope you can convince yourself that, for a fixed $\mathbf{a}$, the maximal value of $D_{\mathbf{u}}f(\mathbf{a})$ occurs when $\mathbf{u}$ and $\nabla f(\mathbf{a})$ point in the same direction (i.e., when $\theta = 0$ or $\theta = 2\pi$), and the minimum value occurs when $\mathbf{u}$ and $\nabla f(\mathbf{a})$ point in opposite directions (i.e., when $\theta = \pi$). Hence $D_{\mathbf{u}}f(\mathbf{a})$ always lies between $-\|\nabla f(\mathbf{a})\|$ and $\|\nabla f(\mathbf{a})\|$. It turns out that the relationship between the gradient and the directional derivative can be summarized by the equation

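$$D_{\mathbf{u}}f(\mathbf{a}) = \nabla f(\mathbf{a}) \cdot \mathbf{u} = \|\nabla f(\mathbf{a})\| \, \|\mathbf{u}\| \cos\theta = \|\nabla f(\mathbf{a})\| \cos\theta$$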

where $\theta$ is the angle between $\mathbf{u}$ and the gradient. (Recall that $\mathbf{u}$ is a unit vector, meaning that $\|\mathbf{u}\| = 1$.)
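As a sanity check on this equation, the sketch below computes the directional derivative three ways for a few angles: by measuring the slope directly, by taking the dot product of the gradient with $\mathbf{u}$, and by multiplying the gradient's magnitude by $\cos\theta$. The function `f` is again a hypothetical mountain, the helper names are illustrative, and $\mathbf{u}$ is built by rotating the gradient direction through $\theta$, matching the definition of $\theta$ used here.

```python
import math

def f(x, y):
    # Hypothetical single-peak "mountain" (an assumption for illustration only).
    return math.exp(-(x**2 + y**2))

def slope(f, a, u, h=1e-5):
    """Central-difference slope of f at a along the unit vector u."""
    ax, ay = a
    return (f(ax + h * u[0], ay + h * u[1]) - f(ax - h * u[0], ay - h * u[1])) / (2 * h)

def gradient(f, a):
    """Estimate the gradient as the slopes along the x and y directions."""
    return (slope(f, a, (1.0, 0.0)), slope(f, a, (0.0, 1.0)))

a = (0.5, -0.25)
gx, gy = gradient(f, a)
grad_norm = math.hypot(gx, gy)
grad_angle = math.atan2(gy, gx)  # direction of the gradient, from the positive x axis

rows = []
for theta in (0.0, math.pi / 3, math.pi / 2, math.pi):
    # Rotate the gradient direction by theta to get the unit vector u.
    u = (math.cos(grad_angle + theta), math.sin(grad_angle + theta))
    measured = slope(f, a, u)              # slope measured directly
    dot = gx * u[0] + gy * u[1]            # gradient dotted with u
    from_angle = grad_norm * math.cos(theta)  # magnitude times cos(theta)
    rows.append((theta, measured, dot, from_angle))
    print(f"theta={theta:.4f}  measured={measured:+.6f}  "
          f"dot={dot:+.6f}  norm*cos={from_angle:+.6f}")
```

The three columns should agree up to the small error of the finite-difference estimates, and the rows for $\theta = 0$ and $\theta = \pi$ give the largest and smallest slopes.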

I also duplicated this CVT using a plot of $z = f(x,y)$, below. Although its steepness may be easier to see, recall from the above discussion that the dark red point is no longer really $\mathbf{a}$ and the light green vector is no longer really $\mathbf{u}$. Similarly, since the dark blue vector points up the mountain, it is no longer really the gradient $\nabla f(\mathbf{a})$, which, for a function $f(x,y)$ of two variables, is a two-dimensional vector. (I also couldn’t get the length of the dark blue vector to accurately represent the length of the gradient; the true length of the gradient is given by the dark blue line on the bottom (light green) slider.) Despite its shortcomings, I hope this last CVT can help you see how the gradient always points in the direction where the mountain rises most steeply.

Examples are available here.