The directional derivative
Let's revisit our two-variable mountain range function f, where f (x, y) is the height of a mountain range at the point x = (x, y). Imagine you are standing at some point x = a. The slope of the ground in front of you will depend on the direction you are facing. It might slope steeply up in one direction, be relatively flat in another direction, and slope steeply down in yet another direction.
You already learned about the slope in the positive x direction and the slope in the positive y direction. These slopes are the partial derivatives with respect to x and with respect to y. We can compute the slope in any direction with something called the directional derivative.
To take a directional derivative, we first need to specify the direction. We do that by specifying a vector u = (u1, u2) that points in the direction in which we want to compute the slope. (Since we don't care about the length of u, we can assume it is a unit vector with length ||u|| = 1.) We write the directional derivative of f in the direction u at the point a as Duf (a). The directional derivative Duf (a) is simply the slope of f (x, y) when standing at the point a and facing the direction given by u. If x and y were given in meters, then Duf(a) would be the change in height per meter as you moved in the direction given by u when you are at the point a.
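For readers who want the "change in height per meter" description written as a formula, here is the standard limit definition. It is not spelled out in the text above, so treat it as a supplementary sketch; the step size h is a symbol introduced here, and u is assumed to be a unit vector.

$$
D_{\mathbf{u}} f(\mathbf{a}) = \lim_{h \to 0} \frac{f(\mathbf{a} + h\,\mathbf{u}) - f(\mathbf{a})}{h}
$$

The numerator is the change in height after walking a distance h from a in the direction u, and dividing by h turns that into a slope.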
Note that Duf (a) is a number, not a matrix. In fact, the directional derivative is the same as a partial derivative if u points in the positive x or positive y direction. For example, if u = (1, 0), then Duf (a) = ∂f/∂x (a). Similarly, if u = (0, 1), then Duf (a) = ∂f/∂y (a).
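If you'd like to check this numerically, the following is a minimal sketch. The height function f and the helper directional_derivative are made up for illustration (they are not the function behind the CVTs), and the slope is only estimated with a small central difference.

```python
import math

def f(x, y):
    # Hypothetical height function chosen for illustration (not the CVT's f).
    return x**2 * y + math.sin(y)

def directional_derivative(f, a, u, h=1e-5):
    """Central-difference estimate of the slope of f at a in the direction u."""
    ax, ay = a
    ux, uy = u
    length = math.hypot(ux, uy)        # normalize, since only the direction matters
    ux, uy = ux / length, uy / length
    return (f(ax + h * ux, ay + h * uy) - f(ax - h * ux, ay - h * uy)) / (2 * h)

a = (1.0, 2.0)

# Analytic partial derivatives of this particular f, for comparison.
df_dx = 2 * a[0] * a[1]             # d/dx (x^2 y + sin y) = 2xy
df_dy = a[0]**2 + math.cos(a[1])    # d/dy (x^2 y + sin y) = x^2 + cos y

print(directional_derivative(f, a, (1, 0)), df_dx)
print(directional_derivative(f, a, (0, 1)), df_dy)
```

Each printed pair should agree to several decimal places, which is the claim above: facing along the x or y axis, the directional derivative is just the corresponding partial derivative.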
In the following CVT, the two-variable mountain range described by f (x, y) is shown as a level curve plot. The point a is shown in dark red (which you can move by dragging). Although I didn't label the level curves with their height, the height at point a (i.e., f (a)) is shown on the bottom (cyan) slider labeled by "f". (You can recognize the two steep mountain peaks by the closely spaced circular level curves.)
The direction vector u is shown by the light green vector emanating from a. You can change the direction of u by changing θ to anything between 0 and 2π using the top slider. The value of Duf (a) is shown by the middle (light green) slider labeled by "Duf". Notice that for this first CVT, θ is set so that when θ = 0, u points in the positive x direction (u = (1, 0)), so that Duf (a) = ∂f/∂x (a). Similarly, when θ = π/2, u points in the positive y direction (u = (0, 1)), so that Duf (a) = ∂f/∂y (a).
If you make u point in a direction parallel to the level curve, what happens to Duf (a)? (Since the height is constant along a level curve, you should be able to infer what the slope in that direction should be.) What happens to Duf (a) when you turn u to point in the opposite direction (i.e., add or subtract π from θ)?
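Once you've made your guesses, one way to check them is with a small numerical experiment. This is only a sketch: the single-peak function f, the helper slope, and the chosen point and angles are all assumptions made for illustration, and the slope is estimated with a central difference.

```python
import math

def f(x, y):
    # Hypothetical single peak at the origin; its level curves are circles.
    return 10.0 - x**2 - y**2

def slope(f, a, theta, h=1e-5):
    """Central-difference slope of f at a, facing angle theta from the positive x axis."""
    ux, uy = math.cos(theta), math.sin(theta)
    ax, ay = a
    return (f(ax + h * ux, ay + h * uy) - f(ax - h * ux, ay - h * uy)) / (2 * h)

a = (1.0, 0.0)             # the level curve through a is the unit circle
tangent = math.pi / 2      # at a, that circle's tangent points along the y axis
theta = math.pi / 3        # some other direction to face

along_level_curve = slope(f, a, tangent)
forward = slope(f, a, theta)
backward = slope(f, a, theta + math.pi)   # turn around

print(along_level_curve, forward, backward)
```

Compare the first number with what you inferred for the level-curve direction, and compare the last two numbers with each other.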
For fun, I've duplicated this CVT using a plot of z = f (x, y), below. In this view, the steepness may be easier to see. However, this view is a little misleading for two reasons. First, the dark red dot now floats on the surface of the mountain. Hence, the dark red dot is no longer a, which for this example is really a point in two dimensions. Second, the light green vector is now a three-dimensional vector that points up or down the mountain. The light green vector is no longer exactly the direction vector u, which for this example is really a two-dimensional vector. Nonetheless, this second view further illustrates the concepts of the directional derivative. You can manipulate it in the same way as above.
The gradient
In most cases, there is one direction u in which the directional derivative Duf (a) is the largest. This is the "uphill" direction. (In some cases, such as when you are at the top of a mountain peak or at the lowest point in a valley, this might not be true.) Let's call this direction of maximal slope m.
Both the direction m and the maximal directional derivative Dmf (a) are captured by something called the gradient of f, denoted by ∇f (a). The gradient is a vector that points in the direction of m and whose magnitude is Dmf (a). In math, we can write this as ∇f (a) / ||∇f (a)|| = m and ||∇f (a)|| = Dmf (a).
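To see these two properties in numbers, here is a rough sketch that finds the steepest direction by brute force and compares it with the gradient. The function f, the helper slope, and the grid of 3600 angles are all assumptions made for illustration. The gradient is computed using the standard fact (not derived here) that its components are the partial derivatives of f.

```python
import math

def f(x, y):
    # Hypothetical height function for illustration only.
    return x**2 * y + math.sin(y)

def slope(f, a, theta, h=1e-5):
    """Central-difference slope of f at a, facing angle theta from the positive x axis."""
    ux, uy = math.cos(theta), math.sin(theta)
    ax, ay = a
    return (f(ax + h * ux, ay + h * uy) - f(ax - h * ux, ay - h * uy)) / (2 * h)

a = (1.0, 2.0)

# Brute-force search: try 3600 evenly spaced directions and keep the steepest one.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_theta = max(angles, key=lambda t: slope(f, a, t))
max_slope = slope(f, a, best_theta)

# Gradient built from the partial derivatives of this particular f.
grad = (2 * a[0] * a[1], a[0]**2 + math.cos(a[1]))
grad_norm = math.hypot(*grad)
grad_theta = math.atan2(grad[1], grad[0]) % (2 * math.pi)

print(best_theta, grad_theta)   # direction of steepest ascent vs. gradient direction
print(max_slope, grad_norm)     # maximal slope vs. gradient magnitude
```

Up to the resolution of the angle grid, the steepest direction found by the search should match the direction of the gradient, and the maximal slope should match its length.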
To illustrate, the CVT below shows the gradient as a dark blue vector emanating from the point a. (Actually, I made the vector ten times longer than it should be so you can see it better.) The actual length (or magnitude) of the gradient ||∇f (a)|| is shown by the dark blue line on the middle (light green) slider.
The CVT also includes the vector u in light green, as above, and shows the value of the directional derivative Duf (a) by the light green line on the middle (cyan) slider. As before, by changing θ, you can change the vector u. This time, however, I've changed the definition of θ. Above, θ was the angle from the positive x direction. Now, θ is the angle between the direction specified by u and the direction specified by the gradient. (So, for example, when θ = 0, u points in the same direction as the gradient.)
Notice how the dark blue gradient vector always points up the mountains (in fact, the gradient is always perpendicular to the level curves). When the level curves are close together, the gradient is large. What happens to the gradient at the tops of the mountains?
Note that when θ = 0 (or θ = 2π), the directional derivative Duf (a) (shown by the light green line on the middle slider) and the magnitude of the gradient ||∇f (a)|| (shown by the dark blue line on the middle slider) are identical, i.e., Duf (a) = ||∇f (a)||. When θ = π, u points in the opposite direction of the gradient, and Duf (a) = -||∇f (a)||. For what values of θ is Duf (a) = 0?
By moving a (the dark red point) around and changing θ, I hope you can convince yourself that, for a fixed a, the maximal value of Duf (a) occurs when u and ∇f (a) point in the same direction (i.e., when θ = 0 or θ = 2π), and the minimum value occurs when u and ∇f (a) point in opposite directions (i.e., when θ = π). Hence Duf (a) always lies between -||∇f (a)|| and ||∇f (a)||. It turns out that the relationship between the gradient and the directional derivative can be summarized by the equation
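$$
D_{\mathbf{u}} f(\mathbf{a}) = \nabla f(\mathbf{a}) \cdot \mathbf{u} = \|\nabla f(\mathbf{a})\| \, \|\mathbf{u}\| \cos\theta = \|\nabla f(\mathbf{a})\| \cos\theta
$$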
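Here is a small worked example of this relationship. The function, the point, and the direction are chosen only for illustration, and the gradient is computed from the partial derivatives of f. Take f (x, y) = x²y, a = (1, 2), and the unit vector u = (3/5, 4/5).

$$
\begin{aligned}
\nabla f(x, y) &= (2xy,\; x^2), \qquad \nabla f(1, 2) = (4,\; 1), \\
D_{\mathbf{u}} f(\mathbf{a}) &= \nabla f(\mathbf{a}) \cdot \mathbf{u} = 4 \cdot \tfrac{3}{5} + 1 \cdot \tfrac{4}{5} = \tfrac{16}{5} = 3.2, \\
\|\nabla f(\mathbf{a})\| &= \sqrt{4^2 + 1^2} = \sqrt{17} \approx 4.123, \\
\cos\theta &= \frac{D_{\mathbf{u}} f(\mathbf{a})}{\|\nabla f(\mathbf{a})\|} = \frac{3.2}{\sqrt{17}} \approx 0.776.
\end{aligned}
$$

The slope 3.2 in the direction u is smaller than the maximal slope of about 4.123, as it must be, since cos θ is at most 1.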
I also duplicated this CVT using a plot of z = f (x, y), below. Although its steepness may be easier to see, recall from the above discussion that the dark red point is no longer really a and the light green vector is no longer really u. Similarly, since the dark blue vector points up the mountain, it is no longer really the gradient ∇f (a), which, for a function f (x, y) of two variables, is a two-dimensional vector. (I also couldn't get the length of the dark blue vector to accurately represent the length of the gradient; the true length of the gradient is given by the dark blue line on the bottom (light green) slider.) Despite its shortcomings, I hope this last CVT can help you see how the gradient always points in the direction where the mountain rises most steeply.
Examples are available here.