Taylor’s Theorem

Recall Taylor’s theorem from one-variable calculus. Given a function f(x) of one variable, you can approximate it with a polynomial around x = a.

For example, the best linear approximation for f(x) is

$$f(x) \approx f(a) + f'(a)(x - a).$$
This linear approximation fits f(x) (shown in green below) with a line (shown in blue) through x = a that matches the slope of f at a.

[Figure: f(x) in green, with its linear approximation at x = a in blue.]

We can add higher-order terms to approximate f(x) better near a. The best quadratic approximation is

$$f(x) \approx f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2.$$
We could add third-order or even higher-order terms:
$$f(x) \approx f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2 + \frac{1}{6} f'''(a)(x - a)^3 + \cdots.$$
The important point is that this Taylor polynomial approximates f(x) well for x near a.
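
To see this numerically, here is a minimal sketch that is not part of the original text. It assumes f(x) = e^x as an illustrative function (chosen because every derivative of e^x at a is e^a), with a = 0 and x = 0.5 as arbitrary sample points. The name `taylor_exp` is hypothetical.

```python
import math

def taylor_exp(x, a, order):
    """Taylor polynomial of exp about a, up to the given order.

    Assumes f(x) = exp(x), so the k-th derivative at a is exp(a).
    """
    return sum(
        math.exp(a) * (x - a) ** k / math.factorial(k)
        for k in range(order + 1)
    )

a, x = 0.0, 0.5  # illustrative expansion point and evaluation point

# Absolute error of the first-, second-, and third-order polynomials
errors = [abs(math.exp(x) - taylor_exp(x, a, n)) for n in (1, 2, 3)]

for n, err in zip((1, 2, 3), errors):
    print(f"order {n}: error = {err:.6f}")
```

For this choice of function and points, the error shrinks as the order grows.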

We want to generalize the Taylor polynomial to (scalar-valued) functions of multiple variables:

$$f(x) = f(x_1, x_2, \ldots, x_n).$$

We already know the best linear approximation to f. It involves the derivative,

$$f(x) \approx f(a) + Df(a)(x - a), \tag{1}$$
where Df(a) is the matrix of partial derivatives. The linear approximation is the first-order Taylor polynomial.
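
Written out in components, the linear approximation can be expressed as a sum over the partial derivatives. The component names x_i and a_i are notation assumed here, with x = (x_1, ..., x_n) and a = (a_1, ..., a_n):

```latex
\[
  f(x) \approx f(a) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)\,(x_i - a_i).
\]
```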

What about the second-order Taylor polynomial? To find a quadratic approximation, we need to add quadratic terms to our linear approximation. For a function of one variable f(x), the quadratic term was

$$\frac{1}{2} f''(a)(x - a)^2. \tag{2}$$
For a function of multiple variables f(x), what is analogous to the second derivative?

Since f(x) is scalar, the first derivative is Df(x), which we can view as a vector-valued function of x. For the second derivative of f(x), we can take the matrix of partial derivatives of the function Df(x) (I suppose we could write this something like DDf(x) for the moment). This second derivative matrix is called the Hessian matrix of f, and is denoted Hf(x),

Hf(x) = DDf(x).
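
As a small worked example (the function below is an illustrative choice, not one from the original text), take f(x_1, x_2) = x_1^2 x_2 + x_2^3. Differentiating once gives Df, and taking the matrix of partial derivatives of Df gives the Hessian:

```latex
\[
  Df(x) = \begin{bmatrix} 2 x_1 x_2 & x_1^2 + 3 x_2^2 \end{bmatrix},
  \qquad
  Hf(x) = \begin{bmatrix} 2 x_2 & 2 x_1 \\ 2 x_1 & 6 x_2 \end{bmatrix}.
\]
```

Here the Hessian is symmetric, as it is whenever the second partial derivatives are continuous.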

When f is a function of multiple variables, the second derivative term in the Taylor series will use the Hessian Hf(a). For the single-variable case, we could rewrite expression (2) as

$$\frac{1}{2}(x - a) f''(a) (x - a).$$
The analog of this expression for the multivariable case is
$$\frac{1}{2}(x - a)^T Hf(a)(x - a).$$

The important point to remember is that we can add the above expression to our first-order Taylor polynomial (1) to obtain the second-order Taylor polynomial for functions of multiple variables:

$$f(x) \approx f(a) + Df(a)(x - a) + \frac{1}{2}(x - a)^T Hf(a)(x - a).$$
The second-order Taylor polynomial is a better approximation of f(x) near x = a than is the linear approximation (which is the same as the first-order Taylor polynomial). We’ll be able to use it for things such as finding a local minimum or local maximum of the function f(x).
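
The following sketch, which is not from the original text, compares the first- and second-order Taylor polynomials numerically. It assumes an illustrative function f(x_1, x_2) = x_1^2 x_2 + x_2^3, whose derivative and Hessian are worked out by hand. The names `gradient`, `hessian`, and `taylor_polynomials` are hypothetical, as are the points a = (1, 2) and x = (1.1, 1.9).

```python
def f(x1, x2):
    # Illustrative function (an assumption, not from the text)
    return x1**2 * x2 + x2**3

def gradient(x1, x2):
    # Df: the 1 x 2 matrix of first partial derivatives, computed by hand
    return [2 * x1 * x2, x1**2 + 3 * x2**2]

def hessian(x1, x2):
    # Hf: the 2 x 2 matrix of second partial derivatives, computed by hand
    return [[2 * x2, 2 * x1],
            [2 * x1, 6 * x2]]

def taylor_polynomials(x, a):
    """Return the (first-order, second-order) Taylor values of f at x about a."""
    h = [xi - ai for xi, ai in zip(x, a)]            # the vector x - a
    Df = gradient(*a)
    H = hessian(*a)
    linear = sum(d * hi for d, hi in zip(Df, h))     # Df(a)(x - a)
    quadratic = 0.5 * sum(                           # (1/2)(x - a)^T Hf(a)(x - a)
        h[i] * H[i][j] * h[j]
        for i in range(len(h))
        for j in range(len(h))
    )
    first = f(*a) + linear
    return first, first + quadratic

a = (1.0, 2.0)   # expansion point
x = (1.1, 1.9)   # nearby evaluation point

first_order, second_order = taylor_polynomials(x, a)
exact = f(*x)

print("exact       :", exact)
print("first-order :", first_order)
print("second-order:", second_order)
```

For these sample points, the second-order value lies closer to the exact value than the first-order value does.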
