Recall Taylor's theorem from one-variable calculus. Given a function of one variable $f(x)$, you can approximate it with a polynomial around $x = a$.
For example, the best linear approximation for $f(x)$ is
\[ f(x) \approx f(a) + f'(a)(x - a). \]
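We can add higher-order terms to approximate $f(x)$ better near $a$. The best quadratic approximation is
\[ f(x) \approx f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2. \]
We could add third-order or even higher-order terms:
\[ f(x) \approx f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2 + \frac{1}{6} f'''(a)(x - a)^3 + \cdots. \]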
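A minimal numerical sketch of this idea, assuming the exponential function as a test case (the function choice, the helper name `taylor_exp`, and the evaluation point are illustrative, not from the text). It evaluates the Taylor polynomials of orders 1, 2, and 3 about $a = 0$ and prints how far each is from the true value at $x = 0.5$:

```python
import math

def taylor_exp(x, a, order):
    """Evaluate the Taylor polynomial of exp about a, up to the given order, at x."""
    # Every derivative of exp at a equals exp(a), so the k-th term is
    # exp(a) * (x - a)**k / k!.
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k)
               for k in range(order + 1))

a, x = 0.0, 0.5
for order in (1, 2, 3):
    error = abs(math.exp(x) - taylor_exp(x, a, order))
    print(f"order {order}: error = {error:.6f}")
```

For this particular function and point, each added term shrinks the error, which is the behavior one expects for $x$ near $a$.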
We want to generalize the Taylor polynomial to (scalar-valued) functions of multiple variables:
\[ f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n). \]
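We already know the best linear approximation to $f$. It involves the derivative,
\[ f(\mathbf{x}) \approx f(\mathbf{a}) + Df(\mathbf{a})(\mathbf{x} - \mathbf{a}), \tag{1} \]
where $Df(\mathbf{a})$ is the matrix of partial derivatives of $f$ evaluated at $\mathbf{a}$.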
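As a concrete illustration, take the hypothetical example $f(x, y) = x^2 y + y^3$ with $\mathbf{a} = (1, 2)$ (this function is chosen here only for illustration). The pieces of the linear approximation work out as follows:

\[
\begin{aligned}
f(1, 2) &= 1^2 \cdot 2 + 2^3 = 10, \\
Df(x, y) &= \begin{bmatrix} 2xy & x^2 + 3y^2 \end{bmatrix}, \qquad Df(1, 2) = \begin{bmatrix} 4 & 13 \end{bmatrix}, \\
f(x, y) &\approx 10 + 4(x - 1) + 13(y - 2).
\end{aligned}
\]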
What about the second-order Taylor polynomial? To find a quadratic approximation, we need to add quadratic terms to our linear approximation. For a function of one variable $f(x)$, the quadratic term was
\[ \frac{1}{2} f''(a)(x - a)^2. \tag{2} \]
Since $f(\mathbf{x})$ is scalar-valued, the first derivative $Df(\mathbf{x})$ is a $1 \times n$ matrix, which we can view as a vector-valued function of $\mathbf{x}$. For the second derivative of $f(\mathbf{x})$, we can take the matrix of partial derivatives of the function $Df(\mathbf{x})$, which we could write as $DDf(\mathbf{x})$ for the moment. This second-derivative matrix is an $n \times n$ matrix called the Hessian matrix of $f$, denoted $Hf(\mathbf{x})$:
\[ Hf(\mathbf{x}) = DDf(\mathbf{x}). \]
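The idea of differentiating the derivative can be sketched numerically. The example below assumes the hypothetical function $f(x, y) = x^2 y + y^3$, whose derivative is written out by hand in `grad_f`. The helper name `hessian_from_gradient` and the step size `h` are illustrative choices. It approximates each entry of the Hessian by a central difference of the components of the derivative:

```python
def grad_f(x, y):
    # Df for the hypothetical example f(x, y) = x**2 * y + y**3,
    # returned as a list [df/dx, df/dy].
    return [2 * x * y, x ** 2 + 3 * y ** 2]

def hessian_from_gradient(grad, x, y, h=1e-5):
    """Approximate the 2x2 Hessian by central differences of Df."""
    gx_plus, gx_minus = grad(x + h, y), grad(x - h, y)
    gy_plus, gy_minus = grad(x, y + h), grad(x, y - h)
    # Row i holds the partial derivatives (with respect to x, then y)
    # of the i-th component of Df.
    return [
        [(gx_plus[i] - gx_minus[i]) / (2 * h),
         (gy_plus[i] - gy_minus[i]) / (2 * h)]
        for i in range(2)
    ]

H = hessian_from_gradient(grad_f, 1.0, 2.0)
print(H)
```

For this function the Hessian computed by hand is $\begin{bmatrix} 2y & 2x \\ 2x & 6y \end{bmatrix}$, so at $(1, 2)$ the printed entries should be close to $4, 2, 2, 12$, up to floating-point rounding.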
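When $f$ is a function of multiple variables, the second derivative term in the Taylor series will use the Hessian $Hf(\mathbf{a})$. For the single-variable case, we could rewrite expression (2) as
\[ \frac{1}{2}(x - a) f''(a) (x - a). \]
The analog of this expression for the multivariable case is
\[ \frac{1}{2}(\mathbf{x} - \mathbf{a})^T Hf(\mathbf{a})(\mathbf{x} - \mathbf{a}). \]
The important point to remember is that we can add the above expression to our first-order Taylor polynomial (1) to obtain the second-order Taylor polynomial for functions of multiple variables: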
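\[ f(\mathbf{x}) \approx f(\mathbf{a}) + Df(\mathbf{a})(\mathbf{x} - \mathbf{a}) + \frac{1}{2}(\mathbf{x} - \mathbf{a})^T Hf(\mathbf{a})(\mathbf{x} - \mathbf{a}). \]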
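A small sketch comparing the first-order and second-order polynomials, assuming the hypothetical function $f(x, y) = x^2 y + y^3$ expanded about $\mathbf{a} = (1, 2)$. The values `F_A`, `DF_A`, and `HF_A` are computed by hand for this example, and all names are illustrative:

```python
def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x ** 2 * y + y ** 3

A = (1.0, 2.0)                     # expansion point a
F_A = 10.0                         # f(a)
DF_A = (4.0, 13.0)                 # Df(a) = [2xy, x^2 + 3y^2] at a
HF_A = ((4.0, 2.0), (2.0, 12.0))   # Hf(a) = [[2y, 2x], [2x, 6y]] at a

def taylor1(x, y):
    """First-order Taylor polynomial: f(a) + Df(a)(x - a)."""
    d = (x - A[0], y - A[1])
    return F_A + sum(DF_A[i] * d[i] for i in range(2))

def taylor2(x, y):
    """Second-order Taylor polynomial: adds (1/2)(x - a)^T Hf(a)(x - a)."""
    d = (x - A[0], y - A[1])
    quad = sum(d[i] * HF_A[i][j] * d[j] for i in range(2) for j in range(2))
    return taylor1(x, y) + 0.5 * quad

point = (1.1, 2.1)
print("linear error:   ", abs(f(*point) - taylor1(*point)))
print("quadratic error:", abs(f(*point) - taylor2(*point)))
```

At this nearby point the quadratic error comes out much smaller than the linear one, which illustrates why the Hessian term improves the approximation close to $\mathbf{a}$.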