Taylor’s Theorem
Recall Taylor’s theorem from one-variable calculus. Given a one-variable function $f(x)$, you can fit it with a polynomial around $x=a$.
For example, the best linear approximation for $f(x)$ is
$$f(x) \approx f(a) + f'(a)(x-a).$$
This linear approximation fits $f(x)$ (shown in green below) with a line (shown in blue) through $x=a$ that matches the slope of $f$ at $a$.
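As a quick numerical check of the tangent-line idea, here is a minimal Python sketch. The example function $f(x)=\sqrt{x}$, the point $a=4$ (where $f(4)=2$ and $f'(4)=1/4$), and the helper name `linear_approx` are our own illustrative choices, not part of the text.

```python
import math

def linear_approx(f, df, a):
    """Return the tangent-line approximation x -> f(a) + f'(a)(x - a)."""
    fa, dfa = f(a), df(a)
    return lambda x: fa + dfa * (x - a)

# Illustrative example (our choice): f(x) = sqrt(x) near a = 4.
f = math.sqrt
df = lambda x: 1 / (2 * math.sqrt(x))  # f'(x) = 1 / (2 sqrt(x))
L = linear_approx(f, df, 4.0)

for x in (4.0, 4.1, 5.0):
    print(f"x={x}: sqrt(x)={f(x):.6f}, linear={L(x):.6f}")
```

At $x=4$ the two values agree exactly, and the gap grows as $x$ moves away from $a$.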

We can add higher-order terms to approximate $f(x)$ better near $a$. The best quadratic approximation is
$$f(x) \approx f(a) + f'(a)(x-a) + \frac{1}{2} f''(a)(x-a)^2.$$
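We could add third-order or even higher-order terms:
$$f(x) \approx f(a) + f'(a)(x-a) + \frac{1}{2} f''(a)(x-a)^2 + \frac{1}{6} f'''(a)(x-a)^3 + \cdots.$$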
The important point is that this Taylor polynomial approximates $f(x)$ well for $x$ near $a$.
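The claim that higher-order terms improve the fit near $a$ can be checked numerically. The sketch below assumes $f(x)=e^x$ around $a=0.5$ (chosen because every derivative of $e^x$ is again $e^x$); `taylor_poly` is a hypothetical helper that evaluates $\sum_k f^{(k)}(a)(x-a)^k/k!$ from a list of derivative values.

```python
import math

def taylor_poly(derivs_at_a, a, x):
    """Evaluate sum_k f^(k)(a) (x - a)^k / k! for k = 0 .. len(derivs_at_a) - 1."""
    return sum(d * (x - a) ** k / math.factorial(k)
               for k, d in enumerate(derivs_at_a))

# Illustrative example (our choice): f(x) = e^x around a = 0.5.
a = 0.5
derivs = [math.exp(a)] * 4  # f(a), f'(a), f''(a), f'''(a) are all e^a

for x in (0.6, 1.0):
    exact = math.exp(x)
    for order in (1, 2, 3):
        approx = taylor_poly(derivs[: order + 1], a, x)
        print(f"x={x}, order {order}: error {abs(exact - approx):.2e}")
```

For each $x$, the error shrinks as the order increases, and for each order it is smaller at $x=0.6$ than at $x=1.0$, which is farther from $a$.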
We want to generalize the Taylor polynomial to (scalar-valued) functions of multiple variables:
$$f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n).$$
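We already know the best linear approximation to $f$. It involves the derivative,
$$f(\mathbf{x}) \approx f(\mathbf{a}) + Df(\mathbf{a})(\mathbf{x}-\mathbf{a}), \tag{1}$$
where $Df(\mathbf{a})$ is the $1 \times n$ matrix of partial derivatives. The linear approximation is the first-order Taylor polynomial.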
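In code, the first-order Taylor polynomial (1) is $f(\mathbf{a})$ plus the product of the row $Df(\mathbf{a})$ with the column $\mathbf{x}-\mathbf{a}$, which is just a dot product. This sketch assumes the illustrative example $f(x_1,x_2)=e^{x_1}\cos x_2$, with hand-computed partial derivatives $\partial f/\partial x_1 = e^{x_1}\cos x_2$ and $\partial f/\partial x_2 = -e^{x_1}\sin x_2$; the function names are our own.

```python
import math

def f(x):
    # Illustrative example (our choice): f(x1, x2) = e^{x1} cos(x2)
    return math.exp(x[0]) * math.cos(x[1])

def Df(x):
    # The 1 x 2 matrix of partial derivatives, stored as a flat list
    return [math.exp(x[0]) * math.cos(x[1]), -math.exp(x[0]) * math.sin(x[1])]

def first_order_taylor(x, a):
    """First-order Taylor polynomial f(a) + Df(a)(x - a)."""
    return f(a) + sum(g * (xi - ai) for g, xi, ai in zip(Df(a), x, a))

a = [0.0, 0.5]
for x in ([0.0, 0.5], [0.1, 0.6], [0.5, 1.0]):
    print(f"x={x}: f={f(x):.6f}, linear={first_order_taylor(x, a):.6f}")
```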
What about the second-order Taylor polynomial? To find a quadratic approximation, we need to add quadratic terms to our linear approximation. For a function of one variable $f(x)$, the quadratic term was
$$\frac{1}{2} f''(a)(x-a)^2. \tag{2}$$
For a function of multiple variables $f(\mathbf{x})$, what is analogous to the second derivative?
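Since $f(\mathbf{x})$ is scalar, the first derivative is $Df(\mathbf{x})$, a $1 \times n$ matrix, which we can view as an $n$-dimensional vector-valued function of the $n$-dimensional vector $\mathbf{x}$. For the second derivative of $f(\mathbf{x})$, we can take the matrix of partial derivatives of the function $Df(\mathbf{x})$ (for the moment, we could write this as $DDf(\mathbf{x})$). This second-derivative matrix is an $n \times n$ matrix called the Hessian matrix of $f$, and is denoted $Hf(\mathbf{x})$:
$$Hf(\mathbf{x}) = DDf(\mathbf{x}) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}.$$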
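To make “the matrix of partial derivatives of $Df$” concrete, the sketch below approximates $Hf(\mathbf{x})$ by differentiating $Df$ numerically, one coordinate at a time with central differences, and compares the result with the Hessian computed by hand. It assumes the illustrative example $f(x_1,x_2)=e^{x_1}\cos x_2$, whose Hessian is $\begin{bmatrix} e^{x_1}\cos x_2 & -e^{x_1}\sin x_2 \\ -e^{x_1}\sin x_2 & -e^{x_1}\cos x_2 \end{bmatrix}$; the step size `h=1e-5` and the helper names are our choices.

```python
import math

def Df(x):
    # Derivative (1 x 2 matrix) of the example f(x1, x2) = e^{x1} cos(x2)
    return [math.exp(x[0]) * math.cos(x[1]), -math.exp(x[0]) * math.sin(x[1])]

def hessian_fd(Df, x, h=1e-5):
    """Approximate Hf(x) = D(Df)(x): column j is the central difference of Df along x_j."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        gp, gm = Df(xp), Df(xm)
        for i in range(n):
            H[i][j] = (gp[i] - gm[i]) / (2 * h)
    return H

def hessian_exact(x):
    # Second partial derivatives of e^{x1} cos(x2), computed by hand
    e, c, s = math.exp(x[0]), math.cos(x[1]), math.sin(x[1])
    return [[e * c, -e * s],
            [-e * s, -e * c]]

a = [0.0, 0.5]
print(hessian_fd(Df, a))
print(hessian_exact(a))
```

The two matrices agree to several decimal places, and both are symmetric, as expected for a function whose second partial derivatives are continuous.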
When $f$ is a function of multiple variables, the second-derivative term in the Taylor series will use the Hessian $Hf(\mathbf{a})$. For the single-variable case, we could rewrite expression (2) as
$$\frac{1}{2}(x-a)\, f''(a)\, (x-a).$$
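The analog of this expression for the multivariable case, treating $\mathbf{x}-\mathbf{a}$ as a column vector, is
$$\frac{1}{2}(\mathbf{x}-\mathbf{a})^T\, Hf(\mathbf{a})\, (\mathbf{x}-\mathbf{a}).$$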
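The quadratic form can be evaluated directly as the double sum $\frac{1}{2}\sum_{i,j}(x_i-a_i)\,H_{ij}\,(x_j-a_j)$. The minimal sketch below (the helper name `quadratic_term` and all numbers are illustrative) also checks that with $n=1$, where $Hf(a)$ is the $1 \times 1$ matrix $[f''(a)]$, it reproduces the one-variable term (2).

```python
def quadratic_term(H, a, x):
    """Compute (1/2) (x - a)^T H (x - a) for an n x n matrix H given as nested lists."""
    d = [xi - ai for xi, ai in zip(x, a)]
    n = len(d)
    return 0.5 * sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))

# n = 1: H = [[f''(a)]]. Illustrative example: f(x) = x^3 at a = 2, so f''(2) = 12.
print(quadratic_term([[12.0]], [2.0], [2.5]))  # (1/2) * 12 * 0.5^2 = 1.5

# n = 2 with an illustrative symmetric matrix and x - a = (1, 1):
print(quadratic_term([[2.0, 1.0], [1.0, 4.0]], [0.0, 0.0], [1.0, 1.0]))  # (2+1+1+4)/2 = 4.0
```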
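The important point to remember is that we can add the above expression to our first-order Taylor polynomial (1) to obtain the second-order Taylor polynomial for functions of multiple variables:
$$f(\mathbf{x}) \approx f(\mathbf{a}) + Df(\mathbf{a})(\mathbf{x}-\mathbf{a}) + \frac{1}{2}(\mathbf{x}-\mathbf{a})^T\, Hf(\mathbf{a})\, (\mathbf{x}-\mathbf{a}).$$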
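Putting the pieces together, the sketch below evaluates both the first-order polynomial (1) and the second-order polynomial for the illustrative function $f(x_1,x_2)=e^{x_1}\cos x_2$ about $\mathbf{a}=(0, 0.5)$, using hand-computed derivatives. The function names and sample points are our own.

```python
import math

# Illustrative example (our choice): f(x1, x2) = e^{x1} cos(x2), derivatives by hand.
def f(x):
    return math.exp(x[0]) * math.cos(x[1])

def Df(x):
    e, s, c = math.exp(x[0]), math.sin(x[1]), math.cos(x[1])
    return [e * c, -e * s]

def Hf(x):
    e, s, c = math.exp(x[0]), math.sin(x[1]), math.cos(x[1])
    return [[e * c, -e * s],
            [-e * s, -e * c]]

def taylor1(x, a):
    """First-order Taylor polynomial: f(a) + Df(a)(x - a)."""
    d = [xi - ai for xi, ai in zip(x, a)]
    return f(a) + sum(g * di for g, di in zip(Df(a), d))

def taylor2(x, a):
    """Second-order Taylor polynomial: taylor1 + (1/2)(x - a)^T Hf(a) (x - a)."""
    d = [xi - ai for xi, ai in zip(x, a)]
    H = Hf(a)
    n = len(d)
    quad = 0.5 * sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))
    return taylor1(x, a) + quad

a = [0.0, 0.5]
for x in ([0.1, 0.6], [0.5, 1.0]):
    exact = f(x)
    print(f"x={x}: first-order error {abs(exact - taylor1(x, a)):.2e}, "
          f"second-order error {abs(exact - taylor2(x, a)):.2e}")
```

At the point closest to $\mathbf{a}$, the second-order error comes out roughly an order of magnitude smaller than the first-order error.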
The second-order Taylor polynomial is a better approximation of $f(\mathbf{x})$ near $\mathbf{x}=\mathbf{a}$ than is the linear approximation (which is the same as the first-order Taylor polynomial). We’ll be able to use it for things such as finding a local minimum or local maximum of the function $f(\mathbf{x})$.
You can read some examples here.