Face Detection and Neural Networks

Todd Wittman
Math 8600: Image Analysis
Prof. Jianhong Shen
Fall 2001 / Spring 2002

These are Matlab m-files that implement a neural network which will (hopefully) identify whether an image contains a human face. The output is a number between 0 and 1 that represents the probability that the image is a face. Values near 1 indicate a face, and values near 0 indicate a non-face. You need to download all of the required m-files for the programs to run.


Documentation

In the fall semester of 2001, I tried to develop an artificial neural network that could detect flesh tones. I worked in both the RGB and YES color spaces and got slightly better results in the YES space.

In the following spring semester of 2002, I tried to refine the network by using more training data. This network seemed to do a better job of detecting flesh tones. I also tried, unsuccessfully, to link the network to a segmentation program. Unfortunately, natural images are very difficult to segment.


RGB Histogram Approach

This neural network takes the three R, G, B histograms (appended into one vector) as input. Select any color JPEG image; GIF files will not work. If the image contains a face, it should be a full frontal view of the head (a mugshot). To run the network, enter the following command in Matlab:

y = rgb_forward('your_image.jpg')

The returned value y is the probability that the image represents a face. You will need all of the files below to run the program. The output display should show the original image, its color histogram, and the y value.

rgb_forward.m: Forward neural network for RGB approach.
Required files:
forward.m: Obtains output from general neural network.
image2rgbhist.m: Converts image to vector of R,G,B histograms.
resize_matrix.m: Interpolates pixel values from image.
sigmoid.m: Threshold function for firing of neurons.
rgb_weights.mat: Contains the weights calculated from the training set.
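The histogram input described above can be illustrated with a short Python sketch. The Matlab source of image2rgbhist.m is not reproduced here, so this is a minimal sketch under these assumptions: the image is a list of rows of 8-bit (r, g, b) tuples, and each channel is counted into `bins` equal-width bins normalized to sum to 1. The function names `channel_histogram` and `image_to_rgb_hist` are hypothetical. The resampling step handled by resize_matrix.m is omitted.

```python
# Hypothetical sketch of the idea behind image2rgbhist.m (not the original code).
# Assumptions: 8-bit (r, g, b) pixels, equal-width bins, each histogram sums to 1.


def channel_histogram(values, bins=32):
    """Normalized histogram of 8-bit values (0..255) using equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // 256] += 1
    total = len(values)
    return [c / total for c in counts]


def image_to_rgb_hist(image, bins=32):
    """Append the R, G and B histograms into one feature vector."""
    pixels = [p for row in image for p in row]
    feature = []
    for channel in range(3):  # 0 = R, 1 = G, 2 = B
        feature.extend(channel_histogram([p[channel] for p in pixels], bins))
    return feature


if __name__ == "__main__":
    # Tiny 2x2 test image: two pure-red pixels and two white pixels.
    img = [[(255, 0, 0), (255, 255, 255)],
           [(255, 0, 0), (255, 255, 255)]]
    vec = image_to_rgb_hist(img, bins=4)
    print(len(vec))   # prints 12 (3 channels x 4 bins)
    print(vec[:4])    # prints [0.0, 0.0, 0.0, 1.0] (all red values in the top bin)
```

Normalizing each histogram makes the feature vector independent of the image size. The network therefore sees color proportions rather than raw pixel counts.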

YES Histogram Approach

This neural network uses histograms in the YES color space rather than the conventional RGB space. Performance seems to improve slightly under this scheme. Select any color JPEG image. To compute the output, type in Matlab:

y=yes_forward('your_image.jpg')

The returned y value should be > 0.5 for a human face and < 0.5 for a non-face. The plot should include the original image, the color histogram, and the y value.
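The neuron activation comes from sigmoid.m. The sketch below shows the kind of forward pass that forward.m presumably performs, followed by the y > 0.5 decision rule. It assumes a single hidden layer and a hypothetical weight layout (`W1`, `b1`, `w2`, `b2`), because the actual architecture stored in rgb_weights.mat and yes_weights.mat is not described here.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def forward(x, W1, b1, w2, b2):
    """Hypothetical one-hidden-layer forward pass returning y between 0 and 1.

    W1: one weight row per hidden neuron; b1: hidden biases;
    w2: hidden-to-output weights; b2: output bias.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def is_face(y, threshold=0.5):
    """Decision rule from the text: y > 0.5 is read as a face."""
    return y > threshold
```

Here `x` would be a histogram vector and the weights would come from the corresponding .mat file. This applies only if the real network has this shape. The final sigmoid is what keeps y between 0 and 1, which allows it to be read as a probability.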

yes_forward.m: Forward neural network for YES approach.
Required files:
forward.m: Obtains output from general neural network.
image2yeshist.m: Converts image to vector of Y,E,S histograms.
resize_matrix.m: Interpolates pixel values from image.
sigmoid.m: Threshold function for firing of neurons.
yes_weights.mat: Contains the weights calculated from the training set.
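Before histogramming, each pixel has to be mapped from RGB to YES. This sketch assumes the commonly cited Xerox YES matrix (Y = 0.253R + 0.684G + 0.063B, E = 0.5R - 0.5G, S = 0.25R + 0.25G - 0.5B). image2yeshist.m may use different coefficients or scaling, and the function names are hypothetical.

```python
# Hypothetical sketch, not image2yeshist.m. Assumes the commonly cited
# Xerox YES coefficients; the original file may differ.
YES_MATRIX = (
    (0.253, 0.684, 0.063),   # Y: luminance
    (0.500, -0.500, 0.000),  # E: red minus green chrominance
    (0.250, 0.250, -0.500),  # S: yellow minus blue chrominance
)


def rgb_to_yes(r, g, b):
    """Map one RGB pixel (0..255 per channel) to a (Y, E, S) tuple."""
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in YES_MATRIX)


def yes_channels(pixels):
    """Convert a flat list of RGB pixels into separate Y, E and S lists."""
    converted = [rgb_to_yes(*p) for p in pixels]
    return [[c[k] for c in converted] for k in range(3)]


def range_histogram(values, lo, hi, bins=32):
    """Normalized histogram with equal-width bins covering [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi falls in the last bin, not one past it.
        k = min(max(int((v - lo) / width), 0), bins - 1)
        counts[k] += 1
    return [c / len(values) for c in counts]
```

For 8-bit input, E and S range over [-127.5, 127.5]. Their histograms therefore need bins that span a signed range, unlike the R, G, B histograms. The E and S rows each sum to zero, so every gray pixel maps to E = S = 0.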

Training

The following m-files were used to train the neural network. The Levenberg-Marquardt code and the related files eone.m and etot.m were written by Dr. Fadil Santosa. Each run of the trainall routine takes about 5 minutes but performs only 10 iterations. To start from a new set of weights, set the toggle to 0 initially. Otherwise, the algorithm starts from the weights stored in the .mat file.
Levenberg-Marquardt Code
levmar.m: The driver for the training process.
etot.m: Computes gradient for overall system of images.
eone.m: Computes gradient for one image.
sigmoid.m: Threshold function for firing of neurons.
RGB Histogram Approach
rgb_trainall.m: Computes weights for 20 images in training set.
rgb_weights.mat: Contains weights calculated after 100 iterations.
image2rgbhist.m: Converts image to vector of R,G,B histograms.
resize_matrix.m: Interpolates pixel values from image.
YES Histogram Approach
yes_trainall.m: Computes weights for 20 images in training set.
yes_weights.mat: Contains weights calculated after 100 iterations.
image2yeshist.m: Converts image to vector of Y,E,S histograms.
resize_matrix.m: Interpolates pixel values from image.
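The Levenberg-Marquardt idea behind levmar.m can be illustrated with a generic least-squares sketch. It is not Dr. Santosa's code and differs from it in three ways: it approximates the Jacobian by forward differences instead of the analytic gradients computed in eone.m and etot.m, it uses Levenberg's damping term lam * I, and all of its names are hypothetical. Each iteration solves (J^T J + lam I) step = -J^T r. If the squared error drops, the step is accepted and lam shrinks; otherwise lam grows. The default of 10 iterations and the starting-weights argument `w` loosely mirror the "10 iterations at a time" and toggle behavior described above.

```python
# Hypothetical sketch of Levenberg-Marquardt least squares (not levmar.m).
# Assumptions: forward-difference Jacobian, Levenberg damping lam * I,
# lam divided by 10 after a successful step and multiplied by 10 otherwise.


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def residuals(model, w, data):
    """r_i = model output minus target for each (input, target) pair."""
    return [model(w, x) - t for x, t in data]


def jacobian(model, w, data, h=1e-6):
    """Forward-difference Jacobian with J[i][j] = d r_i / d w_j."""
    r0 = residuals(model, w, data)
    cols = []
    for j in range(len(w)):
        wp = list(w)
        wp[j] += h
        rp = residuals(model, wp, data)
        cols.append([(a - b) / h for a, b in zip(rp, r0)])
    return [list(row) for row in zip(*cols)]


def levenberg_marquardt(model, w, data, iterations=10, lam=1e-2):
    """Run a fixed number of LM iterations starting from weights w."""
    w = list(w)
    r = residuals(model, w, data)
    err = sum(ri * ri for ri in r)
    n = len(w)
    for _ in range(iterations):
        J = jacobian(model, w, data)
        # Damped normal equations: (J^T J + lam I) step = -J^T r
        A = [[sum(row[a] * row[b] for row in J) + (lam if a == b else 0.0)
              for b in range(n)] for a in range(n)]
        g = [sum(row[a] * ri for row, ri in zip(J, r)) for a in range(n)]
        step = solve(A, [-gi for gi in g])
        trial = [wi + si for wi, si in zip(w, step)]
        r_trial = residuals(model, trial, data)
        err_trial = sum(ri * ri for ri in r_trial)
        if err_trial < err:   # accept: behave more like Gauss-Newton
            w, r, err = trial, r_trial, err_trial
            lam /= 10.0
        else:                 # reject: behave more like gradient descent
            lam *= 10.0
    return w, err


if __name__ == "__main__":
    # Toy problem: recover w = (2, 1) from exact samples of y = 2x + 1.
    data = [(x, 2.0 * x + 1.0) for x in range(4)]
    w, err = levenberg_marquardt(lambda w, x: w[0] * x + w[1], [0.0, 0.0], data)
    print([round(v, 6) for v in w])  # prints [2.0, 1.0]
```

For the face network, `model(w, x)` would be a forward pass on a histogram vector x with the weights unpacked from `w`. `data` would pair each training histogram with its target of 0 or 1.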

Training Images

In Fall 2001, I used a small 20-image training set (10 faces and 10 non-faces) to generate the weights in the .mat files. All faces were fed in with target output y=1, while non-faces had target output y=0. The 10 faces were chosen to represent different age groups, skin tones, and genders. The non-faces were random objects (toasters, landscapes, cats, etc.).

In Spring 2002, I used a larger training set of 100 faces, chosen so that their race and gender proportions match the 2000 US Census statistics. A picture of all the faces used is given below. The non-faces were again random scenes, plus some randomly generated "gibberish" color matrices.
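The labeling scheme above can be sketched as follows. Faces get target 1, while non-faces and gibberish matrices get target 0. The text does not say how the gibberish matrices were generated, so the uniform 0..255 channel values, the 32x32 default size, and the function names are all assumptions.

```python
import random


def gibberish_image(height, width, rng):
    """Random RGB matrix: every channel value drawn uniformly from 0..255."""
    return [[tuple(rng.randrange(256) for _ in range(3)) for _ in range(width)]
            for _ in range(height)]


def build_training_set(faces, non_faces, n_gibberish, size=(32, 32), seed=0):
    """Label faces with target 1.0, non-faces and gibberish with target 0.0."""
    rng = random.Random(seed)  # seeded so the gibberish is reproducible
    data = [(img, 1.0) for img in faces]
    data += [(img, 0.0) for img in non_faces]
    data += [(gibberish_image(size[0], size[1], rng), 0.0)
             for _ in range(n_gibberish)]
    return data
```

Gibberish matrices have no spatial structure, but their histograms are spread across the whole color range. Labeling them 0 can discourage the network from calling any broad color distribution a face.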


Results

In general, the YES approach seems to outperform the RGB approach on the test images. However, the YES approach misclassifies 3 of the training images, which is rather disturbing. Specifically, YES misclassifies carter.jpg (i=5), hand.jpg (i=15), and glove.jpg (i=16). This can be seen in the graphs below, which show the outputs on the training set for both methods. The first 10 images (i=1:10) are faces, so the output y should be > 0.5. The last 10 images (i=11:20) are non-faces, so the output should be < 0.5.
[Figure: network outputs y on the training images i=1:20; panels "RGB Training" and "YES Training"]
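The criterion for reading the graphs can be written as a small check. Given the 20 outputs in training-set order, it lists the indices that fall on the wrong side of 0.5. The function name and the convention that an output exactly equal to 0.5 counts as an error are assumptions.

```python
def misclassified(outputs, n_faces=10, threshold=0.5):
    """Return the 1-based indices i whose output is on the wrong side of threshold.

    Images 1..n_faces are faces (want y > threshold); the rest are
    non-faces (want y < threshold). y == threshold counts as an error.
    """
    wrong = []
    for i, y in enumerate(outputs, start=1):
        correct = y > threshold if i <= n_faces else y < threshold
        if not correct:
            wrong.append(i)
    return wrong
```

Using 1-based indices keeps the result in the same i=1:20 numbering as the graphs and the image indices quoted in the text.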
The differences between the two methods stem from differences between the RGB and YES histograms. The YES histograms tend to show more variation, while the RGB histograms tend to cluster and look very similar. The picture below compares the RGB images and histograms with their YES counterparts.

wittman@math.umn.edu