TUTORIAL ON ARTIFICIAL NEURAL NETWORKS
David C. Silverman
Example of Back-propagation Artificial Neural Network
The following simple example of the back-propagation neural network is provided
to illustrate the points outlined in the sections of this tutorial on the
computing element, back-propagation network structure, and back-propagation
network training.
This same example is used elsewhere in this tutorial when discussing other types of
neural or belief networks so that differences and similarities among networks
can be seen.
- Square two random numbers each between 0 and 1 and add the results together.
- Train a back-propagation neural network to decide how to round the number.
If the sum is greater than or equal to 0.5, round to 1; if it is less than 0.5,
round to 0.
This problem is a simple classification decision problem in which the neural
network is presented with 2 numbers as input and then predicts a value of 0 or 1 depending
on the value of the sum of the squares. To make the example more realistic,
the inputs are the non-squared values of the two random numbers and the output
is 0 or 1. The network then has to learn the relationship between the two
numbers and from that make the decision of whether the output value is 0 or 1.
This type of decision is typical of real-life problems in which several
independent observables are present and a decision has to be made from them
without prior knowledge of how they are related.
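The problem statement above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build the original data: the function name `make_data`, the seed, and the use of a single generator from Python's `random` module are assumptions made for this sketch.

```python
import random

def make_data(n, seed=0):
    """Return n samples of ((x1, x2), label) for the rounding problem.

    The inputs are the non-squared random numbers; the label is 1 when the
    sum of their squares is at least 0.5 and 0 otherwise.
    """
    rng = random.Random(seed)  # assumption: one seeded generator for both inputs
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        label = 1 if x1 * x1 + x2 * x2 >= 0.5 else 0
        data.append(((x1, x2), label))
    return data

training_set = make_data(500, seed=0)
n_ones = sum(label for _, label in training_set)
print(f"class 1: {n_ones}, class 0: {len(training_set) - n_ones}")
```

With plain uniform sampling, roughly 61% of the points fall in class 1 (the region outside the quarter circle of radius sqrt(0.5)), so this sketch does not attempt to reproduce the exact class balance of the original sets.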
TRAINING
Five hundred training sets were created by using two random number generators.
The sets were about evenly divided between those whose sum of squares
rounded to 1 and those whose sum of squares rounded to 0. Three networks with 1, 2, or 3
hidden nodes were created and trained. Each had one hidden layer.
The network with two hidden nodes is shown in the accompanying figure as an example.
Note the additional node on the left is a constant.
This node is like a threshold or offset and has a constant value of 1. Its weight
is determined during training. It is comparable to the constant in a polynomial
fit whose value is determined during non-linear regression.
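The role of the constant node can be illustrated with a short forward-pass sketch. Several details here are assumptions, since the text does not state them: the hidden nodes use a sigmoid, the output node is linear (which would be consistent with outputs slightly outside the range 0 to 1), and the constant node feeds both the hidden nodes and the output node. The weights shown are arbitrary illustrative values, not trained ones.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, w_hidden, w_out):
    """Forward pass of a 2-input, one-hidden-layer, 1-output network.

    w_hidden: one [w_x1, w_x2, w_const] list per hidden node.
    w_out:    one weight per hidden node, followed by the constant-node weight.
    """
    const = 1.0  # the constant node always outputs 1; only its weights are learned
    hidden = [sigmoid(w[0] * x1 + w[1] * x2 + w[2] * const) for w in w_hidden]
    # zip stops after the hidden-node weights; the last weight belongs to the constant
    output = sum(v * h for v, h in zip(w_out, hidden)) + w_out[-1] * const
    return hidden, output

# Two hidden nodes with arbitrary illustrative weights (not trained values).
w_hidden = [[0.5, -0.3, 0.1], [-0.2, 0.8, -0.4]]
w_out = [1.0, -1.0, 0.25]
hidden, y = forward(0.6, 0.7, w_hidden, w_out)
print(hidden, y)
```

Changing only the constant-node weights shifts each node's response without changing its dependence on the inputs, which is the sense in which the node acts as a threshold or offset.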
Each of the three networks was trained for between 500 and 1000 epochs, where
an epoch is one complete pass through the full set of training data. The sum
of the squares of the differences between calculated and expected
(0 or 1) values was used to track the error. There was virtually no change in error
between training for 500 epochs and 1000 epochs. Since the number of data
sets was no less than about 50 times the number of connections, the expectation
would be that each of the networks would generalize on the information, not memorize it.
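The training procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than the configuration used for the results reported here: it assumes sigmoid hidden nodes, a linear output node, weights updated after every sample, a fixed learning rate, and random presentation order. The names `train`, `predict`, and `sum_squared_error`, the learning rate, and the seeds are all hypothetical, and only 100 epochs are run to keep the example fast.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n, rng):
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        data.append((x1, x2, 1.0 if x1 * x1 + x2 * x2 >= 0.5 else 0.0))
    return data

def predict(x1, x2, w_hidden, w_out):
    # The constant node has value 1, so its weight is added directly.
    hidden = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_hidden]
    return hidden, sum(v * h for v, h in zip(w_out, hidden)) + w_out[-1]

def sum_squared_error(data, w_hidden, w_out):
    return sum((predict(x1, x2, w_hidden, w_out)[1] - t) ** 2 for x1, x2, t in data)

def train(data, n_hidden, epochs, rate=0.05, seed=1):
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    samples = list(data)  # copy so the caller's list is not reordered
    history = []
    for _ in range(epochs):
        rng.shuffle(samples)  # assumption: random presentation order each epoch
        for x1, x2, t in samples:
            hidden, y = predict(x1, x2, w_hidden, w_out)
            err = y - t  # gradient of 0.5 * (y - t)**2 with respect to y
            for j, h in enumerate(hidden):
                # Propagate the error back through hidden node j;
                # h * (1 - h) is the derivative of the sigmoid.
                delta = err * w_out[j] * h * (1.0 - h)
                w_out[j] -= rate * err * h
                w_hidden[j][0] -= rate * delta * x1
                w_hidden[j][1] -= rate * delta * x2
                w_hidden[j][2] -= rate * delta
            w_out[-1] -= rate * err
        # Track the error over the full training set once per epoch.
        history.append(sum_squared_error(samples, w_hidden, w_out))
    return w_hidden, w_out, history

data = make_data(500, random.Random(0))
for n_hidden in (1, 2, 3):
    _, _, history = train(data, n_hidden, epochs=100)
    print(n_hidden, round(history[0], 2), round(history[-1], 2))
```

Printing the first and last entries of `history` gives a rough view of how much the sum of squared errors falls during training for each network size.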
RESULTS
A different set of 100 randomly generated input-output combinations was used
to test the trained networks. They were about evenly distributed between the two
classes. The calculated output values were between about -0.1
and +1.1. The strategy used to assess error was to assume that if the value
predicted by the network was less than 0.5, the interpreted prediction would
be 0, and if the value was greater than or equal to 0.5, the interpreted
prediction would be 1.
These interpreted values were compared to the expected rounding outputs to assess error.
No attempt was made to compare the actual values because non-linear regression
was not the goal of the exercise.
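The interpretation rule described above amounts to a threshold at 0.5 followed by a count of mismatches. A minimal sketch, in which the function names and the sample values are hypothetical and chosen only for illustration (they are not results from this tutorial):

```python
def interpret(raw_output):
    """Map a raw network output to a class: below 0.5 -> 0, otherwise 1."""
    return 1 if raw_output >= 0.5 else 0

def count_errors(raw_outputs, expected):
    """Count test cases whose interpreted prediction differs from the expected class."""
    return sum(1 for raw, target in zip(raw_outputs, expected)
               if interpret(raw) != target)

# Hypothetical raw outputs for illustration only.
raw_outputs = [-0.1, 0.32, 0.49, 0.5, 0.77, 1.1]
expected = [0, 0, 1, 1, 1, 1]
errors = count_errors(raw_outputs, expected)
print(f"{errors} of {len(expected)} misclassified")
```

Only the side of 0.5 on which an output falls matters here, so an output of 1.1 and an output of 0.77 count as the same prediction.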
Separate figures show the correct and incorrect responses for the neural
networks with one, two, and three hidden nodes.
CONCLUSIONS
- The network with three hidden nodes seemed to train to better generalization
than those with fewer hidden nodes. Though not tried, adding one or two additional
hidden nodes might have improved training. Generalization would not have been
compromised.
- Training was not optimized to ensure that the absolute minimum in
error was reached with any of the three networks. The important point is that
the back-propagation neural network was able to categorize the information
(learn the data structure) to make reasonable predictions.
- The errors are congregated at the boundary. This observation is not limited
to this example. Dividing information among classes becomes more difficult
the closer one is to the boundary between the classes.
- This problem was "learning" rounding to 0 or 1. If a new test data set is
introduced that rounds to 1.5, the chances of this network making an accurate prediction
are slim. Additional training would have to be done with the new information for
the neural network to function properly with this new information. That ability
to "learn" new information is a strength of this technique.
- As shown elsewhere in this tutorial, the back-propagation neural network trained
to an accuracy similar to that found with the radial basis function neural network,
the probabilistic neural network, the general regression neural network,
and the modular neural network. These latter networks have sometimes been found
to be equal or superior to the back-propagation neural network.
David C. Silverman, Ph.D. - Primary Consultant
E-Mail: dcsilverman@argentumsolutions.com
Phone: 314-576-3586
Fax: 314-754-9825
Address: The Argentum House
14314 Strawbridge Ct.
Chesterfield, MO 63017