TUTORIAL ON ARTIFICIAL NEURAL NETWORKS
David C. Silverman
Radial Basis Function Artificial Neural Network
While the back-propagation neural network is the one most commonly implemented for
classification problems using supervised training, other technologies exist.
Radial basis functions are one such technology. Neural networks containing
radial basis functions can be used in many of the same situations in which
back-propagation networks are used. This section briefly describes radial
basis functions and then compares the results for the simple example of
rounding the sum of the squares of two numbers, the same example used in the
back-propagation neural network example, the
probabilistic neural network example, the
general regression neural network example, and the
modular neural network example.
Radial basis functions were first reported as another type of artificial neural
network in the late 1980s. Several articles provide the early background for this
technology (J. Moody and C. J. Darken, "Fast Learning in Networks of
Locally Tuned Processing Units", Neural Computation, 1, p281-294, 1989
and J. A. Leonard and M. A. Kramer, "Radial Basis Functions for Classifying
Process Faults", IEEE Control Systems, April, 1991). An example of an
artificial neural network containing radial basis functions is shown
in this figure.
The figure shows a network with two inputs and one output. It corresponds to
the example discussed below. An additional summation node would be present for
each additional output. Each hidden computing element would have a different
radial basis function.
Radial basis functions tend to be embedded in a two-layer neural network in which
each hidden computing unit has a radially activated function. Radial basis functions use
radially symmetric computing elements and radially bounded transfer functions
in the hidden layer. The output units form their outputs as a weighted sum of the
outputs from the hidden units. In pattern classification as required for
the simple example, the inputs represent feature entries while each output
corresponds to a class. The hidden units correspond to subclasses.
A number of algorithms exist to train this type of network. One example is an
algorithm with two main steps.
The first step is a clustering step in which the incoming
weights from the input layer become the centers of clusters of input vectors.
One algorithm often used for centering is the k-means clustering algorithm.
The second step finishes the training of the hidden layer by setting the radii of the Gaussian
functions centered at the cluster centers. These radii encompass the information
in each cluster that is most likely related.
The k-means clustering algorithm computes the Euclidean distance between
each input vector and each cluster center (the input weights), which have been
initialized randomly, and determines the closest center for that input. A
significant number of training sets is required for this algorithm to
train successfully. For example, five hundred were used in the example below.
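As a rough illustration of the clustering step, the following is a minimal sketch of batch k-means in plain Python. It is not the author's implementation. The function names (`nearest_center`, `k_means`), the fixed iteration count, and the choice to initialize the centers from randomly chosen training points are all assumptions made for this sketch.

```python
import math
import random


def nearest_center(x, centers):
    """Return the index of the center closest to x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))


def k_means(points, k, iterations=50, seed=0):
    """Batch k-means: assign each point to its closest center, then move
    each center to the mean of the points assigned to it."""
    rng = random.Random(seed)
    # Assumption: centers start at k randomly chosen training points.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers


# Five hundred two-element input vectors, as in the example in the text.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(500)]
centers = k_means(points, k=5)
```

Each returned center plays the role of the incoming weights of one hidden node. No output values are used anywhere in this step.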
The radial basis function most often used in neural networks is the Gaussian:
$$\phi(x) = \exp\left(-\frac{\|x - \mu\|^2}{2\sigma^2}\right)$$ (5)
where x is the vector of input values, μ is the mean location (the center of the
cluster), σ is the standard deviation (cluster width), and ||x - μ|| is the
Euclidean distance between the input vector and the center
of the cluster. The dimension of x and μ is determined by
the size of the input vector. Each input element is part of that vector.
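The Gaussian activation can be sketched in a few lines of Python. This sketch assumes the common form exp(-||x - μ||² / (2σ²)), with σ treated as a standard deviation. The function name `gaussian_rbf` is hypothetical.

```python
import math


def gaussian_rbf(x, mu, sigma):
    """Gaussian radial basis function: depends only on the Euclidean
    distance between the input vector x and the center mu."""
    r = math.dist(x, mu)
    return math.exp(-(r * r) / (2.0 * sigma * sigma))


# The response is 1 at the center and falls off with distance from it.
at_center = gaussian_rbf((0.5, 0.5), (0.5, 0.5), sigma=0.2)
one_sigma_away = gaussian_rbf((0.7, 0.5), (0.5, 0.5), sigma=0.2)
```

Because the function depends only on distance from the center, inputs equally far from μ in any direction produce the same hidden-node output.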
Equation (5) is the activation function for each hidden node. In the first step, the algorithm
determines the centers, μ, so that the sum of the squares of the distances between
each training vector, x, and its closest center is a local minimum. One instance of equation (5)
is written for each center, and each center appears as a node in the hidden layer. In the second
step, the algorithm determines the value of σ for each center. Note that
this entire first stage of training (locating the centers and setting the widths) has been
accomplished in the absence of
output information. That is, the hidden layer functionality or activation function
has been determined completely by self-organization of the input values.
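The text does not say how σ is chosen. One simple possibility, shown here purely as an assumption, is to set each width to the root-mean-square distance between a center and the training vectors closest to it. The function name `cluster_widths` and the `min_sigma` floor are hypothetical.

```python
import math


def cluster_widths(points, centers, min_sigma=1e-3):
    """Assumed heuristic: sigma_j is the RMS distance from center j to the
    points whose closest center is j. Uses input values only."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for p in points:
        j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        sums[j] += math.dist(p, centers[j]) ** 2
        counts[j] += 1
    # The floor keeps sigma positive for empty or single-point clusters.
    return [
        max(math.sqrt(s / n), min_sigma) if n else min_sigma
        for s, n in zip(sums, counts)
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 1.0), (10.0, 2.0)]
widths = cluster_widths(points, centers)
```

Like the clustering step, this calculation uses no output information, which matches the self-organizing character of the hidden layer described above.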
Upon completion of the self-organizing step, the output layer can be trained,
i.e. its weights determined, using the delta rule learning algorithm as in
the back-propagation network. In this case, the square of the difference
between the desired output and the calculated output is minimized. This
type of mapping is linear because the output is a sum of the products of the
weights and the outputs of the hidden layer containing the radial basis
functions. These weights, when multiplied by the activation function
of each hidden node, determine how strongly each node influences the output (that is,
in which class the input vector resides) and hence the output value that is predicted.
An additional hidden layer could be inserted between the existing
hidden layer and the output layer if needed.
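A minimal sketch of delta rule training for a single output node is given below, with the hidden-layer centers and widths held fixed. The learning rate, the number of epochs, the bias term, and the function names are assumptions for illustration and are not taken from the text.

```python
import math
import random


def hidden_outputs(x, centers, sigmas):
    """Gaussian activation of every hidden node for input vector x."""
    return [
        math.exp(-math.dist(x, mu) ** 2 / (2.0 * s * s))
        for mu, s in zip(centers, sigmas)
    ]


def predict(x, centers, sigmas, weights, bias):
    """Linear output: weighted sum of the hidden-node activations."""
    h = hidden_outputs(x, centers, sigmas)
    return bias + sum(w * hi for w, hi in zip(weights, h))


def train_output_layer(samples, centers, sigmas, rate=0.1, epochs=200, seed=0):
    """Delta rule: after each sample, move every weight in proportion to
    (desired output - calculated output) times that node's activation."""
    rng = random.Random(seed)
    weights = [0.0] * len(centers)
    bias = 0.0  # assumption: the text does not mention a bias term
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, target in data:
            h = hidden_outputs(x, centers, sigmas)
            err = target - (bias + sum(w * hi for w, hi in zip(weights, h)))
            weights = [w + rate * err * hi for w, hi in zip(weights, h)]
            bias += rate * err
    return weights, bias


# Tiny one-dimensional illustration with two fixed hidden nodes.
centers, sigmas = [(0.0,), (1.0,)], [0.3, 0.3]
samples = [((0.0,), 0.0), ((1.0,), 1.0)]
weights, bias = train_output_layer(samples, centers, sigmas)
```

Only the output weights change during this stage, so the problem is linear in the unknowns even though the hidden-node activations are nonlinear in the inputs.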
The following example is used for illustration. It is identical to the
one used for the
back-propagation neural network example, the
probabilistic neural network example, the
general regression neural network example, and the
modular neural network example and is provided to show that a radial basis function neural
network can sometimes be used in place of these others.
- Square two random numbers each between 0 and 1 and add the results together.
- Train a radial basis function neural network to decide how to round
the number. If the sum is greater than or equal to 0.5, round to 1; if
it is less than 0.5, round to 0.
This problem is a simple classification decision problem in which the
neural network is presented with 2 numbers as input and outputs a value
of 0 or 1 depending on the value of the sum of the squares. To make the
example more realistic, the inputs are the non-squared values of the two
random numbers and the output is 0 or 1. The network has to learn the relationship
between the two numbers and, from that, decide whether the output
value is 0 or 1. The actual values of the outputs are not important, only whether
they are greater than or equal to 0.5 or less than 0.5. This type of decision represents
a very typical real-life
decision in which several independent observables are present and a decision
has to be made from their relationship without knowing anything about that
relationship beforehand.
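The rounding rule can be sketched as a small data generator. The function names are hypothetical. The text says only that the training sets were about evenly divided between the two classes, so the rejection-sampling approach used here to balance them is an assumption.

```python
import random


def make_example(rng):
    """Inputs are the two non-squared random numbers; the label is 1 when
    the sum of their squares is at least 0.5, otherwise 0."""
    a, b = rng.random(), rng.random()
    label = 1 if a * a + b * b >= 0.5 else 0
    return (a, b), label


def make_balanced_set(n, seed=0):
    """Assumed approach: keep drawing random examples, discarding any whose
    class already has its share, until n examples are collected."""
    rng = random.Random(seed)
    remaining = {0: n // 2, 1: n - n // 2}
    examples = []
    while len(examples) < n:
        x, label = make_example(rng)
        if remaining[label] > 0:
            remaining[label] -= 1
            examples.append((x, label))
    return examples


training_set = make_balanced_set(500)
```

Without some form of balancing, uniformly random inputs would round to 1 more often than to 0, because the region where the sum of squares is below 0.5 covers less than half of the unit square.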
Five hundred training sets were created randomly, about evenly divided between those
that round to 1 and those that round to 0. Since the appropriate number of hidden nodes
could not be determined beforehand, networks were designed with
50, 25, 10, 5, and 2 hidden nodes (i.e. 50, 25, 10, 5, and 2 possible centers).
The network with 2 hidden nodes failed to train. The others trained to about
the same error, measured as the sum of the squares of predicted minus actual values.
Since the goal was
to use the simplest network, only the one with 5 hidden nodes was examined.
Networks with 3 and 4 hidden nodes were not constructed.
A different set of 100 randomly generated input-output combinations was used
to test the trained networks. The calculated outputs were between about
-0.1 and +1.1. The strategy used to assess error was to assume that if the
value is less than 0.5, the prediction would have been 0 and if the value
is greater than or equal to 0.5, the prediction would have been 1. These
rounded values were compared to the expected
outputs to assess error. No attempt was made to compare actual values because
that information did
not enter into the decision. This figure
shows correct and incorrect
responses for the radial basis function neural network. Only one point
was in error and that point was at the boundary. Classification was very good.
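The error assessment described above amounts to thresholding each calculated output at 0.5 and counting disagreements with the expected class. A minimal sketch follows. The function names are hypothetical, and the sample outputs are made-up values for illustration only, not results from the tutorial.

```python
def classify(output):
    """Round a raw network output to a class: 1 if >= 0.5, otherwise 0."""
    return 1 if output >= 0.5 else 0


def count_errors(outputs, expected):
    """Number of test cases whose rounded output differs from the expected class."""
    return sum(1 for y, t in zip(outputs, expected) if classify(y) != t)


# Illustrative values only; raw outputs may fall slightly outside [0, 1].
outputs = [-0.08, 0.31, 0.52, 1.07, 0.49]
expected = [0, 0, 1, 1, 1]
errors = count_errors(outputs, expected)
```

In this made-up list the only misclassified case is the output of 0.49, which lies just below the threshold. This is the kind of near-boundary error the text describes.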
The network based on radial basis functions generalized to about
the same accuracy as the
back-propagation network with three hidden nodes. It also trained
to about the same accuracy as the probabilistic neural network,
the general regression neural network,
and the modular neural network.
This result is in agreement with the
comment earlier that classification problems that can be generalized by
back-propagation neural networks can sometimes be generalized by neural
networks using radial basis functions, especially if enough data points
are available.
The error is at the boundary. This observation is not
limited to this example. Dividing information among classes becomes more
difficult the closer one is to the boundary between the classes.
David C. Silverman, Ph.D. - Primary Consultant
E-Mail: dcsilverman@argentumsolutions.com
Phone: 314-576-3586
Fax: 314-754-9825
Address: The Argentum House
14314 Strawbridge Ct.
Chesterfield, MO 63017