TUTORIAL ON ARTIFICIAL NEURAL NETWORKS
David C. Silverman
Probabilistic Artificial Neural Network
Probabilistic neural networks are also known as belief networks, Bayesian networks,
and knowledge maps. They have the following characteristics:
- Each network node represents a random variable.
- Directed links connect pairs of nodes. The arrows denote the direction of
influence.
- Each node has a conditional probability table that quantifies the effects
of all nodes feeding into it.
Bayesian learning weighs competing hypotheses that relate the input data
to the output predictions. In simple terms, the learning accounts for the relative
likelihood of each outcome. For example, if an input (feature) vector is equally
likely to belong to either of two categories and one of those categories rarely occurs,
then selecting the category that occurs more often is more likely to be correct.
This prior knowledge is used to improve the prediction. This type of network can
require a large number of training sets.
The general implementation is explained here using the rounding problem from
this tutorial. That problem is as follows:
- Square two random numbers, each between 0 and 1, and add the results together.
- Train a probabilistic neural network to decide how to round the sum. If the
sum is greater than or equal to 0.5, round to 1; if it is less than 0.5, round to 0.
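The data for the rounding problem can be generated in a few lines. The following is a minimal sketch, not the code used for this tutorial. The function name `make_rounding_data`, the fixed seed, and the choice of the two unsquared numbers as the network inputs are assumptions made for illustration.

```python
import random

def make_rounding_data(n, seed=0):
    """Return n ((x1, x2), label) pairs for the rounding problem."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()   # two numbers in [0, 1)
        total = x1 ** 2 + x2 ** 2              # sum of the squares
        label = 1 if total >= 0.5 else 0       # round at the 0.5 threshold
        data.append(((x1, x2), label))
    return data

train = make_rounding_data(500)
print(len(train), "examples;", sum(label for _, label in train), "round to 1")
```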
For this implementation, the number of outputs has been changed from 1 to 2
so that each class is represented separately. One output is 1 if the sum
of squares is greater than or equal to 0.5 and 0 otherwise. The other output
is 1 if the sum of squares is less than 0.5 and 0 otherwise. With two outputs,
further processing is required to convert the pair into a single rounded value
for the outside world. This dual-output approach could have been used with the
other networks as well.
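The two-output representation and the further processing it calls for might look like the sketch below. The helper names `encode_two_outputs` and `decode_two_outputs` are hypothetical, and picking the larger output (with ties going to 1) is only one plausible way to turn the pair back into a single value.

```python
def encode_two_outputs(total):
    """Return (out_ge, out_lt): one indicator output per class."""
    out_ge = 1 if total >= 0.5 else 0   # fires when the sum rounds to 1
    out_lt = 1 if total < 0.5 else 0    # fires when the sum rounds to 0
    return (out_ge, out_lt)

def decode_two_outputs(outputs):
    """Map a pair of (possibly non-binary) outputs to a rounded value."""
    out_ge, out_lt = outputs
    # Assumption: the larger output wins; a tie is resolved as 1.
    return 1 if out_ge >= out_lt else 0

print(encode_two_outputs(0.73))        # target pair for a sum of 0.73
print(decode_two_outputs((0.2, 0.9)))  # network outputs mapped to a class
```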
In simple terms, the Bayes decision rule is a statistical
rule in which the strategy chosen from among several available strategies is the
one for which the expected value of the outcome is the greatest. In the case of the
probabilistic neural network,
each class has a probability density function and a prior probability, that is,
the relative frequency with which input vectors belong to that class. For each class,
the Bayes decision rule multiplies the probability density function of the class,
evaluated at the input vector (how likely that input is if it belongs to the class),
by the prior probability of the class, and then compares the products. In the example
with 500 training examples, the
comparison is made for each of the 500 input points. The decision rule is therefore
to evaluate the probability density function of each class at the input, weight each
value by the class frequency, and select the class with the largest result.
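As a small illustration of the decision rule, the sketch below multiplies each class density by the class frequency and picks the largest product. The function name `bayes_decide` and the numbers are invented for the example; the case shown is the one described earlier, where the densities are equal but one class is rare.

```python
def bayes_decide(densities, priors):
    """Return the index of the class with the largest density * prior."""
    scores = [d * p for d, p in zip(densities, priors)]
    return max(range(len(scores)), key=lambda k: scores[k])

# Equal densities at the input, but the second class rarely occurs,
# so the more frequent first class (index 0) is selected.
print(bayes_decide([0.3, 0.3], [0.9, 0.1]))  # prints 0
```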
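The probability density function must be estimated. The Parzen estimator is
often used because of its simplicity. This estimator places a Gaussian
distribution at each of the data points, in this case each of the 500 data points.
There are two classes in this example: the values that round to 0 (class 1) and the
values that round to 1 (class 2). In its standard Gaussian form, the Parzen estimator for the
probability density function for x to lie in class 1, pd(x|class=1), is
$$
pd(x \mid \mathrm{class}=1) = \frac{1}{(2\pi)^{N/2}\,\sigma^{N}\,N_{\mathrm{train}-1}} \sum_{j \in \mathrm{class}\,1} \exp\!\left(-\frac{\lVert x - x_j \rVert^{2}}{2\sigma^{2}}\right) \tag{6}
$$
where Ntrain-1 is the number of training sets in class 1, N is the dimension
(total number) of input nodes, x_j is the j-th training input, j runs over the members of class 1,
and σ is the smoothing parameter that sets the width of each Gaussian (σ² is its variance).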
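A Gaussian Parzen estimator of the kind referred to as equation (6) can be sketched as follows. This assumes the standard Gaussian kernel with σ as the kernel width; the function name `parzen_density` and the sample points are hypothetical.

```python
import math

def parzen_density(x, class_points, sigma):
    """Gaussian Parzen estimate of the density at x for one class."""
    n_dim = len(x)
    # Normalizing factor: (2*pi)^(N/2) * sigma^N * number of class points.
    norm = (2.0 * math.pi) ** (n_dim / 2.0) * sigma ** n_dim * len(class_points)
    total = 0.0
    for xj in class_points:  # one Gaussian centered on each training point
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xj))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / norm

class_1 = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4)]   # hypothetical training points
print(parzen_density((0.2, 0.2), class_1, sigma=0.1))
```

The estimate is large where training points of the class are dense and falls toward zero away from them; a smaller `sigma` makes each Gaussian narrower.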
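This estimator models the data distribution as a sum of distributions centered on the data points that
lie within the class. In the case of two classes, the ratio of the probabilities
for a test value x_test to lie in either class is, when each class is weighted by its
relative frequency in the training data so that the normalizing factors cancel,
$$
\frac{P(\mathrm{class}=1 \mid x_{test})}{P(\mathrm{class}=2 \mid x_{test})} = \frac{\sum_{j \in \mathrm{class}\,1} \exp\!\left(-\lVert x_{test} - x_j \rVert^{2} / 2\sigma^{2}\right)}{\sum_{j \in \mathrm{class}\,2} \exp\!\left(-\lVert x_{test} - x_j \rVert^{2} / 2\sigma^{2}\right)} \tag{7}
$$
where the symbols are defined as in equation (6). In the limit as σ approaches
0, x_test is assigned to class 1 if the closest class 1 training point is nearer to x_test
than the closest class 2 training point, and vice versa.
This approach is very similar to a nearest-neighbor method, of which
radial basis functions
are one example.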
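The small-σ limit can be checked numerically. The sketch below compares the two Gaussian kernel sums (taking each class weight as its training frequency, which is an assumption) with a plain nearest-neighbor rule. The sums are computed in log form so that a very small σ does not underflow. All names and points are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def log_kernel_sum(x, points, sigma):
    """log of sum_j exp(-||x - xj||^2 / (2 sigma^2)), computed stably."""
    exps = [-sq_dist(x, xj) / (2.0 * sigma ** 2) for xj in points]
    top = max(exps)
    return top + math.log(sum(math.exp(e - top) for e in exps))

def pnn_class(x, class1, class2, sigma):
    """Return 1 or 2: the class with the larger kernel sum at x."""
    s1 = log_kernel_sum(x, class1, sigma)
    s2 = log_kernel_sum(x, class2, sigma)
    return 1 if s1 >= s2 else 2

def nearest_class(x, class1, class2):
    """Return 1 or 2: the class owning the single closest training point."""
    d1 = min(sq_dist(x, p) for p in class1)
    d2 = min(sq_dist(x, p) for p in class2)
    return 1 if d1 <= d2 else 2

class1 = [(0.9, 0.9), (0.8, 0.7), (0.6, 0.6)]
class2 = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (0.3, 0.1), (0.1, 0.4)]
x_test = (0.52, 0.52)

print(nearest_class(x_test, class1, class2))
print(pnn_class(x_test, class1, class2, sigma=0.01))  # narrow Gaussians
print(pnn_class(x_test, class1, class2, sigma=5.0))   # very wide Gaussians
```

With these points the narrow width gives the same answer as the nearest-neighbor rule, while the very wide width lets the class with more training points dominate.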
In terms of the above example, the network has two outputs, one for the
class rounding to 0 and the other for the class rounding to 1. Each training
set is considered to be a single input. The number of processing elements
(containing the probability density functions) in the pattern layer is equal
to or greater than the number of training examples. This figure
shows the network for the rounding example. Training is fairly rapid.
Notice that equation (7) is represented by two circles for each hidden-layer
computing element, one attached to the output corresponding to class 1 and
the other attached to the output corresponding to class 2. The algorithm treats
these two nodes as a single combined element; they are drawn separately
to show the choice between the classes.
Five hundred training sets were created, about evenly divided between those
that round to 1 and those that round to 0. The network shown in this figure
was trained with 500 computing elements; that is, the hidden layer had 500
computing elements. A different set of 100 randomly
generated input-output combinations was used to test the trained network.
The calculated outputs were between about -0.1 and +1.1. The strategy used
to assess error was to assume that if the value was less than 0.5, the
prediction was 0, and if the value was greater than or equal to 0.5,
the prediction was 1. These predictions were
compared to the actual outputs to assess error. No attempt was made to compare the actual values
because that information did not enter into the decision. This figure
shows correct and incorrect responses for the probabilistic neural network.
The network had an error rate for the test set comparable to that of the back-propagation
network with one or two hidden nodes. Note that the training was not optimized, so
a relative judgement cannot be made.
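The experiment can be imitated with a short script. This is a sketch under stated assumptions, not the software used to produce the results above: the output is taken to be the class 1 share of the total Gaussian kernel sum (so it stays between 0 and 1), the width `sigma = 0.05` and the seed are arbitrary choices, and the random data are not balanced between the classes.

```python
import math
import random

def make_data(n, rng):
    """n samples of ((x1, x2), label) for the rounding problem."""
    out = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        out.append((x, 1 if x[0] ** 2 + x[1] ** 2 >= 0.5 else 0))
    return out

def pnn_output(x, train, sigma):
    """Share of the total kernel sum contributed by the round-to-1 class."""
    sums = [0.0, 0.0]
    for xj, label in train:            # one pattern unit per training example
        sq = (x[0] - xj[0]) ** 2 + (x[1] - xj[1]) ** 2
        sums[label] += math.exp(-sq / (2.0 * sigma ** 2))
    total = sums[0] + sums[1]
    return sums[1] / total if total > 0.0 else 0.5

rng = random.Random(0)       # hypothetical seed
train = make_data(500, rng)  # 500 pattern units
test = make_data(100, rng)   # separate test set
sigma = 0.05                 # hypothetical kernel width, not tuned

# Threshold each output at 0.5 and count disagreements with the true label.
errors = sum(1 for x, y in test
             if (1 if pnn_output(x, train, sigma) >= 0.5 else 0) != y)
print(f"test errors: {errors} of {len(test)}")
```

Storing the 500 training examples is all the training this sketch needs, which is consistent with training being fairly rapid; the cost appears at prediction time, when every stored example is visited.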
- The probabilistic neural network generalized to about the same accuracy
as the back-propagation networks, the radial basis function neural
network, the general regression neural
network, and the modular neural
network. This result agrees with the earlier comment that
classification problems that can be generalized by back-propagation neural
networks can sometimes be generalized by neural networks based on probabilistic
techniques.
- The errors cluster at the boundary. This observation is not
limited to this example. Assigning inputs to classes becomes more
difficult the closer they lie to the boundary between the classes.
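One way to examine the clustering of errors at the boundary is to measure how far each test point's sum of squares lies from the 0.5 threshold. The sketch below does this for a simple Gaussian kernel-sum classifier; the function names, the seed, the width of 0.05, and the larger test set of 1000 points are all assumptions made for illustration.

```python
import math
import random

def true_class(x):
    """1 if the sum of squares rounds to 1, else 0."""
    return 1 if x[0] ** 2 + x[1] ** 2 >= 0.5 else 0

def predict(x, train, sigma):
    """Class (0 or 1) with the larger Gaussian kernel sum at x."""
    sums = [0.0, 0.0]
    for xj in train:
        sq = (x[0] - xj[0]) ** 2 + (x[1] - xj[1]) ** 2
        sums[true_class(xj)] += math.exp(-sq / (2.0 * sigma ** 2))
    return 1 if sums[1] >= sums[0] else 0

def gap(x):
    """How far the sum of squares lies from the 0.5 threshold."""
    return abs(x[0] ** 2 + x[1] ** 2 - 0.5)

rng = random.Random(1)  # hypothetical seed
train = [(rng.random(), rng.random()) for _ in range(500)]
test = [(rng.random(), rng.random()) for _ in range(1000)]

wrong, right = [], []
for x in test:
    bucket = right if predict(x, train, 0.05) == true_class(x) else wrong
    bucket.append(gap(x))

if wrong:
    print("mean gap, misclassified:", sum(wrong) / len(wrong))
print("mean gap, correct:", sum(right) / len(right))
```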
David C. Silverman, Ph.D. - Primary Consultant
E-Mail: dcsilverman@argentumsolutions.com
Phone: 314-576-3586
Fax: 314-754-9825
Address: The Argentum House
14314 Strawbridge Ct.
Chesterfield, MO 63017