Argentum Solutions, Inc.

    Sterling guidance on corrosion and materials degradation


 

TUTORIAL ON ARTIFICIAL NEURAL NETWORKS

David C. Silverman


Table of Contents

Overview of Tutorial
Artificial Neural Network Background
The Back-propagation Computing Element
The Back-propagation Artificial Neural Network
Training the Back-Propagation Neural Network
Example of Back-propagation Artificial Neural Network
Radial Basis Function Artificial Neural Network
Probabilistic Artificial Neural Network
General Regression Artificial Neural Network
Modular Artificial Neural Network

Probabilistic Artificial Neural Network

Probabilistic neural networks are also known as belief networks, Bayesian networks, and knowledge maps. They have the following characteristics:
  • The nodes of the network are random variables.
  • Directed links connect pairs of nodes. The arrows denote the direction of influence.
  • Each node has a conditional probability table quantifying the effects of all nodes feeding into it.
Bayesian learning follows the idea of using hypotheses that link the input data to the output predictions. In simple terms, the learning accounts for the relative likelihood of each outcome. For example, if an input or feature vector is equally likely to belong to either of two categories, and one of those categories rarely occurs, then selecting the category that occurs more often is more likely to be correct. This prior knowledge is used to improve the prediction. This type of network can require a large number of training sets.
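The effect of prior knowledge described above can be illustrated with a small calculation. The following is a minimal sketch, not part of the original tutorial: the function name `posterior` and the numbers (equal likelihoods, a 90/10 split between two categories labelled A and B) are hypothetical values chosen only for illustration.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: the posterior is proportional to likelihood times prior."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical numbers: the feature vector is equally likely under both
# categories, but category A occurs nine times as often as category B.
likelihoods = [0.5, 0.5]   # p(x | A), p(x | B)
priors = [0.9, 0.1]        # relative frequencies of A and B

post = posterior(likelihoods, priors)
best = max(range(len(post)), key=post.__getitem__)
print(post, "-> choose category", "AB"[best])
```

Because the likelihoods are equal, the posterior simply reproduces the relative frequencies, and the more common category is selected.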

The rounding problem used in this tutorial guides the general explanation of the implementation. That problem is as follows:
  1. Square two random numbers each between 0 and 1 and add the results together.
  2. Train a probabilistic neural network to decide how to round the number. If the sum is greater than or equal to 0.5, round to 1. If the sum is less than 0.5, round to 0.
For this implementation, the number of outputs has been changed from 1 to 2 to represent each class separately. One output is 1 if the sum of squares is greater than or equal to 0.5 and 0 otherwise. The other output is 1 if the sum of squares is less than 0.5 and 0 otherwise. In this case, further processing would be required to translate the pair of outputs into a single rounded value for the outside world. This dual output approach could have been used with the other networks as well.
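The rounding problem and its two-output encoding can be sketched as follows. This is an illustrative sketch only: the helper names `make_example` and `decode`, the random seed, and the use of uniform random numbers are assumptions, not details taken from the tutorial.

```python
import random

def make_example(rng):
    """One training set: two inputs in [0, 1) and a two-element target."""
    a, b = rng.random(), rng.random()
    total = a * a + b * b
    # Output 1 flags "rounds to 1"; output 2 flags "rounds to 0".
    target = (1, 0) if total >= 0.5 else (0, 1)
    return (a, b), target

def decode(outputs):
    """Further processing: map the two outputs back to one rounded value."""
    return 1 if outputs[0] >= outputs[1] else 0

rng = random.Random(0)  # seed chosen arbitrarily for repeatability
data = [make_example(rng) for _ in range(500)]

inputs, target = data[0]
print(inputs, target, "-> rounds to", decode(target))
```

The `decode` step is one possible form of the further processing mentioned above: it collapses the two class outputs into the single value 0 or 1.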

In simple terms, the Bayes decision rule is a statistical rule in which the strategy chosen from among several available strategies is the one for which the expected value of the outcome is the greatest. In the case of the probabilistic neural network, each class has a probability density function and a relative frequency with which input vectors fall in that class. For each class, the Bayes decision rule multiplies the probability density function of the class evaluated at the input (how likely that input is for a member of the class) by the relative frequency of the class, and then compares the products. In the example with 500 training examples, the comparison is made for each of the 500 input points. The decision rule is to evaluate the probability density function of each class at the input, weight the values by the relative frequencies, and select the class with the largest weighted value.
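For the two-class case, the rule described above can be written compactly. The symbols h1 and h2, standing for the relative frequencies of class 1 and class 2, are notation introduced here for illustration and do not appear in the original text.

```latex
\text{choose class 1 if}\quad
h_1 \, p_d(x \mid \text{class}=1) \;>\; h_2 \, p_d(x \mid \text{class}=2),
\qquad \text{otherwise choose class 2.}
```

When the two classes are equally frequent, h1 = h2 and the rule reduces to comparing the two probability density functions directly.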

The probability density function must be estimated. The Parzen estimator is often used because of its simplicity. This estimator places a Gaussian distribution at each of the data points, in this case each of the 500 data points. There are two classes in this example, 0 and 1 (class 1 and class 2). The Parzen estimator for the probability density function for x to lie in class 1 (pd(x|class=1)) is
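    $$ p_d(x \mid \text{class}=1) = \frac{1}{(2\pi)^{N/2}\,\sigma^{N}\,N_{\text{train-1}}} \sum_{j \in \text{class 1}} \exp\left(-\frac{\lVert x - x_j \rVert^{2}}{2\sigma^{2}}\right) $$    (6)
where Ntrain-1 is the number of training sets in class 1, N is the dimension of the input (the total number of input nodes), xj is the j-th training point that is a member of class 1, and σ is the smoothing parameter (the standard deviation, which sets the width of each Gaussian). This estimator models the data distribution as a sum of distributions centered on the data points that lie within the class. In the case of two classes, the ratio of the probabilities for a test value xtest to lie in either class is
    $$ \frac{p_d(x_{\text{test}} \mid \text{class}=1)}{p_d(x_{\text{test}} \mid \text{class}=2)} = \frac{N_{\text{train-2}}}{N_{\text{train-1}}} \cdot \frac{\sum_{j \in \text{class 1}} \exp\left(-\lVert x_{\text{test}} - x_j \rVert^{2} / 2\sigma^{2}\right)}{\sum_{k \in \text{class 2}} \exp\left(-\lVert x_{\text{test}} - x_k \rVert^{2} / 2\sigma^{2}\right)} $$    (7)
where the symbols are defined as in equation (6) and Ntrain-2 is the number of training sets in class 2. Taking the limit as σ approaches 0, xtest is classified in class 1 if the closest point in the class 1 data is nearer to xtest than the closest point in the class 2 data, and vice versa. This approach is very similar to a nearest neighbor method, of which radial basis functions are one example.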
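The Parzen estimate and the two-class ratio can be sketched in a few lines. This is a minimal sketch that assumes the usual Gaussian form of the estimator, with σ treated as the width (standard deviation) of each Gaussian. The function names and the three sample points are hypothetical and chosen only to illustrate the small-σ behavior.

```python
import math

def parzen_density(x, class_points, sigma):
    """Average of Gaussians centred on the training points of one class."""
    n = len(x)
    norm = (2.0 * math.pi) ** (n / 2.0) * sigma ** n * len(class_points)
    total = 0.0
    for point in class_points:
        d2 = sum((xi - pj) ** 2 for xi, pj in zip(x, point))
        total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total / norm

def density_ratio(x, class1_points, class2_points, sigma):
    """Ratio of the class 1 estimate to the class 2 estimate."""
    return (parzen_density(x, class1_points, sigma)
            / parzen_density(x, class2_points, sigma))

def nearest_class(x, class1_points, class2_points):
    """Nearest neighbor decision, for comparison with the small-sigma ratio."""
    def closest(points):
        return min(math.dist(x, p) for p in points)
    return 1 if closest(class1_points) < closest(class2_points) else 2

# Hypothetical data: x_test lies nearer to the single class 2 point.
class1 = [(0.2, 0.2), (0.9, 0.9)]
class2 = [(0.4, 0.4)]
x_test = (0.32, 0.32)

ratio = density_ratio(x_test, class1, class2, sigma=0.02)
by_ratio = 1 if ratio > 1.0 else 2
print(by_ratio, nearest_class(x_test, class1, class2))
```

With a small σ the ratio is dominated by the closest point of each class, so the two decisions agree. If σ is made much smaller, the exponentials underflow to zero and the ratio can no longer be evaluated directly in floating point.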

In terms of the above example, the network has two outputs, one for the class rounding to 0 and the other for the class rounding to 1. Each training set is considered to be a single input. The number of processing elements (containing the probability density functions) in the pattern layer is equal to or greater than the number of training examples. This figure shows the network for the rounding example. Training is fairly rapid. Notice that equation (7) is represented by two circles for each hidden layer computing element, one attached to the output corresponding to class 1 and the other attached to the output corresponding to class 2. These two nodes are considered as combined in the algorithm. The differentiation is meant to show choice.
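The structure described above, with one pattern layer element per training set feeding two class outputs, can be sketched as a small class. The class name `ProbabilisticNetwork`, its methods, and the winner-take-all 0/1 outputs are assumptions made for illustration. The network in the tutorial may compute its outputs differently.

```python
import math

class ProbabilisticNetwork:
    """Sketch of a two-class network with one pattern unit per training set."""

    def __init__(self, sigma):
        self.sigma = sigma       # assumed width of each Gaussian
        self.pattern_layer = []  # (centre, class_index) pairs

    def train(self, inputs, class_indices):
        # "Training" only stores each example as a pattern unit,
        # which is one reason training is rapid.
        self.pattern_layer = list(zip(inputs, class_indices))

    def outputs(self, x):
        sums = [0.0, 0.0]
        for centre, k in self.pattern_layer:
            d2 = sum((a - b) ** 2 for a, b in zip(x, centre))
            # Each pattern unit feeds only the output of its own class.
            sums[k] += math.exp(-d2 / (2.0 * self.sigma ** 2))
        # Weighting each class average by its relative frequency cancels the
        # class counts, so comparing the raw sums applies the decision rule.
        winner = 0 if sums[0] >= sums[1] else 1
        return [1 if k == winner else 0 for k in range(2)]

net = ProbabilisticNetwork(sigma=0.1)
net.train([(0.1, 0.1), (0.9, 0.9)], [0, 1])
print(net.outputs((0.2, 0.2)), net.outputs((0.8, 0.7)))
```

The pattern layer grows with the training data: 500 training sets give 500 stored units, which matches the network size used in the rounding example.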

Five hundred training sets were created, about evenly divided between those that round to 1 and those that round to 0. The network shown in this figure was trained with 500 computing elements. That is, the hidden layer had 500 computing elements. A different set of 100 randomly generated input-output combinations was used to test the trained networks. The calculated outputs were between about -0.1 and +1.1. The strategy used to assess error was to assume that if the value is less than 0.5, the prediction would have been 0, and if the value is greater than or equal to 0.5, the prediction would have been 1. These values were compared to the actual outputs to assess error. No attempt was made to compare actual values because that information did not enter into the decision. This figure shows correct and incorrect responses for the probabilistic neural network. The network had an error rate for the test set comparable to the back-propagation network with one or two hidden nodes. Note that the training was not optimized, so a relative judgement cannot be made.

  1. The probabilistic neural network generalized to about the same accuracy as the back-propagation networks, radial basis function neural network, general regression neural network, and modular neural network. This result is in agreement with the comment earlier that classification problems that can be generalized by back-propagation neural networks can sometimes be generalized by neural networks based on probabilistic techniques.
  2. The errors congregate at the boundary. This observation is not limited to this example. Dividing information among classes becomes more difficult the closer one is to the boundary between the classes.
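The experiment and the second observation above can be explored with a short script. This is a sketch under stated assumptions, not a reproduction of the tutorial's results: the width `sigma = 0.05`, the random seed, and the helper names are hypothetical, and plain uniform sampling is used, which does not give the even class split described for the original training sets.

```python
import math
import random

def make_set(rng, count):
    """Random input pairs and the value each sum of squares rounds to."""
    points = [(rng.random(), rng.random()) for _ in range(count)]
    labels = [1 if a * a + b * b >= 0.5 else 0 for a, b in points]
    return points, labels

def classify(x, points, labels, sigma):
    """Compare the summed Gaussian responses of the two classes."""
    sums = [0.0, 0.0]
    for p, label in zip(points, labels):
        d2 = (x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2
        sums[label] += math.exp(-d2 / (2.0 * sigma ** 2))
    return 1 if sums[1] >= sums[0] else 0

rng = random.Random(1)                   # arbitrary seed
train_x, train_y = make_set(rng, 500)    # 500 training sets
test_x, test_y = make_set(rng, 100)      # separate set of 100 test cases
sigma = 0.05                             # assumed width, not tuned

predictions = [classify(x, train_x, train_y, sigma) for x in test_x]
# Distance of each test sum from the 0.5 rounding boundary.
margins = [abs(a * a + b * b - 0.5) for a, b in test_x]
wrong = [m for m, p, y in zip(margins, predictions, test_y) if p != y]

error_rate = len(wrong) / len(test_x)
print("error rate:", error_rate)
print("mean distance from 0.5, all test cases:", sum(margins) / len(margins))
if wrong:
    print("mean distance from 0.5, misclassified:", sum(wrong) / len(wrong))
```

Comparing the two mean distances gives a rough check on whether the misclassified cases sit closer to the rounding boundary than the test cases as a whole.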








David C. Silverman, Ph.D. - Primary Consultant
E-Mail:     dcsilverman@argentumsolutions.com
Phone:     314-576-3586
Fax:         314-754-9825
Address:   The Argentum House
                14314 Strawbridge Ct.
                Chesterfield, MO 63017