TUTORIAL ON ARTIFICIAL NEURAL NETWORKS
David C. Silverman
Training the Back-Propagation Neural Network
Once the network is constructed, it has to be trained. As mentioned in the discussion of
the back-propagation neural network, the
artificial neural network most often applied in corrosion (and elsewhere) has been
the back-propagation network. In corrosion applications, it has been used either to solve
complex pattern-matching problems or to fit relationships among variables for
which explicit functions cannot be written. This type of network is shown in this
figure.
The training algorithm is summarized below. The actual equations
for the algorithm are available in a number of textbooks, for example "Artificial
Intelligence, A Modern Approach", S. J. Russell and P. Norvig, Prentice Hall, 1996
and "Neural Networks-Algorithms, Applications, and Programming Techniques", J. A. Freeman
and D. M. Skapura, Addison-Wesley, 1992.
The network learns ("fits" might be an alternative description) a predefined set of
input-output pairs
known as the training set. The methodology is a two-phase cycle consisting of propagation
and adaptation. The algorithm is known as the
generalized delta rule. A number of variations exist.
A set of initial weights
is chosen arbitrarily for each processing element.
After the input
is applied to the first layer of network units, the output is propagated to the next layer as
an input to that layer, and so on until an output from the network is calculated (output layer).
This output pattern is compared to the desired output. An error is calculated as the sum of
the squares of the differences between each calculated and desired output. Nothing
magical exists with respect to the error function. Cubic and fourth power differences have
also been used.
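The propagation step and the error calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logistic (sigmoid) activation, the bias terms, the 2-2-1 layout, and the weight values are all hypothetical choices made for the example, and the function names are not taken from any library.

```python
import math

def sigmoid(x):
    """Logistic activation (an assumed choice; the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights, biases):
    """Each unit applies the activation to its weighted input sum plus bias."""
    return [sigmoid(sum(w * x for w, x in zip(unit_w, inputs)) + b)
            for unit_w, b in zip(weights, biases)]

def forward(inputs, layers):
    """Propagate an input pattern layer by layer up to the output layer."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_output(signal, weights, biases)
    return signal

def pattern_error(calculated, desired, power=2):
    """Sum over outputs of |calculated - desired| ** power.
    power=2 gives the usual sum of squares; 3 or 4 give the variants."""
    return sum(abs(c - d) ** power for c, d in zip(calculated, desired))

# Hypothetical 2-2-1 network with arbitrarily chosen weights.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),  # hidden layer: 2 units
    ([[1.0, -1.0]], [0.0]),                    # output layer: 1 unit
]
output = forward([1.0, 0.0], layers)
print(output, pattern_error(output, [1.0]))
```

Changing `power` only changes how strongly large differences are penalized; the propagation step itself is unaffected.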
This error is sent backward from the output layer to each computing element that contributed
to the output (the next lower hidden layer). The total error is divided among the
computing elements according to their relative contributions to the original output. Once
completed, the process is repeated for each layer of computing elements. The weights for
each pathway to each computing element are then updated. The network converges toward a state
that should enable all training patterns to be coded properly. Back-propagation is a fairly
robust form of non-linear regression.
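The two-phase cycle described above can be sketched as follows. This is a minimal illustration, not the exact algorithm from the cited textbooks: it assumes one hidden layer, a sigmoid activation, a single output unit, pattern-by-pattern weight updates, and the XOR function as a hypothetical training set. The learning rate, number of hidden units, epoch count, and random seed are arbitrary choices for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_output):
    """Propagation phase: input -> hidden layer -> single output unit.
    The last weight of each unit is a bias fed by a constant input of 1."""
    xb = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(unit, xb))) for unit in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_output, hidden + [1.0])))
    return hidden, output

def sum_squared_error(patterns, w_hidden, w_output):
    return sum((t - forward(x, w_hidden, w_output)[1]) ** 2 for x, t in patterns)

def train(patterns, n_hidden=4, rate=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    # Arbitrarily chosen starting weights for every processing element.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        history.append(sum_squared_error(patterns, w_hidden, w_output))
        for x, target in patterns:
            hidden, out = forward(x, w_hidden, w_output)
            # Adaptation phase: error signal at the output unit.
            delta_out = (target - out) * out * (1.0 - out)
            # Share the error among hidden units according to the weight
            # connecting each one to the output, scaled by its slope.
            delta_hidden = [delta_out * w_output[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            # Update the weight on every pathway.
            for j, v in enumerate(hidden + [1.0]):
                w_output[j] += rate * delta_out * v
            for j, unit in enumerate(w_hidden):
                for i, v in enumerate(x + [1.0]):
                    unit[i] += rate * delta_hidden[j] * v
    history.append(sum_squared_error(patterns, w_hidden, w_output))
    return w_hidden, w_output, history

# Hypothetical training set: the XOR function.
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
       ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w_hidden, w_output, history = train(XOR)
print("sum of squares before training:", history[0])
print("sum of squares after training: ", history[-1])
```

The `history` list records the sum of squares at the start of each epoch, which makes it easy to see whether the network is converging or has stalled.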
As the network trains, the computing elements in the intermediate layer(s) organize
themselves so that different computing elements "learn" to recognize different features
in the input space. At some point, often at an arbitrarily chosen minimum sum of squares,
the training is deemed complete. Another set of input-output patterns, the test set, is then
fed to the network to determine how well it trained. If the new input contains features that
a computing element "recognizes" (i.e. features that resemble those it learned during
training), the element responds with an active output. If the new input does not contain features
that the computing element recognizes (i.e. features that do not resemble those it
learned during training), its response is inhibited (zero). The goal is to ensure that
the training set encompasses the test set.
Additional Considerations
The following issues should be considered when designing and using back-propagation artificial neural
networks.
- Expressiveness or How large should the network be?
Neural networks provide attribute representation, not logical representation.
The class of multilayer networks taken together can represent any desired function of a
set of attributes, but any particular network may have too few hidden units. So the
question is how many layers and nodes are enough. One source ("Artificial Intelligence,
A Modern Approach", S. J. Russell and P. Norvig, Prentice Hall, 1996)
has stated that 2^n/n hidden units are needed to represent all Boolean
functions of n inputs. Such a network would have O(2^n) weights.
In practice, however, smaller networks have sufficed. Statements have been made that 3
layers (an input layer, a hidden layer, and an output layer) would suffice for most situations.
As mentioned in the description of the back-propagation neural network,
one hidden layer
should be able to represent a continuous function, two a discontinuous function.
The question of how many nodes or computing elements are required is not
straightforward. One important point is to use as few hidden nodes (hidden
computing elements) as possible because of computation time and the need for generalization
(see below). A reasonable philosophy is that if a network fails to converge, the user should
increase the number of nodes. If the network converges, the user should decrease the number of nodes until
the network fails to converge. In addition, one can selectively remove connections to determine
whether certain nodes or links are important.
- Computation time
As mentioned above, the number of hidden nodes directly impacts the computation
time required to train the network. For n examples and W weights, each epoch
takes O(nW) time. The epoch is the number of training examples (input-output pairs) presented
to the network during each training cycle. It is usually the total number of examples in the
training set. A significant
fraction of the computational research effort with respect to feed-forward artificial neural
networks has been to design more effective training algorithms that converge more quickly.
Local minima in the error surface can cause convergence to the wrong point, much as in
conventional non-linear regression.
- Generalization vs. Memorization
When constructed properly, artificial neural networks generalize nicely. The
concept of generalization is important. Generalization means that given several input-output
combinations all belonging to the same class, the artificial neural network "learns"
(fits) the significant similarities of the input data. It will be able to produce
sensible output to a previously unseen input in the same class. Memorization means that
the network has learned the specific input-output combinations and not the significant
similarities of the input data. The difference may loosely be thought of as learning the
structure of the function y=f(x) (generalization) and not the specific (x,y) pairs used in place
of the function to define it (memorization). In the case of the back-propagation artificial neural network,
the number of connections vs. the number of input-output pairs can have a direct effect on the
ability of the network to generalize vs. memorize. The effect is similar to having too many
constants relative to data pairs in non-linear regression using polynomials. In the absence
of other information, one rule of thumb is that generalization requires that the number of
input-output combinations used in training be 3 to 5 times the total number of connections
in the network.
- Transparency or The Black Box
Back-propagation neural networks are black boxes. An input is fed into them
and an output retrieved using a structure that provides no understanding of why the output
is correct. The network cannot be used to explain the output. Physical meaning cannot be
given to the weights or hidden nodes.
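The sizing rules quoted in the considerations above (the 2^n/n figure, the O(nW) cost per epoch, and the 3-to-5-times rule of thumb) can be turned into simple arithmetic. The sketch below assumes a fully connected feed-forward network described by a list of layer sizes; the function names are hypothetical, and whether bias weights count as "connections" is an assumption left as a parameter because the text does not say.

```python
def connection_count(layer_sizes, include_bias=False):
    """Number of weights W in a fully connected feed-forward network.
    include_bias=True also counts one bias weight per receiving unit
    (an assumption; the text speaks only of connections)."""
    extra = 1 if include_bias else 0
    return sum((n_from + extra) * n_to
               for n_from, n_to in zip(layer_sizes, layer_sizes[1:]))

def training_pairs_needed(layer_sizes, low=3, high=5):
    """Range of training-set sizes suggested by the 3-to-5-times rule of thumb."""
    w = connection_count(layer_sizes)
    return low * w, high * w

def epoch_cost(n_examples, layer_sizes):
    """Relative work per epoch, proportional to n * W."""
    return n_examples * connection_count(layer_sizes)

def hidden_units_bound(n_inputs):
    """The 2^n / n figure quoted for Boolean functions of n inputs."""
    return 2 ** n_inputs / n_inputs

# Hypothetical network: 4 inputs, 3 hidden nodes, 1 output.
sizes = [4, 3, 1]
print(connection_count(sizes))       # 4*3 + 3*1 = 15 connections
print(training_pairs_needed(sizes))  # (45, 75) input-output pairs
print(epoch_cost(60, sizes))         # 60 * 15 = 900 weight evaluations
print(hidden_units_bound(8))         # 256 / 8 = 32.0 hidden units
```

For this small example the rule of thumb asks for 45 to 75 training pairs; adding a single hidden node raises the connection count to 20 and the requirement to 60 to 100 pairs, which illustrates why the number of hidden nodes should be kept as small as possible.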
David C. Silverman, Ph.D. - Primary Consultant
E-Mail: dcsilverman@argentumsolutions.com
Phone: 314-576-3586
Fax: 314-754-9825
Address: The Argentum House
14314 Strawbridge Ct.
Chesterfield, MO 63017