Derivation of Backpropagation
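The numbered equations that the notes refer to, (1) through (12), are not reproduced in the text. The block below is a reconstruction consistent with the notes, one equation per note. The symbols t_pj (target), a_pj (activation), and w_ij (weight from unit i to unit j), the helper index h in (4), and the omission of bias terms are assumptions of this sketch, not notation confirmed by the original.

```latex
\begin{align}
E_p &= \tfrac{1}{2} \sum_j (t_{pj} - a_{pj})^2 \tag{1} \\
\Delta_p w_{ij} &= -\eta \, \frac{\partial E_p}{\partial w_{ij}} \tag{2} \\
\frac{\partial E_p}{\partial w_{ij}} &= \frac{\partial E_p}{\partial I_{pj}} \,
  \frac{\partial I_{pj}}{\partial w_{ij}} \tag{3} \\
\frac{\partial I_{pj}}{\partial w_{ij}} &= \frac{\partial}{\partial w_{ij}}
  \sum_h w_{hj} a_{ph} = a_{pi} \tag{4} \\
\delta_{pj} &= -\frac{\partial E_p}{\partial I_{pj}} \tag{5} \\
\Delta_p w_{ij} &= \eta \, \delta_{pj} \, a_{pi} \tag{6} \\
\delta_{pj} &= -\frac{\partial E_p}{\partial a_{pj}} \,
  \frac{\partial a_{pj}}{\partial I_{pj}} \tag{7} \\
\frac{\partial a_{pj}}{\partial I_{pj}} &= f'(I_{pj}) \tag{8} \\
\frac{\partial E_p}{\partial a_{pj}} &= -(t_{pj} - a_{pj}) \tag{9} \\
\delta_{pj} &= (t_{pj} - a_{pj}) \, f'(I_{pj}) \tag{10} \\
\frac{\partial E_p}{\partial a_{pj}} &= \sum_k \frac{\partial E_p}{\partial I_{pk}} \,
  \frac{\partial I_{pk}}{\partial a_{pj}} = -\sum_k \delta_{pk} w_{jk} \tag{11} \\
\delta_{pj} &= f'(I_{pj}) \sum_k \delta_{pk} w_{jk} \tag{12}
\end{align}
```

Equations (9) and (10) apply when j is an output unit; (11) and (12) apply when j is a hidden unit and k ranges over the units in the layer above.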
Notes:
- The error for a pattern p is half the sum, over the output units j,
of the squared deviation of each unit's activation from
the target for that unit.
- The change in the weight on the connection from unit i to
unit j for pattern p should be proportional to the negative of
the slope of the error with respect to that weight.
Eta is the learning rate.
- Using the chain rule, we can express the partial derivative of the
error with respect to the weight as a product of partial derivatives.
Ipj is the input to unit j for pattern p.
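- The input to unit j is the sum, over the units feeding into j, of
each unit's activation times the weight on its connection to j. Using
this formula, the second derivative in (3) simplifies
to the activation of unit i.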
- We define delta for pattern p and unit j as the negative of the
partial derivative of the error with respect to the input to unit j.
- We can then express the weight change in terms of the delta
for the destination unit on the connection: eta times the delta
for unit j times the activation of unit i.
We then only need to figure out how to calculate delta for
different units.
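- Using the chain rule again, we express delta as a product
of partial derivatives: the negative of the derivative of the error
with respect to the activation of unit j, times the derivative of
that activation with respect to the input to unit j.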
- The second of the derivatives in (7) is just the derivative of
f, the activation function for the unit.
- For the first derivative in (7), there are two cases: where j
is an output unit and where it is a hidden unit.
For the output unit case, the derivative is just the activation
minus the target.
- This is (7) for output units: delta is the target minus the
activation, times the derivative of f.
- For hidden units, we can use the chain rule once more to express
the first derivative in (7) as a sum of products of partial derivatives.
k is a unit in the layer above the layer in which
unit j occurs.
Using (5), the expression simplifies to the negative of the sum of
the products of the deltas of the units in the layer above
j and the weights connecting j to those units.
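- This is (7) for hidden units: delta is the derivative of f times
the sum of the products of the deltas of the units in the layer above
j and the weights connecting j to those units.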
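The derivation in these notes can be checked numerically. The following is a minimal sketch, assuming a logistic activation function for f, no bias terms, and a network with one hidden layer; the function names and the nested-list weight layout are illustrative choices, not part of the original notes. It computes the deltas for output and hidden units as described above, forms the weight changes as eta times delta times the sending unit's activation, and provides a finite-difference estimate of the error slope for comparison.

```python
import math


def f(x):
    """Logistic activation function (an assumed choice for f)."""
    return 1.0 / (1.0 + math.exp(-x))


def f_prime(x):
    """Derivative of the logistic function with respect to its input."""
    a = f(x)
    return a * (1.0 - a)


def forward(x, w_ih, w_ho):
    """Return inputs and activations for the hidden and output layers.

    w_ih[i][j] is the weight from input unit i to hidden unit j;
    w_ho[j][k] is the weight from hidden unit j to output unit k.
    """
    n_hidden, n_out = len(w_ih[0]), len(w_ho[0])
    I_h = [sum(w_ih[i][j] * x[i] for i in range(len(x))) for j in range(n_hidden)]
    a_h = [f(v) for v in I_h]
    I_o = [sum(w_ho[j][k] * a_h[j] for j in range(n_hidden)) for k in range(n_out)]
    a_o = [f(v) for v in I_o]
    return I_h, a_h, I_o, a_o


def error(x, t, w_ih, w_ho):
    """Half the sum of squared deviations of output activations from targets."""
    a_o = forward(x, w_ih, w_ho)[3]
    return 0.5 * sum((tk - ak) ** 2 for tk, ak in zip(t, a_o))


def weight_changes(x, t, w_ih, w_ho, eta):
    """Weight changes for one pattern: eta * delta_j * a_i for each weight."""
    I_h, a_h, I_o, a_o = forward(x, w_ih, w_ho)
    # Output units: delta = (target - activation) * f'(input).
    d_o = [(t[k] - a_o[k]) * f_prime(I_o[k]) for k in range(len(t))]
    # Hidden units: delta = f'(input) * sum over units k above of delta_k * w_jk.
    d_h = [f_prime(I_h[j]) * sum(d_o[k] * w_ho[j][k] for k in range(len(d_o)))
           for j in range(len(I_h))]
    dw_ih = [[eta * d_h[j] * x[i] for j in range(len(d_h))] for i in range(len(x))]
    dw_ho = [[eta * d_o[k] * a_h[j] for k in range(len(d_o))] for j in range(len(a_h))]
    return dw_ih, dw_ho


def numeric_slope(x, t, w_ih, w_ho, w, i, j, h=1e-6):
    """Central-difference estimate of dE/dw[i][j]; w must be w_ih or w_ho."""
    old = w[i][j]
    w[i][j] = old + h
    e_plus = error(x, t, w_ih, w_ho)
    w[i][j] = old - h
    e_minus = error(x, t, w_ih, w_ho)
    w[i][j] = old  # restore the original weight
    return (e_plus - e_minus) / (2.0 * h)
```

If the derivation holds, each entry returned by `weight_changes` should agree with `-eta * numeric_slope(...)` for the same weight, up to finite-difference error.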
Last updated: 20 March 1996
URL: http://www.indiana.edu/~gasser/Q351/bp_derivation.html
Comments:
gasser@salsa.indiana.edu
Copyright 1996, The Trustees of Indiana University