Derivation of Backpropagation
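
The notes below refer to twelve numbered equations that do not appear in this copy of the page. What follows is a reconstruction from the wording of the notes, not the original figure. The symbols are assumptions: E_p is the error for pattern p, t_pj and a_pj are the target and activation of unit j, I_pj is the input to unit j, w_ji is the weight on the connection from unit i to unit j, eta is the learning rate, and f is the activation function. In (4) the input is assumed to be the weighted sum of the activations feeding unit j.

```latex
\begin{align}
E_p &= \tfrac{1}{2} \sum_j \left( t_{pj} - a_{pj} \right)^2 \tag{1} \\
\Delta_p w_{ji} &= -\eta \, \frac{\partial E_p}{\partial w_{ji}} \tag{2} \\
\frac{\partial E_p}{\partial w_{ji}} &= \frac{\partial E_p}{\partial I_{pj}} \, \frac{\partial I_{pj}}{\partial w_{ji}} \tag{3} \\
\frac{\partial I_{pj}}{\partial w_{ji}} &= \frac{\partial}{\partial w_{ji}} \sum_m w_{jm} a_{pm} = a_{pi} \tag{4} \\
\delta_{pj} &= -\frac{\partial E_p}{\partial I_{pj}} \tag{5} \\
\Delta_p w_{ji} &= \eta \, \delta_{pj} \, a_{pi} \tag{6} \\
\delta_{pj} &= -\frac{\partial E_p}{\partial a_{pj}} \, \frac{\partial a_{pj}}{\partial I_{pj}} \tag{7} \\
\frac{\partial a_{pj}}{\partial I_{pj}} &= f'(I_{pj}) \tag{8} \\
\frac{\partial E_p}{\partial a_{pj}} &= -\left( t_{pj} - a_{pj} \right) \tag{9} \\
\delta_{pj} &= \left( t_{pj} - a_{pj} \right) f'(I_{pj}) \tag{10} \\
\frac{\partial E_p}{\partial a_{pj}} &= \sum_k \frac{\partial E_p}{\partial I_{pk}} \, \frac{\partial I_{pk}}{\partial a_{pj}} = -\sum_k \delta_{pk} \, w_{kj} \tag{11} \\
\delta_{pj} &= f'(I_{pj}) \sum_k \delta_{pk} \, w_{kj} \tag{12}
\end{align}
```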

Notes:
  1. The error for a pattern p is half the sum, over the output units j, of the squared deviation of each unit's activation from the target for that unit.
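  2. The change in the weight on the connection from unit i to unit j for pattern p should be proportional to the negative of the slope of the error with respect to that weight, so that each change moves the weight downhill on the error surface. Eta, the learning rate, is the constant of proportionality.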
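  3. Using the chain rule, we can express the partial derivative in (2) as a product of two partial derivatives: the derivative of the error with respect to the input to unit j, and the derivative of that input with respect to the weight. Ipj is the input to unit j for pattern p.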
  4. Because the input to unit j is the weighted sum of the activations of the units that feed it, the second factor in (3) simplifies to the activation of unit i.
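  5. We define delta for pattern p and unit j as the negative of the first factor in (3), that is, the negative of the derivative of the error with respect to the input to unit j.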
  6. Combining (2) through (5), the weight change is the learning rate times the delta for the destination unit on the connection times the activation of the source unit. All that remains is to calculate delta for the different kinds of units.
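  7. Using the chain rule again, we express delta as a product of two partial derivatives: the derivative of the error with respect to the activation of unit j, and the derivative of that activation with respect to the input to unit j.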
  8. The second factor in (7) is just the derivative of f, the activation function for the unit, evaluated at the unit's input.
  9. For the first factor in (7), there are two cases: j is an output unit, or j is a hidden unit. For an output unit, differentiating (1) gives the activation minus the target.
  10. This is (7) for output units: delta is the target minus the activation, times the derivative of f.
  11. For hidden units, the error depends on the activation of unit j only through the inputs to the units that j feeds, so we can use the chain rule once more to express the first factor in (7) as a sum of products of partial derivatives. Here k ranges over the units in the layer above the layer in which unit j occurs. Using (5) and the formula for input, the expression simplifies to the negative of the sum, over those units k, of the delta of unit k times the weight on the connection from j to k.
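  12. This is (7) for hidden units: delta is the derivative of f times the sum of the deltas of the units in the layer above, each weighted by the connection from j to that unit.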
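
The update rules described in notes 6, 10, and 12 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the course. It assumes a logistic activation function for f, a single hidden layer, no bias units, and weights stored as w[j][i] for the connection from unit i to unit j. All function and variable names are hypothetical. The finite-difference helper is included only to check that the computed weight changes equal minus eta times the slope of the error, as note 2 requires.

```python
import math


def f(x):
    """Logistic activation function (an assumed choice for f)."""
    return 1.0 / (1.0 + math.exp(-x))


def f_prime(x):
    """Derivative of the logistic function with respect to its input (note 8)."""
    a = f(x)
    return a * (1.0 - a)


def forward(x, w_hidden, w_out):
    """Return the inputs I and activations a of the hidden and output layers."""
    i_hid = [sum(w * a for w, a in zip(row, x)) for row in w_hidden]
    a_hid = [f(i) for i in i_hid]
    i_out = [sum(w * a for w, a in zip(row, a_hid)) for row in w_out]
    a_out = [f(i) for i in i_out]
    return i_hid, a_hid, i_out, a_out


def error(x, t, w_hidden, w_out):
    """Note 1: half the sum of squared deviations of output activations from targets."""
    a_out = forward(x, w_hidden, w_out)[3]
    return 0.5 * sum((tj - aj) ** 2 for tj, aj in zip(t, a_out))


def weight_changes(x, t, w_hidden, w_out, eta):
    """Compute the weight changes for one pattern using the deltas."""
    i_hid, a_hid, i_out, a_out = forward(x, w_hidden, w_out)
    # Notes 9-10: delta for an output unit is (target - activation) * f'(input).
    d_out = [(tj - aj) * f_prime(ij) for tj, aj, ij in zip(t, a_out, i_out)]
    # Notes 11-12: delta for a hidden unit j sums delta_k * w_kj over units k above it.
    d_hid = [
        f_prime(i_hid[j]) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j in range(len(w_hidden))
    ]
    # Note 6: change = eta * delta of destination unit * activation of source unit.
    dw_out = [[eta * dk * aj for aj in a_hid] for dk in d_out]
    dw_hid = [[eta * dj * xi for xi in x] for dj in d_hid]
    return dw_hid, dw_out


def numeric_slope(x, t, w_hidden, w_out, layer, j, i, h=1e-6):
    """Central-difference estimate of dE/dw for one weight (layer 0 = hidden, 1 = output)."""
    def perturbed(step):
        wh = [row[:] for row in w_hidden]
        wo = [row[:] for row in w_out]
        (wh if layer == 0 else wo)[j][i] += step
        return error(x, t, wh, wo)
    return (perturbed(h) - perturbed(-h)) / (2.0 * h)


# Hypothetical example: 2 input units, 2 hidden units, 1 output unit.
x = [0.5, -1.0]
t = [1.0]
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [[0.25, -0.5]]
eta = 0.5

dw_hid, dw_out = weight_changes(x, t, w_hidden, w_out, eta)
```

Comparing each entry of dw_hid and dw_out with -eta * numeric_slope(...) for the same weight is a quick way to confirm that the deltas implement the gradient of the error in note 1.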


Last updated: 20 March 1996
URL: http://www.indiana.edu/~gasser/Q351/bp_derivation.html
Comments: gasser@salsa.indiana.edu
Copyright 1996, The Trustees of Indiana University