In this chapter we consider methods for learning the policy parameter
based on the gradient of some performance measure with respect to the policy
parameter. These methods seek to maximize performance, so their updates
approximate gradient ascent in :