Also, the control-theoretic stuff was removed because I am unable to define
the reward based on the current state in a *differentiable* way.
For example, in the game of chess, the reward comes only when checkmate
occurs (according to the game's official rules), not when you capture a
piece of high value (e.g. a queen). This is known as the "sparse reward"
vs "dense reward" problem in reinforcement learning:
[image: Screenshot from 2019-05-07 20-49-22.png]
The actual reward is a delta function occurring at the end of the game.
"Classical" control theory is applicable only when the reward is something
like the dotted line.
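
To make the contrast concrete, here is a minimal sketch (my own illustration, not from the original post) of the two reward shapes in a chess-like setting. The function names and the piece-value table are assumptions for illustration only:

```python
# Illustrative sketch: sparse (terminal-only) vs dense (shaped) reward.
# Piece values and function names are assumptions, not part of chess's rules.
from typing import Optional

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def sparse_reward(game_over: bool, winner: str, player: str) -> float:
    """Delta-function reward: non-zero only when the game ends."""
    if not game_over:
        return 0.0
    return 1.0 if winner == player else -1.0

def dense_reward(captured_piece: Optional[str]) -> float:
    """Shaped reward: immediate signal for capturing material (e.g. a queen)."""
    if captured_piece is None:
        return 0.0
    return PIECE_VALUES[captured_piece] / 9.0  # normalised by queen value
```

Under the official rules only `sparse_reward` applies, which is why a state-based, differentiable reward is hard to define; `dense_reward` is the kind of hand-crafted shaping that classical control methods would need.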

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T3cad55ae5144b323-M2216896a5f7ba4151b51a14c
Delivery options: https://agi.topicbox.com/groups/agi/subscription
