Also, the control-theoretic material was removed because I am unable to define the reward as a *differentiable* function of the current state. For example, in chess the reward comes only when checkmate occurs (according to the game's official rules), not when you capture a high-value piece (e.g. a queen). In reinforcement learning this is known as the "sparse reward" vs "dense reward" problem: [image: Screenshot from 2019-05-07 20-49-22.png] The actual reward is a delta function occurring at the end of the game. "Classical" control theory is applicable only when the reward looks something like the dotted line in the figure.
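To make the sparse/dense distinction concrete, here is a minimal Python sketch. It does not use a real chess engine: the move list, the `PIECE_VALUE` table, and both function names are hypothetical, and the "dense" signal is only a material heuristic that one might add by hand, not something the rules of chess provide.

```python
from typing import List

# Hypothetical piece values used only for the hand-made dense signal.
PIECE_VALUE = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}


def sparse_rewards(num_moves: int, won: bool) -> List[float]:
    """Reward under the official rules: zero everywhere except the final move."""
    rewards = [0.0] * num_moves
    if num_moves > 0:
        rewards[-1] = 1.0 if won else -1.0
    return rewards


def dense_rewards(captures: List[str]) -> List[float]:
    """Heuristic reward: value of the piece captured on each move ('' = no capture)."""
    return [PIECE_VALUE.get(piece, 0.0) for piece in captures]


# One toy game of five moves: a pawn is captured on move 2, a queen on move 4.
captures = ["", "P", "", "Q", ""]
sparse = sparse_rewards(len(captures), won=True)
dense = dense_rewards(captures)

print(sparse)  # [0.0, 0.0, 0.0, 0.0, 1.0]
print(dense)   # [0.0, 1.0, 0.0, 9.0, 0.0]
```

The sparse sequence carries information at a single time step, which is the discrete analogue of the delta function described above, while the dense sequence gives feedback throughout the game at the cost of being an assumption about what counts as progress.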
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T3cad55ae5144b323-M2216896a5f7ba4151b51a14c
Delivery options: https://agi.topicbox.com/groups/agi/subscription
