On Sun, May 12, 2019 at 10:22 PM Sergio VM <[email protected]> wrote:
> Hi King Yin,
>
> The architecture looks very interesting. I am just missing the definition
> of the reward function (or kernel if you make it stochastic).
>
> On the other hand, I don't understand your previous comment on the
> Lagrangian and Hamiltonian. I haven't seen the previous version of the
> paper. But you can apply an optimal control approach without having to
> consider the velocity at all.

The reward function is given externally by some "AI teachers", for example
the rewards given by an Atari game. The Lagrangian is the same as the
instantaneous reward the system receives at time t. In some cases, such as
chess, the reward is just a delta function given at the terminal state
(eg checkmate).

The Bellman equation (for dynamic programming) always works, no matter how
the rewards are given. The Hamiltonian / Lagrangian approach of control
theory may also work if the Lagrangian is given as delta functions, but in
that case solving the differential equation would involve a discretization
that simply reduces to the discrete dynamic-programming case. In other
words, the differential equations offer no advantage over the discrete
case! (A toy sketch of this appears at the end of this message.)

Things would be different if the reward (Lagrangian) were differentiable
with respect to the position x and the velocity ẋ. But that is not the
case for some real-life problems, such as logic puzzles or chess games,
where the reward occurs only sparsely.

I hope that answers your question ☺

I will re-organize the material in the older version and post it
somewhere, just so the work is not wasted. But I don't see any easy way to
bridge that gap. It doesn't seem to be a good idea to tamper with the
reward function, other than the way it is given by the problem setup....
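To make the delta-function point concrete, here is a minimal sketch (my own
toy example, with assumed names and constants, not anything from the paper):
value iteration on a small deterministic chain MDP where the reward is zero
everywhere except on reaching the terminal state, as with checkmate in
chess. The Bellman backup converges regardless of how sparse the reward is.

    # Toy sketch: value iteration with a sparse, delta-like reward.
    # All names and constants (N_STATES, GAMMA, ...) are illustrative.

    N_STATES = 6        # states 0..5; state 5 is terminal ("checkmate")
    GAMMA = 0.9         # discount factor
    ACTIONS = (-1, +1)  # move left or right along the chain

    def step(s, a):
        # Deterministic transition, clamped to the ends of the chain.
        return max(0, min(N_STATES - 1, s + a))

    def reward(s_next):
        # Delta-function reward: 1 only on entering the terminal state.
        return 1.0 if s_next == N_STATES - 1 else 0.0

    # Bellman backup: V(s) <- max_a [ r(s') + gamma * V(s') ]
    V = [0.0] * N_STATES
    for _ in range(100):  # iterate to (near) convergence
        V = [0.0 if s == N_STATES - 1 else
             max(reward(step(s, a)) + GAMMA * V[step(s, a)]
                 for a in ACTIONS)
             for s in range(N_STATES)]

    print(V)  # roughly [0.656, 0.729, 0.81, 0.9, 1.0, 0.0]

The values decay geometrically with distance from the goal; this discrete
table is exactly what a discretized Hamiltonian solution would reduce to.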
