Yes, the Bellman equation and the Hamilton-Jacobi equation are discrete
and continuous versions of essentially the same idea; they are now
jointly referred to as the Hamilton-Jacobi-Bellman (HJB) equation.  I
was merely exploring what is known in the literature, to see whether
any advantage can be gained from the continuous version.
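
For concreteness, the two standard textbook forms are (this is the
usual MDP / optimal-control notation, nothing specific to this thread):

```latex
% Discrete-time Bellman optimality equation (MDP, discount factor \gamma):
V(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \Big]

% Continuous-time HJB equation, for system dynamics \dot{x} = f(x,u):
-\frac{\partial V}{\partial t} = \max_u \Big[ r(x,u) + \nabla_x V \cdot f(x,u) \Big]
```

Shrinking the time step of the discrete equation to zero (with R and
gamma rescaled accordingly) recovers the HJB form -- that is the sense
in which they are one idea.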

The optimality condition is needed for updating the state, so it is of
central importance.  It's just that the Bellman equation seems to
suffice in the AGI setting and there's (apparently) no need to
introduce the HJ or Euler-Lagrange equations.
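
To illustrate why the Bellman equation suffices in the discrete
setting, here is a minimal value-iteration sketch on a made-up 3-state,
2-action MDP (all numbers are invented for illustration; assumes numpy):

```python
import numpy as np

# Toy MDP: P[a][s][s'] = transition probability, R[s][a] = expected reward.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],  # action 0
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.0, 0.0, 1.0]],  # action 1
])
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])  # sparse: reward only in state 2
gamma = 0.9

V = np.zeros(3)
for _ in range(500):
    # Bellman optimality backup:
    #   Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print(V)                  # converged value function
print(Q.argmax(axis=1))   # greedy policy
```

The backup alone drives the computation; no continuous-time machinery
(HJ, Euler-Lagrange) is needed to obtain the optimal policy here.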

As I learned more mathematics, it became a habit to look for remote
connections.  There is also a connection from logic --> term rewriting
systems --> polynomials --> Gröbner bases, etc.
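
The logic --> polynomials step can be made concrete with the standard
GF(2) encoding (TRUE = 1, AND = multiplication, XOR = addition,
NOT x = 1 + x).  The following sketch (using sympy; illustrative only)
checks an entailment by reduction modulo a Gröbner basis:

```python
# Logic -> polynomials -> Groebner bases, over GF(2).
from sympy import symbols, groebner, reduced

p, q, r = symbols('p q r')

def implies(x, y):
    # x -> y  ==  NOT x OR y  ==  1 + x + x*y  over GF(2)
    return 1 + x + x*y

# Hypotheses p -> q and q -> r hold, i.e. their encodings equal 1, so
# (encoding - 1) lies in the ideal; the field equations x^2 = x restrict
# every variable to {0, 1}.
hyps = [implies(p, q) - 1, implies(q, r) - 1,
        p**2 - p, q**2 - q, r**2 - r]

G = groebner(hyps, p, q, r, modulus=2, order='lex')

# p -> r is entailed iff its encoding reduces to 1 modulo the ideal.
_, remainder = reduced(implies(p, r), G.exprs, p, q, r,
                       modulus=2, order='lex')
print(remainder)
```

The remainder comes out as 1, which is the Nullstellensatz-style
statement that {p --> q, q --> r} entails p --> r; a non-entailed
formula would not reduce to 1.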

Quite often these connections do not give mathematicians useful
results, but they increase our understanding of the problems.  Only
rarely do they lead to significant breakthroughs.

Once I was sitting in a seminar on quantum information theory, and a
European guy at the back raised a question, asking whether the
speaker's topic had any connection with the Riemann hypothesis.  That
was really surprising to me, as I could not see any relation before he
asked.  I didn't know what he was talking about, but since then I have
become more accustomed to such distant connections.

On 5/19/19, Sergio VM <[email protected]> wrote:
> In order to understand the problem well, suppose that everything is
> known: reward, state and action spaces, transitions. Why do you need the
> Hamiltonian formulation? I am missing this point.
> Do you need an optimality condition? If so, since you are working on
> discrete state-action sets, the dynamic programming (Bellman) equation
> gives you the optimality condition. On the other hand, if your state-action
> sets were continuous, then you could use Euler-Lagrange, Pontryagin, or
> HJB. But why do you need such optimality condition?
>
> On Thu, May 16, 2019 at 5:45 AM YKY (Yan King Yin, 甄景贤) <
> [email protected]> wrote:
>
>> On Wed, May 15, 2019 at 3:42 AM Sergio VM <[email protected]> wrote:
>>
>>> Not sure if I am following you...
>>>
>>> In order to define the optimal control problem, you need:
>>>
>>>    - State set: Set of all possible logic propositions. OK
>>>    - Action set: Logic rules. It is not clear to me what this means.
>>>    Can you choose which logic rule to use with which proposition? I
>>>    mean, the actions should be chosen by the agent (there may be
>>>    constraints on which actions are available at each state, but there
>>>    might be some freedom nonetheless; otherwise, there wouldn't be
>>>    anything to be learned).
>>>    - Expected reward function: some map from the state and action sets
>>>    to the reals. You want it to be non-smooth. OK.
>>>    - Transition kernel: represents the knowledge. Very interesting.
>>>
>>> So let me try to understand with an example:
>>>
>>>    - State at time t is a bunch of propositions, e.g. x_t = {"I am in
>>>    my place", "my place is in Europe"}
>>>    - Action at time t is a particular logic rule, e.g. a_t = { "if
>>>    p --> q and q --> r, then p --> r" }
>>>    - State transition: x_{t+1} = F(x_t, a_t) = if "I am in my place" and
>>>    "my place is in Europe", then "I am in Europe"
>>>    - Reward: something saying that this new state is desirable, makes
>>>    sense, etc.
>>>
>>> Is this correct?
>>>
>>
>> Yes, except that the reward may be zero, as when you're planning a
>> sequence of chess moves.  Your reward only comes when you win / lose /
>> draw the game.  If you "feel" that your chess moves make sense / are
>> good, that is your *internal* assessment of the desirability / utility
>> / value of those actions, but it is not the external *reward*.
>>
>> If I understand correctly, "desirability" is the utility value (V) or Q
>> value (which is V given a certain action).  In physics it is called the
>> "action" with unit [energy x time] (not to be confused with action in
>> reinforcement learning).
>>
>> V (utility value) is *learned*, R (reward) is *given* by the problem.
>>
>>> I am definitely lost with your comment about the Hamiltonian. I am
>>> familiar with optimal control theory, but I don't see the story... In
>>> general you don't need the velocity. What you need is an optimality
>>> condition, which doesn't have to be related to any time derivative.
>>> Think, e.g., about the Euler-Lagrange condition, by differentiating
>>> the reward function with respect to the current and future states and
>>> with respect to the action. It can be formulated even in discrete time.
>>>
>>
>>
>> In the *unconstrained* optimization setting, the action can
>> potentially move the state in *any* direction, so the action can be
>> identified with the velocity.  The problem with sparse reward is that
>> we have neither ∂L/∂ẋ nor ∂L/∂x (except as a delta function at the
>> terminal state).


-- 
*YKY*
*"The ultimate goal of mathematics is to eliminate any need for intelligent
thought"* -- Alfred North Whitehead

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T3cad55ae5144b323-Mcc8e23a2e09deb7aaf2380a1
Delivery options: https://agi.topicbox.com/groups/agi/subscription
