At first, I was cool with the whole "mathematical notation is more
general" argument.  But the fact of the matter is, these results don't
hold water in a general sense.  They only hold water in the environment
in which they were tested -- Computer Go.  It seems like it should be up
to the person in the other environment to adapt your successful
algorithm (and notation/terminology) to their environment.


On 7/4/07, chrilly <[EMAIL PROTECTED]> wrote:


Thanks, the dictionary is really great.

Chrilly

----- Original Message -----
From: David Silver
To: computer-go@computer-go.org
Sent: Tuesday, July 03, 2007 11:29 PM
Subject: [computer-go] Re: Explanation to MoGo paper wanted.
(BackGammonCode)




>> It's because Go is not the only game in the world, and certainly not
>> the only reinforcement learning problem. They are using a widely
>> accepted terminology.
>
> But a very inappropriate one. I have read Sutton's book, and all the
> things I know (e.g. TD-Gammon) are completely obfuscated. It is maybe
> suitable for presenting general concepts, but it is extremely
> complicated to formulate an algorithm in this framework.

Here is a quick and dirty RL <-> Computer Go translation kit to try and
help bridge the gap!


RL terminology                          Go terminology

State                                   Position
Action                                  Move
Reward                                  Win/Loss
Return                                  Win/Loss
Episode                                 Game
Time-step                               One move
Agent                                   Program
Value function                          Evaluation function
Policy                                  Player
Default policy                          Simulation player
Uniform random policy                   Light simulation player
Other stochastic policy                 Heavy simulation player
Greedy policy                           1-ply search player
Epsilon-greedy policy                   1-ply search player with some random moves
Feature                                 Factor used for position evaluation
Weight                                  Weight of each factor in evaluation function
Tabular representation                  One weight for each complete position
Partial tabular representation          UCT tree
State abstraction                       One weight for many positions
Linear value function approximation     Evaluation function using weighted sum of various factors
Feature discovery                       Learning new factors for the evaluation function
Sample-based search                     Simulation (Monte-Carlo methods, etc.)
Transition function                     Rules of the game
Environment                             Rules of the game + opponent
Trajectory                              Move sequence
Online                                  During actual play
Offline                                 Before/after actual play (e.g. preprocessing)
On-policy                               If both players play as normal
Off-policy                              If either player behaves differently
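
To make a few of these rows concrete, here is a toy Python sketch.
Everything in it -- the factor names, the weights, the dict-based
"positions" -- is invented for illustration; no real engine is this
simple. It shows a linear value function approximation as an evaluation
function taking a weighted sum of factors, and an epsilon-greedy policy
as a 1-ply search player that sometimes plays a random move:

import random

# Toy illustration only: a "position" is just a dict of factor values,
# and each move maps to its successor position. The factor names and
# weights below are made up for this example.
WEIGHTS = {"territory": 1.0, "liberties": 0.2}  # Weight = weight of each factor

def value(position):
    # Linear value function approximation = evaluation function
    # using a weighted sum of various factors.
    return sum(w * position[name] for name, w in WEIGHTS.items())

def epsilon_greedy_move(moves, epsilon=0.1):
    # Epsilon-greedy policy = 1-ply search player with some random moves.
    if random.random() < epsilon:
        return random.choice(list(moves))  # occasional exploratory move
    # Greedy part: 1-ply search, take the move whose successor evaluates best.
    return max(moves, key=lambda m: value(moves[m]))

# Usage with two fictitious moves and their successor "positions":
moves = {"A": {"territory": 4.0, "liberties": -2.0},
         "B": {"territory": 2.0, "liberties": 5.0}}
print(epsilon_greedy_move(moves))  # usually "A": 4.0*1.0 - 2.0*0.2 = 3.6 > 3.0

Real programs differ in every detail, of course; the point is only how
the two vocabularies line up.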


-Dave





_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

