Thanks, the dictionary is really great.

Chrilly
  ----- Original Message ----- 
  From: David Silver 
  To: computer-go@computer-go.org 
  Sent: Tuesday, July 03, 2007 11:29 PM
  Subject: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)


    > It's because Go is not the only game in the world and certainly not the
    > only reinforcement learning problem. They are using a widely accepted
    > terminology.
    >
    But a very inappropriate one. I have read Sutton's book and all the things I 
    know (e.g. TD-Gammon) are completely obfuscated. It's maybe suitable to 
    present general concepts, but it is extremely complicated to formulate an 
    algorithm in this framework.


  Here is a quick and dirty RL<->Computer Go translation kit to try and help 
bridge the gap!


  RL terminology          Go terminology


  State                   Position
  Action                  Move
  Reward                  Win/Loss
  Return                  Win/Loss
  Episode                 Game
  Time-step               One move
  Agent                   Program
  Value function          Evaluation function
  Policy                  Player
  Default policy          Simulation player
  Uniform random policy   Light simulation player
  Other stochastic policy Heavy simulation player
  Greedy policy           1-ply search player
  Epsilon-greedy policy   1-ply search player with some random moves   
  Feature                 Factor used for position evaluation
  Weight                  Weight of each factor in evaluation function
  Tabular representation  One weight for each complete position
  Partial tabular         UCT tree
      representation
  State abstraction       One weight for many positions
  Linear value function   Evaluation function
      approximation          using weighted sum of various factors
  Feature discovery       Learning new factors for the evaluation function
  Sample-based search     Simulation (Monte-Carlo methods, etc.)
  Transition function     Rules of the game
  Environment             Rules of the game + opponent
  Trajectory              Move sequence
  Online                  During actual play
  Offline                 Before/after actual play (e.g. preprocessing)
  On-policy               If both players play as normal
  Off-policy              If either player behaves differently
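
A few of the entries above (linear value function approximation, greedy policy, epsilon-greedy policy) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not code from MoGo or any other program: the function names, the toy integer "game", the two features and the weights are all hypothetical, and the evaluation is assumed to be from the point of view of the player who just moved.

```python
import random

def linear_value(features, weights):
    """Linear value function approximation: a weighted sum of evaluation factors."""
    return sum(w * f for w, f in zip(weights, features))

def greedy_move(position, legal_moves, play, featurize, weights):
    """Greedy policy (1-ply search): pick the move whose resulting position scores highest."""
    def score(move):
        return linear_value(featurize(play(position, move)), weights)
    return max(legal_moves, key=score)

def epsilon_greedy_move(position, legal_moves, play, featurize, weights,
                        epsilon=0.1, rng=random):
    """Epsilon-greedy policy: a random legal move with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)
    return greedy_move(position, legal_moves, play, featurize, weights)

# Toy stand-in for a game (hypothetical, not Go): a position is an integer
# and a move adds 1, 2 or 3 to it.
def play(position, move):
    return position + move

def featurize(position):
    # Two made-up factors: the position itself and whether it is odd.
    return [position, position % 2]

WEIGHTS = [1.0, -5.0]     # one weight per factor (assumed values)
LEGAL_MOVES = [1, 2, 3]
```

With these made-up weights, the three successor positions of position 0 evaluate to -4, 2 and -2, so `greedy_move(0, LEGAL_MOVES, play, featurize, WEIGHTS)` chooses move 2. Setting `epsilon=0.0` in `epsilon_greedy_move` gives the same player, and `epsilon=1.0` gives a uniform random policy, i.e. a light simulation player in the table's terms.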


  -Dave 




------------------------------------------------------------------------------


  _______________________________________________
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/