> It's because Go is not the only game in the world, and certainly not the
> only reinforcement learning problem. They are using widely accepted
> terminology.
>
But a very inappropriate one. I have read Sutton's book, and all the things I
know (e.g. TD-Gammon) are completely obfuscated. It may be suitable for
presenting general concepts, but it is extremely complicated to formulate an
algorithm in this framework.

Here is a quick and dirty RL<->Computer Go translation kit to try and help bridge the gap!

RL terminology           Go terminology

State                    Position
Action                   Move
Reward                   Win/Loss
Return                   Win/Loss
Episode                  Game
Time-step                One move
Agent                    Program
Value function           Evaluation function
Policy                   Player
Default policy           Simulation player
Uniform random policy    Light simulation player
Other stochastic policy  Heavy simulation player
Greedy policy            1-ply search player
Epsilon-greedy policy    1-ply search player with some random moves
Feature                  Factor used for position evaluation
Weight                   Weight of each factor in evaluation function
Tabular representation   One weight for each complete position
Partial tabular          UCT tree
    representation
State abstraction        One weight for many positions
Linear value function    Evaluation function
    approximation            using weighted sum of various factors
Feature discovery        Learning new factors for the evaluation function
Sample-based search      Simulation (Monte-Carlo methods, etc.)
Transition function      Rules of the game
Environment              Rules of the game + opponent
Trajectory               Move sequence
Online                   During actual play
Offline                  Before/after actual play (e.g. preprocessing)
On-policy                If both players play as normal
Off-policy               If either player behaves differently
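
To make a few of these rows concrete, here is a minimal Python sketch of
"linear value function approximation", "greedy policy" and "epsilon-greedy
policy" read as Go-program terms. Everything in it is an assumption made for
illustration: the function names (evaluate, greedy_move, epsilon_greedy_move)
are hypothetical, and the "game" is a toy in which a position is just a
number and a move adds to it. It is not taken from any real Go program.

```python
import random

def evaluate(features, weights):
    """Linear value function: weighted sum of the factors of a position."""
    return sum(w * f for w, f in zip(weights, features))

def greedy_move(position, legal_moves, play, featurize, weights):
    """Greedy policy = 1-ply search: try each move, keep the best-evaluated one."""
    return max(
        legal_moves,
        key=lambda move: evaluate(featurize(play(position, move)), weights),
    )

def epsilon_greedy_move(position, legal_moves, play, featurize, weights,
                        epsilon=0.1, rng=random):
    """Epsilon-greedy policy = 1-ply search player with some random moves."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)  # occasional random move
    return greedy_move(position, legal_moves, play, featurize, weights)

# Toy stand-ins (assumptions): a position is a number, a move adds to it.
def toy_play(position, move):
    return position + move

def toy_featurize(position):
    # Two "factors": the position itself and a constant bias term.
    return [float(position), 1.0]

if __name__ == "__main__":
    weights = [1.0, 0.0]
    print(greedy_move(0, [1, 2, 3], toy_play, toy_featurize, weights))
    print(epsilon_greedy_move(0, [1, 2, 3], toy_play, toy_featurize,
                              weights, epsilon=0.5))
```

In a real program, play and featurize would be the rules of the game and the
position-evaluation factors; with epsilon set to 0 the epsilon-greedy player
is simply the 1-ply search player.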

-Dave

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
