At first, I was cool with the whole "mathematical notation is more general" argument. But the fact of the matter is, these results don't hold water in a general sense. They only hold water in the environment in which they were tested -- Computer Go. It seems like it should be up to the person in the other environment to adapt your successful algorithm (and notation/terminology) to their environment.
On 7/4/07, chrilly <[EMAIL PROTECTED]> wrote:
Thanks, the dictionary is really great.
Chrilly

----- Original Message -----
From: David Silver
To: computer-go@computer-go.org
Sent: Tuesday, July 03, 2007 11:29 PM
Subject: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

> It's because Go is not the only game in the world and certainly not the only
> reinforcement learning problem. They are using a widely accepted
> terminology.
>
But a very inappropriate one. I have read Sutton's book and all the things I know (e.g. TD-Gammon) are completely obfuscated. It's maybe suitable to present general concepts, but it is extremely complicated to formulate an algorithm in this framework.

Here is a quick and dirty RL<->Computer Go translation kit to try and help bridge the gap!

RL terminology                        Go terminology
State                                 Position
Action                                Move
Reward                                Win/Loss
Return                                Win/Loss
Episode                               Game
Time-step                             One move
Agent                                 Program
Value function                        Evaluation function
Policy                                Player
Default policy                        Simulation player
Uniform random policy                 Light simulation player
Other stochastic policy               Heavy simulation player
Greedy policy                         1-ply search player
Epsilon-greedy policy                 1-ply search player with some random moves
Feature                               Factor used for position evaluation
Weight                                Weight of each factor in evaluation function
Tabular representation                One weight for each complete position
Partial tabular representation        UCT tree
State abstraction                     One weight for many positions
Linear value function approximation   Evaluation function using weighted sum of various factors
Feature discovery                     Learning new factors for the evaluation function
Sample-based search                   Simulation (Monte-Carlo methods, etc.)
Transition function                   Rules of the game
Environment                           Rules of the game + opponent
Trajectory                            Move sequence
Online                                During actual play
Offline                               Before/after actual play (e.g. preprocessing)
On-policy                             If both players play as normal
Off-policy                            If either player behaves differently

-Dave
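To make a few of the dictionary entries concrete, here is a rough Python sketch of a "linear value function" (an evaluation function built as a weighted sum of factors) and an "epsilon-greedy policy" (a 1-ply search player with some random moves). None of this comes from MoGo or from Sutton's book; the names evaluate, epsilon_greedy_move, legal_moves, play and features_of are all made up for illustration, and the game itself is left abstract.

```python
import random


def evaluate(features, weights):
    """Linear value function: weighted sum of the evaluation factors."""
    return sum(w * f for w, f in zip(weights, features))


def epsilon_greedy_move(position, legal_moves, play, features_of, weights,
                        epsilon=0.1, rng=random):
    """Epsilon-greedy policy: a 1-ply search player with some random moves.

    legal_moves(position)  -> list of moves (the "actions")
    play(position, move)   -> resulting position (the "transition function")
    features_of(position)  -> list of factors used for position evaluation
    All three callables are hypothetical hooks supplied by the caller.
    """
    moves = legal_moves(position)
    if rng.random() < epsilon:
        # Exploration: with probability epsilon, play a random legal move.
        return rng.choice(moves)
    # Greedy part: 1-ply search, pick the move whose resulting position
    # gets the highest evaluation.
    return max(moves,
               key=lambda m: evaluate(features_of(play(position, m)), weights))
```

With epsilon=0 this is the "greedy policy" row of the table, and with epsilon=1 it behaves like the "uniform random policy" (light simulation player).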
_______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/