Matthew Lai's paper "Giraffe: Using Deep Reinforcement Learning to Play Chess" has a chapter called "Probabilistic Search". That should be directly applicable?
https://arxiv.org/abs/1509.01549

- Andy

2017-11-16 9:43 GMT-06:00 Petr Baudis <pa...@ucw.cz>:

> Hi,
>
> when explaining AlphaGo Zero to a machine learning audience yesterday
>
> (https://docs.google.com/presentation/d/1VIueYgFciGr9pxiGmoQyUQ088Ca4ouvEFDPoWpRO4oQ/view)
>
> it occurred to me that using MCTS in this setup is actually such
> a kludge!
>
> Originally, we used MCTS because with the repeated simulations,
> we would be improving the accuracy of the arm reward estimates. MCTS
> policies assume stationary distributions, which is violated every time
> we expand the tree, but it's an okay tradeoff if all you feed into the
> tree are rewards in the form of just Bernoulli trials. Moreover, you
> could argue evaluations are somewhat monotonic with increasing node
> depths as you are basically just fixing a growing prefix of the MC
> simulation.
>
> But now, we expand the nodes literally all the time, breaking the
> stationarity possibly in drastic ways. There are no reevaluations that
> would improve your estimate. The input isn't binary but an estimate in
> a continuous space. Suddenly the Multi-armed Bandit analogy loses a lot
> of ground.
>
> Therefore, can't we take the next step, and do away with MCTS? Is
> there a theoretical viewpoint from which it still makes sense as the best
> policy improvement operator?
>
> What would you say is the current state-of-art game tree search for
> chess? That's a very unfamiliar world for me, to be honest all I really
> know is MCTS...
>
> --
> Petr Baudis, Rossum
> Run before you walk! Fly before you crawl! Keep moving forward!
> If we fail, I'd rather fail really hugely. -- Moist von Lipwig
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go