When I did something like this for Spanish checkers (training a neural network to be the evaluation function in an alpha-beta search, without any human knowledge), I solved the problem of adding game variety by using UCT for the opening moves. That means that I kept a tree structure with the opening moves and I used the UCB1 formula to pick the next move as long as the game was in the tree. Once outside the tree, I used alpha-beta search to play a normal [very fast] game.
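A minimal Python sketch of the opening-tree idea described above, not the original checkers code. The names `Node`, `pick_opening`, `update` and `play_one_game`, the exploration constant `c`, and the callables `legal_moves`, `play` and `finish_game` are all assumptions made for illustration. `finish_game` stands in for the fast alpha-beta self-play game and is assumed to return a score in [0, 1] from the first player's point of view.

```python
import math
import random


class Node:
    """One position in the opening tree."""

    def __init__(self):
        self.visits = 0
        self.wins = 0.0      # summed results for the player who moved INTO this node
        self.children = {}   # move -> Node


def ucb1(parent, child, c):
    # UCB1: mean reward plus an exploration bonus.
    return child.wins / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def pick_opening(root, start, legal_moves, play, rng, c=1.0):
    """Descend the tree with UCB1 and add one new leaf.

    Returns the list of tree nodes visited and the position where the tree ends.
    Call update() on the returned path before the next descent.
    """
    node, pos, path = root, start, [root]
    while True:
        moves = legal_moves(pos)
        if not moves:
            return path, pos              # game ended while still inside the tree
        untried = [m for m in moves if m not in node.children]
        if untried:
            move = rng.choice(untried)    # last in-tree move: effectively random
            child = node.children[move] = Node()
            path.append(child)
            return path, play(pos, move)
        move = max(moves, key=lambda m: ucb1(node, node.children[m], c))
        node = node.children[move]
        pos = play(pos, move)
        path.append(node)


def update(path, first_player_score):
    """Back up a result in [0, 1], given from the first player's point of view."""
    for depth, node in enumerate(path):
        node.visits += 1
        # Nodes at odd depth were reached by a first-player move.
        node.wins += first_player_score if depth % 2 == 1 else 1.0 - first_player_score


def play_one_game(root, start, legal_moves, play, finish_game, rng):
    """One self-play game: opening from the tree, the rest from finish_game."""
    path, pos = pick_opening(root, start, legal_moves, play, rng)
    score = finish_game(pos)   # hypothetical stand-in for the alpha-beta game
    update(path, score)
    return score
```

Each call to `play_one_game` grows the tree by at most one node, so early games leave the tree after a single move and later games follow the book deeper along the lines that have scored well.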
One important characteristic of this UCT opening-book builder is that the last move inside the tree is basically random, so this explores a lot of unbalanced positions.

Álvaro.

On Fri, Oct 20, 2017 at 9:23 AM, Petr Baudis <pa...@ucw.cz> wrote:

> I tried to reimplement the system - in a simplified way, trying to
> find the minimum that learns to play 5x5 in a few thousands of
> self-plays. Turns out there are several components which are important
> to avoid some obvious attractors (like the network predicting black
> loses on every move from its second game on):
>
> - disabling resignation in a portion of games is essential not just
>   for tuning resignation threshold (if you want to even do that), but
>   just to correct prediction signal by actual scoring rather than
>   starting to always resign early in the game
>
> - dirichlet (or other) noise is essential for the network getting
>   looped into the same game - which is also self-reinforcing
>
> - i have my doubts about the idea of high temperature move choices
>   at the beginning, especially with T=1 ... maybe that's just bad
>   very early in the training
>
> On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
> > The order of magnitude matches my parameter numbers. (My attempt to
> > reproduce a simplified version of this is currently evolving at
> > https://github.com/pasky/michi/tree/nnet but the code is a mess right
> > now.)
>
> --
> Petr Baudis, Rossum
> Run before you walk! Fly before you crawl! Keep moving forward!
> If we fail, I'd rather fail really hugely. -- Moist von Lipwig
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go