Actually, this pretty much solves the whole issue, right? Of course the real proof would be to test it out, but it seems to me a pretty straightforward solution, with nothing nontrivial about it.
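To make the idea concrete, here is a rough Python sketch of the two pieces discussed in the quoted messages: Dan's linear meta-scoring over separate W/D/L probabilities, and David's draw-value input plane with the draw value varied across self-play games. All function names, the 8x8 plane size, and the list of draw values are my own illustrative assumptions; none of this is taken from the AlphaZero paper or any real codebase.

```python
import random

def draw_value(win_pts, draw_pts, loss_pts):
    """Rescale a scoring system linearly so that win = +1 and loss = -1,
    and return where the draw lands on that scale."""
    return 2.0 * (draw_pts - loss_pts) / (win_pts - loss_pts) - 1.0

def expected_value(p_win, p_draw, p_loss, d):
    """Linear meta-score over W/D/L probabilities, with a draw worth d."""
    return p_win * 1.0 + p_draw * d + p_loss * -1.0

def draw_value_plane(d, size=8):
    """Constant size x size feature plane filled with the draw value d
    (size=8 is an assumption, e.g. a chess board)."""
    return [[d] * size for _ in range(size)]

# Hypothetical set of draw values to expose the net to during self-play:
# normal, must-win, 3/1/0 scoring, only-need-not-lose.
DRAW_VALUES = [0.0, -1.0, draw_value(3, 1, 0), 1.0]

def sample_draw_value(rng):
    """Pick the draw value for one self-play game."""
    return rng.choice(DRAW_VALUES)

if __name__ == "__main__":
    rng = random.Random(0)
    d = sample_draw_value(rng)
    plane = draw_value_plane(d)
    print(d, expected_value(0.3, 0.5, 0.2, d), len(plane))
```

For example, with W/D/L probabilities (0.3, 0.5, 0.2) the same position is worth about 0.1 under normal scoring (d = 0) but about -0.4 in a must-win game (d = -1), which is the kind of difference the search and the policy would need to see.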

On Feb 13, 2018 10:52 AM, "David Wu" <lightvec...@gmail.com> wrote:

Seems to me like you could fix that in the policy too by providing an input feature plane that indicates the value of a draw, whether 0 as normal, or -1 for must-win, or -1/3 for 3/1/0, or 1 for only-need-not-lose, etc.

Then just play games with a variety of values for this parameter in your self-play training pipeline so the policy net gets exposed to each kind of game.

On Feb 13, 2018 10:40 AM, "Dan Schmidt" <d...@dfan.org> wrote:

The AlphaZero paper says that they just assign values 1, 0, and -1 to wins, draws, and losses respectively. This is fine for maximizing your expected value over an infinite number of games given the way that chess tournaments (to pick the example that I'm familiar with) are typically scored, where you get 1, 0.5, and 0 points respectively for wins, draws, and losses.

However 1) not all tournaments use this scoring system (3/1/0 is popular these days, to discourage draws), and 2) this system doesn't account for must-win situations where a draw is as bad as a loss (say you are 1 point behind your opponent and it's the last game of a match). Ideally you'd keep track of all three probabilities and use some linear meta-scoring function on top of them. I don't think it's trivial to extend the AlphaZero architecture to handle this, though. Maybe it is sufficient to train with the standard meta-scoring (while keeping track of the separate W/D/L probabilities) but then use the currently applicable meta-scoring while playing. Your policy network won't quite match your current situation, but at least your value network and search will.

On Tue, Feb 13, 2018 at 10:05 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de> wrote:
> Hello,
>
> what is known about proper MCTS procedures for games
> which do not only have wins and losses, but also draws
> (like chess, Shogi or Go with integral komi)?
>
> Should neural nets provide (win, draw, loss)-probabilities
> for positions in such games?
>
> Ingo.
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go