David Silver wrote:
Hi Michael,
But one thing confuses me: you are using the value from Fuego's 10k simulations as an approximation of the actual value of the position. But isn't the actual value of the position either a win or a loss? On such small boards, can't you assume that Fuego is able to correctly determine who is winning, and round its evaluation to the nearest win/loss? I.e. if it evaluates the position to 0.674, that gets rounded to 1. If such an assumption about Fuego's ability to read the position on a small board is valid, then it should improve the results of your balanced simulation strategy, right? Or am I missing something?
It's true that 5x5 Go is solved, so in principle we could have used the true minimax values. However, we chose an approach that can scale to larger boards, which means we should treat the expert evaluations as approximate. And in fact Fuego was not always accurate on 6x6 boards, since we used only 10k simulations in our training set.
Also, I think it really helps to have "soft" rather than "hard" expert evaluations. We want a simulation policy that can differentiate, e.g., a 90% winning position from an 85% winning position. Rounding all the expert evaluations to 0 or 1 would lose much of this advantage.
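Concretely, here is a toy sketch of the two kinds of target (Position and simulate_game are hypothetical stand-ins for a Fuego playout, not our actual code):

    import random

    class Position:
        # Hypothetical stand-in: a position with a known underlying
        # win rate, used only to make the sketch self-contained.
        def __init__(self, true_win_rate):
            self.true_win_rate = true_win_rate

    def simulate_game(position):
        # Stand-in for one playout; returns 1 for a win, 0 for a loss.
        return 1 if random.random() < position.true_win_rate else 0

    def soft_evaluation(position, n_simulations=10000):
        # "Soft" target: the empirical win rate, e.g. 0.674.
        wins = sum(simulate_game(position) for _ in range(n_simulations))
        return wins / n_simulations

    def hard_evaluation(position, n_simulations=10000):
        # "Hard" target: round to the nearest win/loss.
        return round(soft_evaluation(position, n_simulations))

With soft targets, a 90% position and an 85% position produce different training signals; rounding maps both to 1, and the distinction is lost.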
-Dave
By this argument (your last paragraph), you need some magical number of simulations for the training data: too few and you have too much noise, while infinite simulations give you hard 0 or 1 results anyway. But I can't argue with your results.
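To put a number on the noise side of that trade-off, a back-of-the-envelope sketch, assuming each position's rollout results are i.i.d. coin flips with win probability p (Fuego's search value isn't quite that simple, but the scaling is similar):

    import math

    def stderr(p, n):
        # Standard error of the empirical win rate after n
        # independent simulations of a position with true win rate p.
        return math.sqrt(p * (1 - p) / n)

    for n in (100, 1000, 10000):
        print(n, round(stderr(0.9, n), 4))
    # 100    0.03
    # 1000   0.0095
    # 10000  0.003

At 10k simulations the noise (~0.003 for a 90% position) is already well below the 0.05 gap between a 90% and an 85% position, so the soft targets are meaningful at that budget.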