David Silver wrote:
Hi Michael,

But one thing confuses me: You are using the value from Fuego's 10k simulations as an approximation of the actual value of the position. But isn't the actual value of the position either a win or a loss? On such small boards, can't you assume that Fuego is able to correctly determine who is winning, and round its evaluation to the nearest win/loss? I.e., if it evaluates the position to 0.674, that gets rounded to 1. If such an assumption about Fuego's ability to read the position on a small board is valid, then it should improve the results of your balanced simulation strategy, right? Or am I missing something?

It's true that 5x5 Go is solved, so in principle we could have used the true minimax values. However, we chose an approach that can scale to larger boards, which means we should treat the expert evaluations as approximate. And in fact Fuego was not always accurate on 6x6 boards, as we used only 10k simulations per position in our training set.

Also, I think it really helps to have "soft" rather than "hard" expert evaluations. We want a simulation policy that helps differentiate e.g. a 90% winning position from an 85% winning position. Rounding all the expert evaluations to 0 or 1 would lose much of this advantage.
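A small sketch (not from the paper; the position names and the squared-error loss are illustrative assumptions) of the point above: rounding soft expert evaluations to hard win/loss labels discards exactly the gradations a balanced simulation policy is supposed to learn.

```python
# Hypothetical expert evaluations (e.g. win rates from Fuego's 10k simulations)
# for two positions that differ in how strongly they favour the winner:
soft_targets = {"position_A": 0.90, "position_B": 0.85}

# Rounding to the nearest win/loss collapses the distinction:
hard_targets = {k: round(v) for k, v in soft_targets.items()}
print(hard_targets)  # both positions become 1 -- the 5% gap is gone

# Under a squared-error training objective (an illustrative choice),
# a policy that evaluates both positions at 0.90 is still penalised on
# position_B when trained against soft targets:
def squared_error(prediction, target):
    return (prediction - target) ** 2

print(squared_error(0.90, soft_targets["position_B"]))  # 0.0025 -- nonzero signal
print(squared_error(0.90, hard_targets["position_B"]))  # with hard targets, the
                                                        # 5% difference is invisible
```

With hard targets, any prediction landing on the right side of 0.5 looks equally good, so the policy gets no pressure to rank a 90% win above an 85% win.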

-Dave

By this argument (your last paragraph), you need some magical number of
simulations for the training data.  Too few simulations and you have too
much noise.  And infinite simulations give you hard 0 or 1 results.  But
I can't argue with your results.
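The trade-off in that last message can be made concrete. Treating each simulation as an independent Bernoulli trial (a simplifying assumption; real rollouts are correlated and biased), the standard error of the estimated win rate from n simulations is sqrt(p(1-p)/n), so noise shrinks as n grows, while a perfect evaluator at n → ∞ would indeed collapse toward hard 0/1 values:

```python
import math

# Standard error of a Monte-Carlo win-rate estimate from n independent
# Bernoulli simulations with true win probability p.
def standard_error(p, n):
    return math.sqrt(p * (1 - p) / n)

# At the 10k simulations used for the training set, sampling noise around
# a 90% position is already small (about +/-0.003):
for n in (100, 10_000, 1_000_000):
    print(n, round(standard_error(0.9, n), 4))  # 0.03, 0.003, 0.0003
```

By this measure, 10k simulations easily separate a 90% position from an 85% one; the remaining inaccuracy on 6x6 boards comes from bias in the rollouts rather than from sampling noise alone.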

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
