That means that MoGo no longer uses upper confidence
bounds, and only uses means. It also means that MoGo will _never_
explore improbable children (after a few sims) unless the RAVE value
yields an unusually high estimate for them. Is all of that correct?
Precisely: I don't see why you would be wrong. Empirically, however, for
9x9 we have played games against high-level humans, and for the (few :-) )
games that MoGo lost, we tried to see which moves MoGo chose erroneously.
When we restarted MoGo from the same position with a huge
computation time (30 minutes on a fast octo-core machine), MoGo always
changed its mind and switched to a better move.
So:
- theoretically, I don't see any reason for MoGo to be asymptotically
consistent
- there are long computation times during which MoGo focuses on a bad
move
- however, we have not seen a bad move that MoGo still keeps
after _very_ long computation times
==> if someone beats the MoGoR3 release when it runs with
very large computation times (time x number of cores = 4h, 1 to 4 cores),
I'm interested in the SGF file and the analysis.
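
To make the selection rule discussed above concrete, here is a minimal
Python sketch, not MoGo's actual code. It compares a greedy choice on a
blend of the simulation mean and the RAVE mean (no confidence bound) with
a plain UCB1 choice. The names Child, blended_value, select_greedy and
select_ucb, the constant k, the prior of 0.5, and the weighting schedule
beta = sqrt(k / (3n + k)) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    name: str
    wins: float = 0.0        # wins from real simulations through this child
    visits: int = 0          # number of real simulations through this child
    rave_wins: float = 0.0   # wins credited by RAVE (all-moves-as-first)
    rave_visits: int = 0     # number of RAVE samples


def mean(wins, visits, prior=0.5):
    # Fall back to an assumed neutral prior when there is no data yet.
    return wins / visits if visits > 0 else prior


def blended_value(c, k=1000.0):
    # Assumed schedule: beta is close to 1 with few simulations (trust RAVE)
    # and decays towards 0 as real simulations accumulate.
    beta = math.sqrt(k / (3.0 * c.visits + k))
    return beta * mean(c.rave_wins, c.rave_visits) + (1.0 - beta) * mean(c.wins, c.visits)


def select_greedy(children, k=1000.0):
    # Means only: no exploration bonus is added.
    return max(children, key=lambda c: blended_value(c, k))


def select_ucb(children, c_explore=1.0):
    # Classic UCB1 for comparison: rarely visited children get a bonus.
    total = sum(c.visits for c in children)

    def ucb(c):
        if c.visits == 0:
            return math.inf
        return mean(c.wins, c.visits) + c_explore * math.sqrt(math.log(total) / c.visits)

    return max(children, key=ucb)


good = Child("a", wins=60, visits=100, rave_wins=60, rave_visits=100)
weak = Child("b", wins=1, visits=5, rave_wins=10, rave_visits=50)
weak_high_rave = Child("b", wins=1, visits=5, rave_wins=45, rave_visits=50)

print(select_greedy([good, weak]).name)            # greedy ignores the weak child
print(select_ucb([good, weak]).name)               # UCB1 would still revisit it
print(select_greedy([good, weak_high_rave]).name)  # a high RAVE value rescues it
```

With these made-up numbers, the greedy rule keeps choosing "a" until the
RAVE estimate for "b" becomes unusually high, whereas UCB1 goes back to
"b" simply because it has few visits. This is only meant to illustrate
the question asked above; it says nothing about how MoGo behaves with
very long computation times.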