Quoting Mark Boon <[EMAIL PROTECTED]>:
What is not exactly clear to me is what you mean by 'postponing expansion'. Let me write it in my own words to see if that's what you mean. When you have selected a best node, based on the UCT + wins/visits value, which has no children yet, you first simply do a simulation and collect the playout result in the current node, including the AMAF value that you call the 'virtual win-visit ratio'. Only when that has been done a certain number of times (in your case 10) do you suddenly create all the children and weight them based on the virtual win-visit ratio, and possibly also on other move priorities that resulted from 'heavy' playout selection?
Yes, I "suddenly" create all children, but at creation I have no simulations, and thus no virtual win-visit ratios, for the children (although one might copy such values from higher up in the tree, which I think the MoGo team tried with little or no success). The virtual win-visit ratios are initialized to some default value. But one can initialize the ratios differently depending on a static evaluation of the position, using patterns for example, or the proximity heuristic. It is not clear to me how best to do this, and I still need to test the parameters for biasing. The nice thing is that one can initialize the virtual win-visit ratios and keep the real win-visit ratios unbiased. You can afford to make mistakes here because, if the position is searched a lot, the virtual values get data much quicker than the real ones.
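To make the scheme described above concrete, here is a rough Python sketch, not the actual code of any program. The names `Node`, `record_playout`, `maybe_expand`, `prior` and `prior_weight` are invented for illustration, as are the default ratio of 0.5 and the weight of 4 pseudo-visits; only the threshold of 10 comes from this thread.

```python
from dataclasses import dataclass, field

EXPANSION_THRESHOLD = 10  # playouts through a leaf before it is expanded


@dataclass
class Node:
    wins: float = 0.0            # real results, never biased
    visits: int = 0
    virtual_wins: float = 0.0    # AMAF-style statistics, may carry a prior
    virtual_visits: float = 0.0
    children: dict = field(default_factory=dict)

    def record_playout(self, won):
        """Store one real playout result in this node."""
        self.visits += 1
        self.wins += 1.0 if won else 0.0

    def maybe_expand(self, legal_moves, prior=None, prior_weight=4.0):
        """Create all children at once, but only after enough playouts.

        `prior` is a hypothetical callable mapping a move to a value in
        [0, 1], e.g. from patterns or a proximity heuristic. Without it,
        every child starts at an assumed default ratio of 0.5.
        Returns True if the node was expanded by this call.
        """
        if self.children or self.visits < EXPANSION_THRESHOLD:
            return False
        for move in legal_moves:
            p = prior(move) if prior is not None else 0.5
            # Only the virtual counters are seeded; real ones stay at zero.
            self.children[move] = Node(virtual_wins=p * prior_weight,
                                       virtual_visits=prior_weight)
        return True
```

Since the prior only counts as a few pseudo-visits in the virtual counters, a bad guess is washed out quickly once real AMAF data arrives, while wins/visits stay untouched.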
Magnus