Łukasz Lew wrote:
On Wed, Jun 3, 2009 at 00:56, Michael Williams
<michaelwilliam...@gmail.com> wrote:
Two things: Firstly, I'm storing (only in RAM) the precalculated Winrate
and InvSqrtVisits and keeping them updated.
So my UCT formula went from
Wins / Visits + sqrt(lnParentVisits / Visits)
to
WinRate + sqrtLnParentVisits * InvSqrtVisits;
This has a memory cost, but I don't care so much about RAM since I can send
the nodes to disk.
And the second thing is to store in the parent node a reference to what is
likely the UCT-best child node. If the parent has been visited
100*boardspaces times, I will go directly to the likely-best child with
probability 2047/2048. Anytime a proper UCT loop occurs, the likely-best
reference is updated (about 90% of the time there is no change, so I think
it's safe).
This is quite similar to epsilon trick described here:
http://www.mimuw.edu.pl/~lew/files/epsilon_trick.pdf
in short when you calculate best UCT child you visit it
max(best.visit_count * epsilon, 1) times
with epsilon = 0.05 for instance
It works well both for new and old nodes, but you have to keep the
counter of visits.
The soft way would be to recalculate best child with probability
min(1, 1/(best.visit_count*epsilon)).
Both variants of ET can give you some guarantees about the way the
tree is explored.
Łukasz
I switched to this method. Thanks, Lukasz.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/