Reinforcement Learning terminology :-)
In Go the state is the board situation (stones, player to move, ko
info, etc.) and the action is simply the move. Together they form
state-action pairs.
A standard transposition table typically only has state values; action
values can then be inferred from a one-ply look-ahead, i.e. from the
stored value of the position that the move leads to.
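To make the distinction concrete, here is a minimal sketch (Java; class
and field names are invented for illustration, not taken from anyone's
program). The table stores one value per position, so the value of a
move has to be read off the child position it leads to, whereas a
state-action value would live on the edge itself:

import java.util.HashMap;
import java.util.Map;

// Sketch only; names are illustrative.
class Node {
    double stateValue;                          // V(s): value of this position
    int visits;
    Map<Integer, Edge> edges = new HashMap<>(); // move -> per-edge statistics
}

class Edge {
    double actionValue;  // Q(s,a): value of playing move a in this position
    int edgeVisits;      // samples taken through this edge only
}

class StateValuesOnly {
    static final Map<Long, Node> table = new HashMap<>(); // checksum -> node

    // With only state values in the table, Q(s,a) has to be inferred by a
    // one-ply look-ahead: look up the child position the move leads to.
    static double inferredQ(long childChecksum) {
        Node child = table.get(childChecksum);
        return child == null ? 0.5 : child.stateValue; // 0.5 = unexplored guess
    }
}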
On 27-okt-08, at 12:45, Erik van der Werf wrote:
Using state-action values
appears to solve the problem.
What are 'state-action values'?
Mark
On 27-okt-08, at 12:45, Erik van der Werf wrote:
When a child has already been sampled often through some other path, a
naive implementation may initially explore the other, less frequently
visited children first. The new path leading to the transposition may
therefore suffer from some initial bias. Using state-action values
appears to solve the problem.
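Roughly, in the selection step this means keeping visit counts and
values on the edges rather than reading them from the shared child
node. A minimal UCT-style sketch (Java; names invented, not from any
particular program):

import java.util.Map;

// Sketch only; names are illustrative.
class Selection {
    // Per-edge (state-action) statistics, counted along this parent->child
    // edge only, not shared through the transposition table.
    static class EdgeStats {
        double value;   // Q(s,a): mean result of samples through this edge
        int visits;     // number of samples that went through this edge
    }

    // UCT-style selection driven by edge statistics.  If 'visits' were read
    // from the shared child node instead, a child that was sampled often via
    // some other path would look over-explored from this parent and be
    // passed over at first, which is the initial bias described above.
    static int selectMove(int parentVisits, Map<Integer, EdgeStats> edges) {
        int bestMove = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Integer, EdgeStats> e : edges.entrySet()) {
            EdgeStats s = e.getValue();
            double score = s.value
                    + Math.sqrt(2.0 * Math.log(parentVisits + 1) / (s.visits + 1));
            if (score > bestScore) {
                bestScore = score;
                bestMove = e.getKey();
            }
        }
        return bestMove;
    }
}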
Erik
On 27-okt-08, at 11:51, terry mcintyre wrote:
- Original Message
From: Mark Boon <[EMAIL PROTECTED]>
Let me first describe what I did (or attempted to do): all nodes are
stored in a hash-table using a checksum. Whenever I create a new node
in the tree I add it in the hash-table as well. If two nodes have the
same checksum ...
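The scheme described above amounts to keying tree nodes by a position
checksum, so that different move orders reaching the same position
share one node. A rough sketch (Java; names invented):

import java.util.HashMap;
import java.util.Map;

// Sketch only; names are illustrative.
class NodeTable {
    static class Node {
        int visits;
        double value;
    }

    private final Map<Long, Node> byChecksum = new HashMap<>();

    // Whenever the search needs a node for a position, it looks the position
    // up by checksum.  If a node with the same checksum already exists it is
    // reused (a transposition); otherwise a new node is created and stored.
    Node getOrCreate(long checksum) {
        return byChecksum.computeIfAbsent(checksum, k -> new Node());
    }
}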
From: Mark Boon <[EMAIL PROTECTED]>
A while ago I implemented what I thought was a fairly straightforward
way to deal with transpositions. But to my surprise it made the
program weaker instead of stronger. Since I couldn't figure out
immediately what was wrong with it, I decided to leave it alone for
the time being.
Just no