Hi!
I think the technique of hashing move pairs from the search tree and
reusing them in the playouts when the context matches could plausibly be
the major improvement of AlphaGo that we witnessed today.
Another thing I noticed is that AlphaGo does not use any statistics from
the playouts other than wins and losses.
I think that is an improvement, because it removes biases that might
make the program weak in certain situations.
On the other hand, using the powerful move prediction accuracy from the
tree search and reusing that information in the playouts could really
solve a lot of problems. The way I see it, AlphaGo injects Go knowledge
into the playouts from the search tree.
Traditional Monte Carlo programs would do the opposite: add a lot of
knowledge to the playouts and then try to squeeze out as much statistics
as possible.
Also, the clever thing is that the playouts get fed knowledge from both
offline and online sources. When the search starts,
the move ordering of the neural networks will help the playouts with
suggestions of good shape moves. But as more and more
local situations are read out somewhere in the search tree, the playouts
will pick up more and more strong moves from the hash table.
By the way I will start experiments with this soon in my new program!...
:-)
Best
Magnus Persson
On 2016-03-09 18:11, Petr Baudis wrote:
Hi!
On Wed, Mar 09, 2016 at 04:43:23PM +0900, Hiroshi Yamashita wrote:
AlphaGo won 1st game against Lee Sedol!
Well, I have to eat my past words - of course, there are still four
games to go, but the first round does not look like a lucky win at all!
Huge congratulations to the AlphaGo team, you have done truly amazing
work, with potential to spearhead a lot of further advances in AI in
general! It does seem to me that you must have made a lot of progress
since the Nature paper though - is that impression correct?
Do you have some more surprising breakthroughs and techniques in
store
for us, or was the progress mainly incremental, furthering the training
etc.?
By the way, there is a short snippet in the paper that maybe many
people overlooked (including me on the very first read!):
We introduce a new technique that caches all moves from the search
tree and then plays similar moves during rollouts; a generalisation of
the last good reply heuristic. At every step of the tree traversal,
the
most probable action is inserted into a hash table, along with the
3 × 3 pattern context (colour, liberty and stone counts) around both
the
previous move and the current move. At each step of the rollout, the
pattern context is matched against the hash table; if a match is found
then the stored move is played with high probability.
This looks like it might overcome a lot of weaknesses re semeai etc.,
enabling the coveted (by me) information flow from tree to playouts, if
you made this work well (it's similar to my "liberty maps" attempts,
which always failed though - I tried to encode a larger context, which
maybe wasn't a good idea).
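For what it's worth, here is a rough sketch of how the quoted 3x3
pattern context might be encoded as a hash-table key. The paper only
summarises the feature set (colour, liberty and stone counts), so the
encoding below is a guess, not AlphaGo's: it keeps only the colours and
omits the liberty and stone counts for brevity.

```python
EMPTY, BLACK, WHITE, EDGE = ".", "X", "O", "#"

def pattern_3x3(board, size, point):
    """Return a hashable 3x3 colour pattern around `point`.

    `board` maps (row, col) -> BLACK/WHITE; missing points are EMPTY;
    off-board points are encoded as EDGE so edge contexts stay distinct.
    """
    r0, c0 = point
    cells = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = r0 + dr, c0 + dc
            if 0 <= r < size and 0 <= c < size:
                cells.append(board.get((r, c), EMPTY))
            else:
                cells.append(EDGE)
    return tuple(cells)

def context_key(board, size, prev_move, cur_move):
    # Both neighbourhoods together form the key, matching the paper's
    # "around both the previous move and the current move".
    return (pattern_3x3(board, size, prev_move),
            pattern_3x3(board, size, cur_move))

board = {(0, 0): BLACK, (1, 1): WHITE}
print(pattern_3x3(board, 19, (0, 0)))   # corner: edge cells marked "#"
print(context_key(board, 19, (0, 0), (1, 1)))
```

Keeping the context this small is exactly what makes matches frequent in
the rollouts; a larger context (as in the liberty-maps idea above) would
match far more rarely.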
Would you say this improvement is important to AlphaGo's playing
strength (or its scaling), or merely a minor tweak?
Thanks,
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go