Hi!

I think the technique of hashing move pairs from the search tree and reusing them in the playouts when the context matches could plausibly be the major improvement in AlphaGo that we witnessed today.

Another thing I noticed is that AlphaGo does not use any statistics from the playouts other than wins and losses. I think that is an improvement, because it removes biases that might make the program weak in certain situations.

On the other hand, using the powerful move prediction accuracy of the tree search and reusing that information in the playouts could really solve a lot of problems. The way I see it, AlphaGo injects Go knowledge into the playouts from the search tree. Traditional Monte Carlo programs do the opposite: they add a lot of knowledge to the playouts and then try to squeeze out as much statistical information as possible.

Also, the clever thing is that the playouts get fed knowledge from both offline and online sources. When the search starts, the move ordering of the neural networks will help the playouts with suggestions of good shape moves. But as more and more local situations are read out somewhere in the search tree, the playouts will pick up more and more strong moves from the hash table.
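
To make that concrete, here is a rough sketch (Python pseudocode) of what I imagine such a cache could look like. The Board interface with local_context() and is_legal(), and the 0.9 probability, are my own assumptions; the paper gives no implementation details:

    import random

    class MoveCache:
        """Sketch of a tree-to-playout move cache. The key scheme, the
        context re-check and the 0.9 probability are my own guesses."""

        def __init__(self, play_prob=0.9):
            self.table = {}      # context(prev_move) -> (move, context(move))
            self.play_prob = play_prob

        def insert(self, board, prev_move, move):
            # During tree traversal: remember the most probable action,
            # keyed by the local context around the previous move,
            # together with the context we expect around the move itself.
            self.table[board.local_context(prev_move)] = (
                move, board.local_context(move))

        def suggest(self, board, prev_move):
            # During a rollout: probe the table with the context around
            # the last move played.
            hit = self.table.get(board.local_context(prev_move))
            if hit is None:
                return None
            move, wanted_ctx = hit
            # Play the cached move only if it is legal, its own local
            # context still matches, and a biased coin flip says so.
            if (board.is_legal(move)
                    and board.local_context(move) == wanted_ctx
                    and random.random() < self.play_prob):
                return move
            return None          # fall back to the normal rollout policy

I imagine the table is filled during the current search only, so absolute move coordinates should be fine: every cached reply refers to a fight that is actually on the board.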

By the way, I will start experiments with this soon in my new program!... :-)

Best
Magnus Persson

On 2016-03-09 18:11, Petr Baudis wrote:
Hi!

On Wed, Mar 09, 2016 at 04:43:23PM +0900, Hiroshi Yamashita wrote:
AlphaGo won 1st game against Lee Sedol!

  Well, I have to eat my past words - of course, there are still four
games to go, but the first round does not look like a lucky win at all!

  Huge congratulations to the AlphaGo team, you have done truly amazing
work, with potential to spearhead a lot of further advances in AI in
general!  It does seem to me that you must have made a lot of progress
since the Nature paper though - is that impression correct?

Do you have some more surprising breakthroughs and techniques in store
for us, or was the progress mainly incremental, furthering the training
etc.?


  By the way, there is a short snippet in the paper that maybe many
people overlooked (including me on the very first read!):

  We introduce a new technique that caches all moves from the search
tree and then plays similar moves during rollouts; a generalisation of
the last good reply heuristic. At every step of the tree traversal, the
most probable action is inserted into a hash table, along with the
3 × 3 pattern context (colour, liberty and stone counts) around both the
previous move and the current move. At each step of the rollout, the
pattern context is matched against the hash table; if a match is found
then the stored move is played with high probability.
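
  One plausible reading of that 3 × 3 pattern context (the paper does
not spell out the exact encoding, so the capped counts and the Board
accessors on_board(), colour(), liberties() and string_size() below are
just my guesses):

    def local_context(board, move):
        # A guess at the 3x3 pattern context: for each point around
        # `move`, record the colour plus capped liberty and string-size
        # counts. Capping keeps the number of distinct contexts (and
        # thus the hash table) small.
        x, y = move
        ctx = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                p = (x + dx, y + dy)
                if not board.on_board(p):
                    ctx.append("edge")
                elif board.colour(p) is None:
                    ctx.append("empty")
                else:
                    ctx.append((board.colour(p),
                                min(board.liberties(p), 3),
                                min(board.string_size(p), 3)))
        return tuple(ctx)    # hashable, so it can key a dict directly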

  This looks like it might overcome a lot of weaknesses re semeai etc.,
enabling the coveted (by me) information flow from tree to playouts, if
you made this work well (it's similar to my "liberty maps" attempts,
which always failed though - I tried to encode a larger context, which
maybe wasn't a good idea).

  Would you say this improvement is important to AlphaGo's playing
strength (or its scaling), or merely a minor tweak?


  Thanks,
