Hi!

I think the technique of hashing move pairs from the search tree and reusing them in the playouts when the context matches could plausibly be the major improvement in AlphaGo that we witnessed today.

Another thing I noticed is that AlphaGo does not use any statistics from the playouts other than wins and losses. I think that is an improvement, because it removes biases that might make the program weak in certain situations.

On the other hand, using the powerful move prediction accuracy of the tree search and reusing that information in the playouts could really solve a lot of problems. The way I see it, AlphaGo injects Go knowledge into the playouts from the search tree. Traditional Monte Carlo programs do the opposite: they add a lot of knowledge to the playouts and then try to squeeze out as much statistical information as possible.

Also, the clever thing is that the playouts get fed knowledge from both offline and online sources. When the search starts, the move ordering of the neural networks will help the playouts with suggestions of good shape moves. But as more and more local situations are read out somewhere in the search tree, the playouts will pick up more and more strong moves from the hash table.
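
To make that concrete, here is a rough sketch (Python pseudocode) of what I imagine such a cache could look like. The Board interface with local_context() and is_legal(), and the 0.9 probability, are my own assumptions; the paper gives no implementation details:

    import random

    class MoveCache:
        """Sketch of a tree-to-playout move cache. The key scheme, the
        context re-check and the 0.9 probability are my own guesses."""

        def __init__(self, play_prob=0.9):
            self.table = {}      # context(prev_move) -> (move, context(move))
            self.play_prob = play_prob

        def insert(self, board, prev_move, move):
            # During tree traversal: remember the most probable action,
            # keyed by the local context around the previous move,
            # together with the context we expect around the move itself.
            self.table[board.local_context(prev_move)] = (
                move, board.local_context(move))

        def suggest(self, board, prev_move):
            # During a rollout: probe the table with the context around
            # the last move played.
            hit = self.table.get(board.local_context(prev_move))
            if hit is None:
                return None
            move, wanted_ctx = hit
            # Play the cached move only if it is legal, its own local
            # context still matches, and a biased coin flip says so.
            if (board.is_legal(move)
                    and board.local_context(move) == wanted_ctx
                    and random.random() < self.play_prob):
                return move
            return None          # fall back to the normal rollout policy

I imagine the table is filled during the current search only, so absolute move coordinates should be fine: every cached reply refers to a fight that is actually on the board.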

By the way, I will start experiments with this soon in my new program!... :-)

Best
Magnus Persson

On 2016-03-09 18:11, Petr Baudis wrote:
Hi!

On Wed, Mar 09, 2016 at 04:43:23PM +0900, Hiroshi Yamashita wrote:
AlphaGo won 1st game against Lee Sedol!

  Well, I have to eat my past words - of course, there are still four
games to go, but the first round does not look like a lucky win at all!

  Huge congratulations to the AlphaGo team, you have done truly amazing
work, with potential to spearhead a lot of further advances in AI in
general!  It does seem to me that you must have made a lot of progress
since the Nature paper though - is that impression correct?

Do you have some more surprising breakthroughs and techniques in store
for us, or was the progress mainly incremental, furthering the training
etc.?


  By the way, there is a short snippet in the paper that maybe many
people overlooked (including me on the very first read!):

  We introduce a new technique that caches all moves from the search
tree and then plays similar moves during rollouts; a generalisation of
the last good reply heuristic. At every step of the tree traversal, the
most probable action is inserted into a hash table, along with the
3 × 3 pattern context (colour, liberty and stone counts) around both the
previous move and the current move. At each step of the rollout, the
pattern context is matched against the hash table; if a match is found
then the stored move is played with high probability.
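
  One plausible reading of that 3 × 3 pattern context (the paper does
not spell out the exact encoding, so the capped counts and the Board
accessors on_board(), colour(), liberties() and string_size() below are
just my guesses):

    def local_context(board, move):
        # A guess at the 3x3 pattern context: for each point around
        # `move`, record the colour plus capped liberty and string-size
        # counts. Capping keeps the number of distinct contexts (and
        # thus the hash table) small.
        x, y = move
        ctx = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                p = (x + dx, y + dy)
                if not board.on_board(p):
                    ctx.append("edge")
                elif board.colour(p) is None:
                    ctx.append("empty")
                else:
                    ctx.append((board.colour(p),
                                min(board.liberties(p), 3),
                                min(board.string_size(p), 3)))
        return tuple(ctx)    # hashable, so it can key a dict directly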

  This looks like it might overcome a lot of weaknesses re semeai etc.,
enabling the coveted (by me) information flow from tree to playouts, if
you made this work well (it's similar to my "liberty maps" attempts,
which always failed though - I tried to encode a larger context, which
maybe wasn't a good idea).

  Would you say this improvement is important to AlphaGo's playing
strength (or its scaling), or merely a minor tweak?


  Thanks,
