The algorithms presented there search all the way to the end of the game tree, 
don't they? (I am looking specifically at Algorithm 12, which seems to be the 
paper's recommended algorithm.)

 

I am trying to get my head around what happens in practical settings, where we 
decide to limit search effort. E.g., if the search is limited by depth, then I 
think the UCT bounds that are present as tiebreakers would seldom be used? 
(Because moves at the root would not have "infinities.") That is, the approach 
would simplify to MT-SSS*.
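
For concreteness, here is a minimal sketch (in Python) of what I mean by 
"simplifying to MT-SSS*" in the depth-limited case. MT-SSS* is MTD(+INF); 
alpha_beta_with_memory() is an assumed fail-soft alpha-beta that stores and 
reuses bounds in a transposition table, and the INF sentinel is mine:

INF = 10**9   # sentinel "infinity"

def mt_sss_star(root, depth):
    # Repeated null-window, fail-soft alpha-beta calls that drive an upper
    # bound on the minimax value down until it equals the exact value.
    bound = INF
    while True:
        g = alpha_beta_with_memory(root, bound - 1, bound, depth)
        if g < bound:
            bound = g      # failed low: g is a tighter upper bound
        else:
            return g       # failed high at the upper bound: exact value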

 

It is my impression that iterative-deepening/aspiration-window search is the 
winner among the alpha-beta family of algorithms, so I am curious how 
Algorithm 12 compares to it. The problem is that iterative deepening and 
aspiration windows are hard to analyze, while SSS* is more amenable to proofs, 
so a comparison to SSS* may not be the relevant one.
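
By iterative-deepening/aspiration-window I mean the usual loop, roughly as in 
this sketch (alpha_beta() is an assumed fail-soft alpha-beta search, scores are 
in centipawns, and the window width of 50 is arbitrary):

INF = 10**9

def id_aspiration(root, max_depth, window=50):
    score = 0
    for depth in range(1, max_depth + 1):
        alpha, beta = score - window, score + window
        while True:
            score = alpha_beta(root, alpha, beta, depth)
            if score <= alpha:
                alpha = -INF   # fail low: re-search, window opened downward
            elif score >= beta:
                beta = INF     # fail high: re-search, window opened upward
            else:
                break          # score inside the window: deepen
    return score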

 

Another aspect that I am trying to understand is whether the associative memory 
is really an improvement over a transposition table. Specifically: Algorithm 12 
says "traverse the move that has the highest upper bound," whereas a chess 
program would simply have stored a best move in the transposition table.
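
For comparison, here is the kind of transposition-table entry I have in mind (a 
hypothetical sketch; the field names are mine, not the paper's):

from dataclasses import dataclass

@dataclass
class TTEntry:
    key: int        # Zobrist hash of the position
    depth: int      # remaining depth of the stored search result
    lower: int      # fail-soft lower bound on the score
    upper: int      # fail-soft upper bound on the score
    best_move: int  # single stored move, tried first on the next visit

# The program orders best_move first and regenerates the rest of the moves,
# instead of keeping explicit bounds for every child as Algorithm 12 does.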

 

All in all, if I wanted to use alpha-beta in a Go program, then I would first 
train a value/policy network as described in the AGZ/AZ papers, then use it in 
an alpha-beta program and start researching game-specific search techniques: 
singular extensions, late-move reductions, null-move, …

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Tuesday, March 6, 2018 5:55 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Alpha-beta rollouts are like MCTS without playouts (as in AlphaZero), but with 
the ability to do alpha-beta pruning as well.

 

With standard MCTS, the tree converges to a minimax tree, not an alpha-beta 
tree, so as you know there is a huge branching-factor difference there.

 

For MCTS to become competitive with alpha-beta, one has to either use 
alpha-beta rollouts, or, as in AlphaZero's case, accurately pick (say) the best 
3 moves with the PUCT bias and then search those with an MCTS that converges 
to minimax (not alpha-beta).
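
For reference, here is a sketch of the PUCT selection rule described in the 
AlphaZero paper (the node fields and the c_puct constant are illustrative 
assumptions, not AlphaZero's exact values):

import math

def puct_select(node, c_puct=1.5):
    # argmax over children of Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
    total_visits = sum(child.visits for child in node.children.values())
    def score(child):
        q = child.value_sum / child.visits if child.visits > 0 else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visits)
        return q + u
    return max(node.children.values(), key=score)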

 

I went for the former (recasting alpha-beta in rollout form, which can be 
combined with standard MCTS as well as with a standard recursive alpha-beta 
implementation).

 

The paper is here 
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf
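
For those who don't want to dig through the paper, here is my own simplified 
sketch of one alpha-beta rollout, not the paper's exact algorithm; successors() 
and evaluate() are assumed game-specific helpers:

INF = 10**9

class Node:
    def __init__(self, state, is_max):
        self.state, self.is_max = state, is_max
        self.lo, self.hi = -INF, INF        # bounds on the minimax value
        self.children = None

def rollout(node, alpha, beta, depth):
    # One "rollout": walk root-to-leaf through children whose bounds still
    # overlap the (alpha, beta) window, then back the bounds up minimax-style.
    alpha, beta = max(alpha, node.lo), min(beta, node.hi)
    if node.lo == node.hi:
        return                              # node already solved
    if depth == 0:
        node.lo = node.hi = evaluate(node.state)   # leaf: static evaluation
        return
    if node.children is None:
        node.children = [Node(s, not node.is_max)
                         for s in successors(node.state)]
    if not node.children:                   # no legal moves: game over
        node.lo = node.hi = evaluate(node.state)
        return
    # Pick any child not yet cut off by the window. Always taking the leftmost
    # such child reproduces alpha-beta; taking the child with the best bound
    # gives best-first behavior (MT-SSS*-like).
    child = next(c for c in node.children if c.lo < beta and c.hi > alpha)
    rollout(child, alpha, beta, depth - 1)
    agg = max if node.is_max else min
    node.lo = agg(c.lo for c in node.children)
    node.hi = agg(c.hi for c in node.children)

def solve(root, depth):
    while root.lo < root.hi:                # rollouts until the bounds meet
        rollout(root, -INF, INF, depth)
    return root.lo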

 

If it were up to me, I would have used standard alpha-beta search with 
policy/value networks to improve move ordering and evaluation, respectively.

 

It seems AlphaZero's MCTS approach hinged on the policy network accurately 
picking the best n moves (while still using minimax) AND then, of course, not 
making blunders. I think "shallow traps" are bad-looking moves that turn out to 
be tactically brilliant later. How is the policy network able to identify those 
moves? A full-width search like alpha-beta does not miss them.

 

Daniel

 

 

 

On Tue, Mar 6, 2018 at 1:23 PM, Álvaro Begué <alvaro.be...@gmail.com> wrote:

Sorry, I haven't been paying enough attention lately to know what "alpha-beta 
rollouts" means precisely. Can you either describe them or give me a reference?

Thanks,
Álvaro.



 

On Tue, Mar 6, 2018 at 1:49 PM, Dan <dsha...@gmail.com> wrote:

I did a quick test with my MCTS chess engine with two different implementations:

a standard MCTS with averaging, and an MCTS with alpha-beta rollouts. The 
result is a difference of over 500 Elo:

 

Finished game 44 (scorpio-pmcts vs scorpio-mcts): 1/2-1/2 {Draw by 3-fold 
repetition}

Score of scorpio-mcts vs scorpio-pmcts: 41 - 1 - 2  [0.955] 44

Elo difference: 528.89 +/- nan

 

scorpio-mcts uses alpha-beta rollouts

scorpio-pmcts is "pure" MCTS with averaging and the UCB formula.

 

Daniel

 

On Tue, Mar 6, 2018 at 11:46 AM, Dan <dsha...@gmail.com> wrote:

I am pretty sure it is an MCTS problem, and I suspect it is not something that 
could be easily solved with a policy network (I could be wrong here). My 
opinion is that a DCNN is not a miracle worker (as somebody already mentioned 
here) and it is going to fail at resolving tactics. I would be more than happy 
with it if it had the same power as a qsearch, to be honest.

 

Search traps are the major problem with games like chess, and they are what 
makes transferring the success of DCNNs from Go to chess non-trivial.

The following paper discusses shallow traps that are prevalent in chess. ( 
https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571 )

They mention that traps make MCTS very inefficient. Even if MCTS is given 50x 
more time than is needed by an exhaustive minimax search, it can fail to find a 
level-5 or level-7 trap.

It will spend, for instance, 95% of its time searching an asymmetric tree of 
depth > 7 while a shallow depth-7 trap exists, thus missing the level-7 trap.

This is very hard to solve even with unlimited computing power.

 

The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in my 
opinion, and I have a hard time seeing how it can be competitive with Stockfish 
tactically.

 

My MCTS chess engine with AlphaZero-like MCTS (with averaging) was missing a 
lot of tactics. I don't use policy or eval networks, but qsearch() for eval, 
and the policy is basically choosing whichever move leads to a higher eval.

 

a) My first improvement to the MCTS was to use minimax backups instead of 
averaging. This was an improvement, but not something that would solve the 
traps.
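
Roughly, the two backup rules look like this (a sketch only; all values are 
taken from the root player's point of view, which a real engine handles with 
sign flips, and the node fields are illustrative):

def backup(path, leaf_value):
    # path is the selection path from the root down to the evaluated leaf.
    leaf = path[-1]
    leaf.visits += 1
    leaf.value_sum += leaf_value
    leaf.minimax = leaf_value
    for node in reversed(path[:-1]):
        node.visits += 1
        node.value_sum += leaf_value          # (a) averaging backup
        visited = [c.minimax for c in node.children if c.visits > 0]
        node.minimax = max(visited) if node.is_max else min(visited)  # (b) minimax backup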

 

b) My second improvement was to use alpha-beta rollouts. This is a rollout 
version that can do null-move, LMR, etc. This is a huge improvement, and none 
of the MCTS versions can match it. More on alpha-beta rollouts here: 
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf

 

So AlphaZero used none of the above improvements, and yet it seems to be 
tactically strong. Leela Zero suffered from tactical failures left and right 
too, as I expected.

 

So the only explanation left is that the policy network is able to avoid traps, 
but I find it hard to believe it can identify more than qsearch-level tactics.

 

All I am saying is that my experience (as well as that of many others) with 
MCTS for tactics-dominated games is bad, and there must be some breakthrough in 
that regard in AlphaZero for it to be able to compete with Stockfish on a 
tactical level.

 

I am curious how Remi's attempt at Shogi using AlphaZero's method will turn out.

 

regards,

Daniel

 

 

 

 

 

 

 

 

On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

Training on Stockfish games is guaranteed to produce a blunder-fest, because 
there are no blunders in the training set and therefore the policy network 
never learns how to refute blunders.

 

This is not a flaw in MCTS, but rather in the policy network. MCTS will 
eventually search every move infinitely often, producing asymptotically optimal 
play. But if the policy network does not provide the guidance necessary to 
rapidly refute the blunders that occur in the search, then convergence of MCTS 
to optimal play will be very slow.

 

It is necessary for the network to train on self-play games using MCTS. For 
instance, the AGZ approach selects moves during training games by sampling from 
the distribution of visit counts in the search. Specifically: not by choosing 
the most-visited move!
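
A sketch of that sampling step (the data structures and the temperature 
parameter are illustrative assumptions, not AGZ's exact code):

import random

def sample_training_move(root, temperature=1.0):
    # Pick the move to play in a self-play training game by sampling in
    # proportion to visit counts (raised to 1/temperature), rather than
    # always playing the most-visited move.
    moves = list(root.children.keys())
    weights = [root.children[m].visits ** (1.0 / temperature) for m in moves]
    return random.choices(moves, weights=weights, k=1)[0]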

 

You see how this policy trains both search and evaluation to be internally 
consistent? The policy head is trained to refute the bad moves that will come 
up in search, and the value head is trained to the value observed by the full 
tree. 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Monday, March 5, 2018 4:55 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Actually, prior to this it was trained with hundreds of thousands of Stockfish 
games and didn't do well on tactics (the games were actually a blunder-fest). I 
believe this is a problem of the MCTS used and not due to a lack of training.

 

Go is a strategic game, so it is different from chess, which is full of traps.

I'm not surprised Leela Zero did well in Go.

 

On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <g...@sjeng.org> wrote:

On 02-03-18 17:07, Dan wrote:
> Leela-chess is not performing well enough

I don't understand how one can say that, given that they started from the
random network only last week, with only a few clients. Of course it's bad!
That doesn't say anything about the approach.

Leela Zero has gotten strong but it has been learning for *months* with
~400 people. It also took a while to get to 30 kyu.

--
GCP


_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
