Alpha-beta rollouts are like MCTS without playouts (as in AlphaZero), but
with a backup rule that can also do alpha-beta pruning.
With standard MCTS, the tree converges to a minimax tree, not an alpha-beta
tree, so as you know there is a huge difference in effective branching factor there.
For MCTS to become competitive
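In case a concrete picture of "alpha-beta rollouts" helps, here is a minimal
toy sketch (my own construction, not Dan's engine code), roughly in the spirit
of Huang, "Pruning Game Tree by Rollouts" (AAAI 2015). Each rollout walks
root-to-leaf like MCTS, but nodes carry [lo, hi] value bounds that are backed
up minimax-style instead of being averaged, and children whose bounds are
already dominated get cut; that cut is where the alpha-beta saving (roughly
b^(d/2) leaves instead of b^d with good move ordering) comes from. The Node
class and its static leaf values are placeholders, and the full algorithm also
threads an (alpha, beta) window down the rollout path, which this sketch omits:

    import math

    class Node:
        def __init__(self, children=(), value=0.0):
            self.children = list(children)        # empty list => leaf
            self.value = value                    # static eval, used for leaves
            self.lo, self.hi = -math.inf, math.inf

    def rollout(node, maximizing=True):
        if not node.children:                     # leaf: bounds collapse to the eval
            node.lo = node.hi = node.value
            return
        # "Live" children: not solved yet and not already cut by the bound
        # established by their siblings (the alpha-beta style cut).
        if maximizing:
            live = [c for c in node.children if c.lo < c.hi and c.hi > node.lo]
            nxt = max(live, key=lambda c: c.hi)   # optimistic best-first choice
        else:
            live = [c for c in node.children if c.lo < c.hi and c.lo < node.hi]
            nxt = min(live, key=lambda c: c.lo)
        rollout(nxt, not maximizing)
        agg = max if maximizing else min          # minimax backup of both bounds,
        node.lo = agg(c.lo for c in node.children)  # not averaging as in plain MCTS
        node.hi = agg(c.hi for c in node.children)

    def solve(root):
        while root.lo < root.hi:                  # bounds meet => exact minimax value
            rollout(root)
        return root.lo

    # Tiny example: a max node over two leaves valued 3 and 5 -> solve() == 5.
    root = Node(children=[Node(value=3), Node(value=5)])
    assert solve(root) == 5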
Summarizing the objections to my (non-evidence-based, admittedly hand-wavy
and merely observational) assertion that 9x9 Go is going down any time someone
really wants it to go down, I get the following:
* value networks can't hack it (okay, maybe? does this make it less likely?
-- we shouldn't expect to cut-and
Sorry, I haven't been paying enough attention lately to know what
"alpha-beta rollouts" means precisely. Can you either describe them or give
me a reference?
Thanks,
Álvaro.
On Tue, Mar 6, 2018 at 1:49 PM, Dan wrote:
> I did a quick test with my MCTS chess engine with two different
> implementations.
Well, AlphaZero did fine at chess tactics, and the papers are clear on the
details. There must be an error in your deductions somewhere.
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Tuesday, March 6, 2018 1:46 PM
To: computer-go@computer-go.org
Subject:
Hi Remi, hi friends,
> For the moment, my main objective is shogi. I
> will participate in the World Computer Shogi
> Championship in May.
Good luck! Please keep us informed while the tournament is running.
> So I am developing a game-independent AlphaZero framework.
I am hoping several pe
I did a quick test with my MCTS chess engine with two different
implementations: standard MCTS with averaging, and MCTS with alpha-beta
rollouts. The result is something like a 600 Elo difference.
Finished game 44 (scorpio-pmcts vs scorpio-mcts): 1/2-1/2 {Draw by 3-fold
repetition}
Score of scorpio-mcts vs
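For reference, under the usual logistic Elo model a 600 point gap corresponds
to roughly a 97% expected score. A couple of throwaway helpers (plain math,
nothing engine-specific assumed) for going back and forth between an Elo gap
and a match score:

    import math

    def expected_score(elo_diff):
        # Logistic Elo model: expected score of the side that is elo_diff stronger.
        return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

    def elo_from_score(score):
        # Inverse: Elo gap implied by an observed match score (0 < score < 1).
        return -400.0 * math.log10(1.0 / score - 1.0)

    print(expected_score(600))   # ~0.969, i.e. the rollout version scoring ~97%
    print(elo_from_score(0.75))  # ~191 Elo for a 75% match score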
I am pretty sure it is an MCTS problem, and I suspect it is not something that
could be easily solved with a policy network (could be wrong here). My
opinion is that a DCNN is not
a miracle worker (as somebody already mentioned here) and it is going to
fail at resolving tactics. I would be more than happy wit
valky...@phmp.se wrote:
>I think you misunderstood what I wrote,
>if perfect play on 9x9 is 6000 Elo, then if the value function is 3000
>Elo and MC eval is 2000 Elo with 1 second thinking time then it might
>be that the combination of a value function and MC eval ends up being
>2700 Elo.
Training on Stockfish games is guaranteed to produce a blunder-fest, because
there are no blunders in the training set and therefore the policy network
never learns how to refute blunders.
This is not a flaw in MCTS, but rather in the policy network. MCTS will
eventually search every move in
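To put a number on the "MCTS will eventually search every move" point: with
the AlphaZero-style PUCT selection rule, a move the policy network assigns a
tiny prior still receives an exploration bonus that grows with the parent's
visit count while its own visit count stays at zero, so it is eventually
tried. A small sketch with made-up numbers (the c_puct constant, priors and Q
values are illustrative assumptions, not values from any paper):

    import math

    def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
        # Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
        u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
        return q + u

    # A refutation the prior hates (prior 0.001, never visited) vs. a move the
    # prior loves (prior 0.6, visited almost every time): with enough parent
    # visits the unvisited low-prior move overtakes the well-visited one.
    for n in (100, 10_000, 1_000_000):
        fav  = puct_score(q=0.45, prior=0.6,   parent_visits=n, child_visits=n - 1)
        rare = puct_score(q=0.0,  prior=0.001, parent_visits=n, child_visits=0)
        print(n, round(fav, 3), round(rare, 3))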
I think you misunderstood what I wrote:
if perfect play on 9x9 is 6000 Elo, then if the value function is 3000
Elo and MC eval is 2000 Elo with 1 second of thinking time, it might
be that the combination of a value function and MC eval ends up being
2700 Elo. It could also be that it ends up
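For what it is worth, the "combination" here usually means the AlphaGo-style
mixed leaf evaluation: blend the value network's estimate with the Monte Carlo
result using a mixing weight (AlphaGo used lambda = 0.5). Whether the blend
lands above or below either component on its own is exactly the empirical
question being raised. A minimal sketch, where value_net and mc_rollout are
hypothetical callables standing in for the 3000 Elo and 2000 Elo evaluators
above:

    def mixed_eval(position, value_net, mc_rollout, lam=0.5):
        # Leaf evaluation: (1 - lam) * value-net estimate + lam * rollout result.
        return (1.0 - lam) * value_net(position) + lam * mc_rollout(position)

    # e.g. mixed_eval(pos, value_net=net.evaluate, mc_rollout=playout_winrate),
    # where net.evaluate and playout_winrate are hypothetical engine hooks.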