I used stochastic sampling at internal nodes because of this passage:

> During the forward simulation phase of SEARCH, the action at each node x is
> selected by sampling a ∼ π̄(·|x).
> As a result, the full imaginary trajectory is generated consistently
> according to policy π̄.
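A minimal Python sketch of what sampling a ∼ π̄(·|x) at an internal node could look like. It also bears on Daniel's question about computing π̄ from (q, π_theta, n_visits). It assumes the closed form π̄ = λ_N·π_θ / (α − q) with λ_N = c·√(Σ_b n_b) / (|A| + Σ_b n_b), where α is found by bisection so that π̄ sums to 1. This follows one reading of the paper's Section 3 rather than anything stated in this thread, so treat the formula as an assumption. The constant c = 1.25, the fall-back to the prior at an unvisited node, and every function name are hypothetical, and this is not the code in the regularized_policy branch. A deterministic PUCT-style argmax is included for contrast.

```python
import math
import random


def regularization_weight(visits, c):
    """Assumed form: lambda_N = c * sqrt(sum_b n_b) / (|A| + sum_b n_b)."""
    total = sum(visits)
    return c * math.sqrt(total) / (len(visits) + total)


def _scaled_prior(alpha, q, prior, lam):
    # Candidate pi(a) = lam * prior(a) / (alpha - q(a)); zero-prior moves get no mass.
    return [lam * p / (alpha - qa) if p > 0 else 0.0 for qa, p in zip(q, prior)]


def pi_bar(q, prior, visits, c=1.25, iters=100):
    """Assumed closed form; alpha is found by bisection so the policy sums to 1."""
    lam = regularization_weight(visits, c)
    if lam == 0.0:
        # Hypothetical choice for this sketch: an unvisited node just uses the prior.
        return list(prior)
    lo = max(qa + lam * p for qa, p in zip(q, prior))  # alpha_min: mass >= 1
    hi = max(q) + lam                                   # alpha_max: mass <= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(_scaled_prior(mid, q, prior, lam)) > 1.0:
            lo = mid  # too much mass, so alpha must increase
        else:
            hi = mid
    pi = _scaled_prior(hi, q, prior, lam)
    total = sum(pi)
    return [x / total for x in pi]  # absorb the remaining bisection error


def select_stochastic(q, prior, visits, rng, c=1.25):
    """SEARCH-style selection: sample a ~ pi_bar(.|x) rather than take an argmax."""
    probs = pi_bar(q, prior, visits, c)
    return rng.choices(range(len(q)), weights=probs, k=1)[0]


def select_puct(q, prior, visits, c=1.25):
    """Deterministic PUCT-style rule, for contrast."""
    total = sum(visits)
    scores = [qa + c * p * math.sqrt(total) / (1 + n)
              for qa, p, n in zip(q, prior, visits)]
    return max(range(len(q)), key=lambda a: scores[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q, prior, visits = [0.1, 0.3, 0.2], [0.5, 0.2, 0.3], [10, 5, 3]
    print([round(x, 3) for x in pi_bar(q, prior, visits)])
    print(select_puct(q, prior, visits), select_stochastic(q, prior, visits, rng))
```

With these toy numbers, π̄ shifts mass away from move 0 (high prior, low q) toward move 1 (highest q), while the PUCT rule deterministically picks move 2. Because λ_N shrinks as the visit count grows, π̄ in this form concentrates on high-q moves as a node is searched more.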

> In this section, we establish our main claim namely that AlphaZero’s action
> selection criteria can be interpreted as approximating the solution to a
> regularized policy-optimization objective.

I think they are saying that UCT and PUCT are approximations of sampling
directly from π̄, but I haven't understood Section 3 well.

On Mon, Jul 20, 2020 at 2:51 Daniel <dsha...@gmail.com> wrote:
>
> @Kensuke I suppose all the proposed algorithms ACT, SEARCH and LEARN are
> meant to be used during training, no?
> I think I understand ACT and LEARN but I am not sure about SEARCH, for which
> they say this:
>
> > During search, we propose to stochastically sample actions according to π̄
> > instead of the deterministic action selection rule of Eq. 1.
>
> This sounds much like the random selection done at the root with temperature,
> but this time applied at internal nodes.
> Does it mean the pUCT formula is not used? Why does the selection have to be
> stochastic now?
> On selection, you compute π_bar every time from (q, π_theta, n_visits), so I
> suppose π_bar has everything it needs to balance exploration and exploitation.
>
>
> On Sun, Jul 19, 2020 at 8:10 AM David Wu <lightvec...@gmail.com> wrote:
>>
>> I imagine that at low visits at least, "ACT" behaves similarly to Leela
>> Zero's "LCB" move selection, which also has the effect of sometimes
>> selecting a move that is not the max-visits move, if its value estimate has
>> recently been found to be sufficiently larger to balance the fact that it is
>> lower prior and lower visits (at least, typically, this is why the move
>> wouldn't have been the max-visits move in the first place). It also scales
>> in an interesting way with the empirically observed playout-by-playout
>> variance of moves, but I think by far the important part is that it can use
>> sufficiently confident high value to override max-visits.
>>
>> The gain from "LCB" in match play, as I recall, is on the very, very rough
>> order of 100 Elo, although it could be less or more depending on match
>> conditions, what neural net is used, and other things. So for LZ at least,
>> "ACT"-like behavior at low visits is not new.
>>
>>
>> On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki <knsk.m...@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I couldn't improve Leela Zero's strength by implementing SEARCH and ACT.
>>> https://github.com/zakki/leela-zero/commits/regularized_policy
>>>
>>> On Fri, Jul 17, 2020 at 2:47 Rémi Coulom <remi.cou...@gmail.com> wrote:
>>> >
>>> > This looks very interesting.
>>> >
>>> > From a quick glance, it seems the improvement is mainly when the number
>>> > of playouts is small. Also they don't test on the game of Go. Has anybody
>>> > tried it?
>>> >
>>> > I will take a deeper look later.
>>> >
>>> > On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek <rta...@ca.rr.com> wrote:
>>> >>
>>> >> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>>> >>
>>> >>
>>> >> --
>>> >> Honesty is a very expensive gift.
>>> >> So, don't expect it from cheap people
>>> >> - Warren Buffett
>>> >> http://tayek.com/
>>> >>
>>> >> _______________________________________________
>>> >> Computer-go mailing list
>>> >> Computer-go@computer-go.org
>>> >> http://computer-go.org/mailman/listinfo/computer-go
>>>
>>>
>>>
>>> --
>>> Kensuke Matsuzaki

--
Kensuke Matsuzaki
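David Wu's description of Leela Zero's "LCB" move selection can be illustrated with a small Python sketch. It is not Leela Zero's code: the normal-approximation bound with z = 1.96, the 10% minimum-visit fraction, and all names (MoveStats, lower_confidence_bound, select_move_lcb) are assumptions chosen for illustration. It shows the two effects he mentions: a lower-visit move can override the max-visits move when its value is confidently higher, and the bound widens with the observed playout-by-playout variance.

```python
import math
from dataclasses import dataclass


@dataclass
class MoveStats:
    name: str
    visits: int
    mean_value: float  # average playout value, e.g. a winrate in [0, 1]
    variance: float    # sample variance of the individual playout values


def lower_confidence_bound(move, z=1.96):
    """Assumed normal-approximation LCB: mean - z * sqrt(variance / visits)."""
    if move.visits < 2:
        return float("-inf")  # too few playouts to trust a variance estimate
    return move.mean_value - z * math.sqrt(move.variance / move.visits)


def select_move_lcb(moves, min_visit_fraction=0.1, z=1.96):
    """Pick the eligible move with the best LCB instead of simply the max-visits move."""
    max_visits = max(m.visits for m in moves)
    # Hypothetical eligibility rule: ignore moves with too small a share of visits.
    eligible = [m for m in moves if m.visits >= min_visit_fraction * max_visits]
    # Ties on the bound fall back to the more-visited move.
    return max(eligible, key=lambda m: (lower_confidence_bound(m, z), m.visits))


if __name__ == "__main__":
    moves = [
        MoveStats("A", 800, 0.52, 0.04),  # max-visits move
        MoveStats("B", 200, 0.60, 0.04),  # fewer visits, confidently higher value
        MoveStats("C", 30, 0.70, 0.09),   # too few visits to be eligible
    ]
    print(select_move_lcb(moves).name)
```

With these numbers, B (200 visits, mean 0.60) is chosen over the max-visits move A (800 visits, mean 0.52). Raising B's variance to 1.0 widens its bound enough that A is selected again.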