On Sat, Jan 23, 2021 at 5:34 AM Darren Cook <dar...@dcook.org> wrote:
> Each convolutional layer should spread the information across the board.
> I think alpha zero used 20 layers? So even 3x3 filters would tell you
> about the whole board - though the signal from the opposite corner of
> the board might end up a bit weak.
>
> I think we can assume it is doing that successfully, because otherwise
> we'd hear about it losing lots of games in ladders.

Unfortunately, we can't assume that based on that observation. If you observe
what is going on with Leela Zero, ELF, MiniGo, and SAI - all of which are
reproductions of AlphaZero with different hyperparameters and infrastructure,
and none of which includes a ladder feature - I think you will find that *all*
of them have at least some trouble with ladders. So this is empirical evidence
that the vanilla AlphaZero algorithm, when applied to Go with a convolutional
resnet, often has ladder problems.

And by seeing how these reproductions behave, it also becomes clear how your
observation can still be true at the same time. Which is: with enough playouts,
for all these bots MCTS is able to solve ladders well enough at the root
position and the upper levels of the tree to avoid losing outright - usually a
few tens of thousands of playouts are plenty. So the ladder blindness mostly
hurts by degrading the evaluation quality deeper in the tree, in ways that are
harder to see. The kind of thing that might cost you more like 20-50 Elo (pure
guess, just my intuition for the *very* rough order of magnitude with this much
search on top), rather than losing you every game.

The bigger problem happens when you run any of these bots with only a few
playouts - low-end GPUs, mobile hardware, and so on... *or the numbers of
playouts that people often run CGOS bots with*, namely 200 playouts, 800
playouts, etc. You will find that they are still clearly top-pro-level or
superhuman at almost all aspects of the game... except for ladders! And at
these low numbers of playouts, that does include outright losing games due to
ladders, or making major misjudgments about a sequence that will depend on a
ladder in 1-3 moves in the future. Sometimes this even happens in the low
thousands of playouts. For example, the attached SGF shows such a case, where
Leela Zero using almost the latest 40-block network (LZ285) with 2k playouts
per move (plus tree reuse) attempted to break a ladder, failed, and then played
out the ladder anyway and lost on the spot.

It is also true that neural nets *are* capable of learning judgments related to
ladders given the right data. Some time back, using some visualizations of
KataGo's net, I found that it actually is tracing a width-6 diagonal band
across the board from ladders! But the inductive bias is weak enough, plus the
structure of the game tree for ladders is so hard (it's like the classic
"cliff walking" problem in RL turned up to the max), that it's a
chicken-and-egg problem. Starting from a net that doesn't understand ladders
yet, the "MCTS policy/value-improvement operator" is empirically very poor at
bootstrapping the net into understanding them.
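To make concrete what "solving a ladder" actually involves, here is a toy
Python sketch of the classic recursive ladder read (my own illustration for
this email, not code from Leela Zero or KataGo - it ignores ladder breakers
that arise from captures, move legality for the chasing side, and other
subtleties). The point is the shape of the search: one nearly forced line,
dozens of moves deep, where a single wrong guess at any step flips the result.

SIZE = 19

def neighbors(p):
    r, c = p
    return [(r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < SIZE and 0 <= c + dc < SIZE]

def group_and_liberties(board, p):
    """Flood-fill the chain containing p; return (stones, liberties).
    board is a dict mapping (row, col) -> 'b' or 'w'."""
    color = board[p]
    stack, stones, libs = [p], set(), set()
    while stack:
        q = stack.pop()
        if q in stones:
            continue
        stones.add(q)
        for n in neighbors(q):
            if n not in board:
                libs.add(n)
            elif board[n] == color:
                stack.append(n)
    return stones, libs

def ladder_captures(board, hunted, depth=0):
    """True if the chasing side captures the chain containing `hunted`
    in this simplified model."""
    if depth > 400:
        return False
    _, libs = group_and_liberties(board, hunted)
    if len(libs) == 0:
        return True
    if len(libs) >= 2:
        return False
    # Defender has exactly one liberty: run there.
    (escape,) = libs
    color = board[hunted]
    board = dict(board)
    board[escape] = color
    _, libs = group_and_liberties(board, escape)
    if len(libs) <= 1:
        return True       # running did not help
    if len(libs) >= 3:
        return False      # the chain is out of the ladder
    # Exactly two liberties: the chaser must find the correct atari, every
    # single move, for a sequence on the order of 40+ moves across the board.
    # Trying both and recursing is all an explicit reader needs.
    chaser = 'w' if color == 'b' else 'b'
    for atari in sorted(libs):
        b2 = dict(board)
        b2[atari] = chaser
        if ladder_captures(b2, escape, depth + 1):
            return True
    return False

# Example: a white ladder on an otherwise empty board with no breaker
# anywhere, so the read should come back True (black gets caught).
board = {(4, 5): 'w', (5, 4): 'w', (5, 6): 'w', (6, 6): 'w', (5, 5): 'b'}
print(ladder_captures(board, (5, 5)))   # expected: True

An explicit reader like this resolves the whole sequence almost instantly,
whereas an MCTS guided by a net that misjudges the ladder has to spend a large
chunk of its playout budget rediscovering the same single line at every node in
the tree where the ladder matters.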
> > something the first version of AlphaGo did (before they tried to make it
> > "zero") and something that many other bots do as well. But Leela Zero and
> > ELF do not do this, because of attempting to remain "zero", ...
>
> I know that zero-ness was very important to DeepMind, but I thought the
> open source dedicated go bots that have copied it did so because AlphaGo
> Zero was stronger than AlphaGo Master after 21-40 days of training.
> I.e. in the rarefied atmosphere of super-human play that starter package
> of human expert knowledge was considered a weight around its neck.

The PR and public press around AlphaZero may give one that impression - it
certainly sounds like a more impressive discovery if not only can you learn
from zero, but doing so is actually better! But I'm confident that this is not
true in general, and that it also depends on what "expert knowledge" you add
and how you add it.

You may note that the AlphaGo Zero paper makes no mention of how long or with
how many TPUs AlphaGo Master was trained (or if it does, I can't find it), so
it's hard to say what Master vs. Zero actually shows. Also, it claims that
AlphaGo Master still made use of handcrafted Monte-Carlo rollouts, and I can
easily believe that jettisoning those could lead to a big improvement. And it's
at least plausible to me that not pretraining on human pro games might give
better final results (*but* this is unclear - at least I don't know of any
paper that actually runs this as a controlled test).

But there are other bits of "expert knowledge" that do provide an improvement
over being pure-zero if done correctly, including:

* Predicting the final ownership of the board, not just the win/loss.
* Adding a small/mild term for caring about score, rather than just win/loss.
* Seeding a percentage of the self-play training games to start in positions
based on external or expert-supplied games or board positions (this is the main
way KataGo went from being highly vulnerable to Mi Yuting's flying dagger like
other zero bots, to playing it decently well and now often winning games based
on it, depending on whether the other side happens to shoot themselves in the
foot with one of the trap variations or not).

And yes, for now it also includes:

* Adding ladder status as an input to the neural net (a rough sketch of what
that could look like follows below).
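For that last item, here is a hedged sketch of what "ladder status as an
input" can look like, reusing the toy reader from earlier in this email. The
plane layout and names are my own invention for illustration, not KataGo's
actual feature encoding: alongside the usual stone planes, you add binary
planes marking stones that an explicit reader says die in a ladder, so the net
only has to learn to *use* that judgment rather than rediscover a 40-move
forced sequence with stacks of 3x3 convolutions.

import numpy as np

def input_planes(board, to_move):
    """board: dict {(r, c): 'b'/'w'}, to_move: 'b' or 'w'.
    Returns a (4, SIZE, SIZE) float32 tensor:
      plane 0: stones of the side to move
      plane 1: opponent stones
      plane 2: stones of the side to move that die in a ladder
      plane 3: opponent stones that die in a ladder"""
    planes = np.zeros((4, SIZE, SIZE), dtype=np.float32)
    seen = set()
    for p, color in board.items():
        planes[0 if color == to_move else 1][p] = 1.0
        if p in seen:
            continue
        stones, libs = group_and_liberties(board, p)
        seen |= stones
        # Only chains in atari can be ladder-captured in this toy model.
        if len(libs) == 1 and ladder_captures(board, p):
            idx = 2 if color == to_move else 3
            for q in stones:
                planes[idx][q] = 1.0
    return planes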
Attachment: 0_94.sgf
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go