I would be surprised if my model ever lost to GNU Go on 9x9. It's a lot stronger than Fuego, which already stomps GNU Go. It would be a waste of time to test it vs. GNU Go or even MCTS bots. I only plan on running tests vs. the current best models to see how it does against the state-of-the-art 9x9 nets.
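For anyone who does want a GNU Go baseline, though, a match is easy to script over GTP. Below is a minimal Python sketch; it assumes a gnugo binary on PATH, and the board size and komi are illustrative choices, not values from this thread:

    # Minimal GTP harness for playing moves against GNU Go.
    import subprocess

    class GTPEngine:
        def __init__(self, cmd):
            self.proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                         stdout=subprocess.PIPE, text=True)

        def send(self, command):
            """Send one GTP command and return the reply payload."""
            self.proc.stdin.write(command + "\n")
            self.proc.stdin.flush()
            reply = []
            while True:                  # GTP replies end with a blank line
                line = self.proc.stdout.readline()
                if not line.strip():
                    break
                reply.append(line.strip())
            return reply[0][2:] if reply else ""   # drop the "= " prefix

    gnugo = GTPEngine(["gnugo", "--mode", "gtp"])
    gnugo.send("boardsize 9")
    gnugo.send("komi 7")
    print(gnugo.send("genmove black"))   # e.g. "E5"
    gnugo.send("quit")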
On Mon, Jan 27, 2020, 06:39 cody2007 via Computer-go <computer-go@computer-go.org> wrote:

> Thanks again for your thoughts and experiences, Rémi and Igor.
>
> I'm still puzzled by what is making training slower for me than for Rémi
> (although I wouldn't be surprised if Igor's results were faster when
> matched for hardware, model size, strength, etc. -- see below). Certainly
> komi sounds like it might help a lot. I'm going to have to check out the
> code from David Wu.
>
> It takes me longer than a day for "training" to actually start with my
> code -- because I first generate 128*2*32*35 ≈ 287k training samples
> before even running the first round of backprop. After the first day,
> therefore, my model is always still entirely random. So, possibly:
>
> (1) your and David Wu's implementations are faster in wall-clock time
> computationally,
> (2) backprop is being started before the initial training buffer is
> filled (the Wu paper used 250k, but it's not 100% clear to me whether
> training waited until that initial buffer was filled), or
> (3) "training" time is being counted from when backprop starts,
> regardless of how long the initial training buffer took to create.
>
> Another thing is that I'm not using any of the techniques beyond AlphaGo
> Zero that David Wu used. So, depending on whether you guys are using some
> or all of those additional features and/or loss functions, it'd be
> expected that you're getting much faster training than me. I was actually
> starting to test adding some of his ideas from that paper to my code a
> while back, but then coincidentally discovered the models I was training
> weren't as horrible as I had first thought.
>
> Have either of you ever benchmarked your 7x7 (or 9x9) models against
> GNU Go?
>
> By the way, all benchmarking against GNU Go that I've reported was in
> single-pass mode only (i.e., I was not running the tree search on top of
> the net outputs).
>
> Thanks,
> Cody
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, January 26, 2020 11:22 AM, Igor Polyakov
> <weiqiprogramm...@gmail.com> wrote:
>
> I trained using David Wu's code on 9x9 only, and after a few months it
> was superhuman.
>
> I'm not sure if anyone's interested, but I can release my network to the
> world. It's around the strength of KataGo, but only on 9x9. I could do a
> final test before releasing it into the wild.
>
> On Mon, Jan 27, 2020, 00:17 Rémi Coulom <remi.cou...@gmail.com> wrote:
>
>> Yes, using komi would help a lot. Still, I feel that something else
>> must be wrong, because winning 100% of the games as Black without komi
>> should be very easy on 7x7.
>>
>> I have not written anything about what I did with Crazy Stone. But my
>> experiments and ideas were really very similar to what David Wu did:
>> https://blog.janestreet.com/accelerating-self-play-learning-in-go/
>>
>> To clarify what I wrote in my previous message: "strong from scratch in
>> a single day" was for 7x7. I like testing new ideas with small networks
>> on small boards, because training is very fast, and what works on small
>> boards with small networks usually also works on large boards with big
>> networks.
>>
>> Rémi
>>
>> On Sun, Jan 26, 2020 at 12:30 AM cody2007 <cody2...@protonmail.com>
>> wrote:
>>
>>> Hi Rémi,
>>>
>>> Thanks for your comments! I am not using any komi and had not given
>>> much thought to it. Although I suppose that by having Black win most
>>> games, I'm depriving the network of its only learning signal. I will
>>> have to try with an appropriately set komi next...
>>>
>>> > When I started to develop the Zero version of Crazy Stone, I spent
>>> > a lot of time optimizing my method on a single (V100) GPU.
>>>
>>> Any chance you've written about it somewhere? I'd be interested to
>>> learn more but wasn't able to find anything on the Crazy Stone website.
>>>
>>> Thanks,
>>> Cody
>>>
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Saturday, January 25, 2020 5:49 PM, Rémi Coulom
>>> <remi.cou...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Thanks for sharing your experiments.
>>>
>>> Your match results are strange. Did you use a komi? You should use a
>>> komi of 9: https://senseis.xmp.net/?7x7
>>>
>>> The final strength of your network looks surprisingly weak. When I
>>> started to develop the Zero version of Crazy Stone, I spent a lot of
>>> time optimizing my method on a single (V100) GPU. I could train a
>>> strong network from scratch in a single day. Using a wrong komi might
>>> have hurt you. Also, on such a small board, it is not so easy to make
>>> sure that the self-play games have enough variety. You'd have to find
>>> many balanced random initial positions in order to avoid replicating
>>> the same game again and again.
>>>
>>> Rémi
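To make the timing question in points (1)-(3) above concrete, here is a minimal sketch of the "fill the buffer first, then train" policy that message describes. Every name in it is a placeholder, not code from any of the implementations discussed:

    # Sketch: generate self-play samples until the buffer holds its
    # initial quota, and only start backprop afterwards.
    import random
    import time
    from collections import deque

    def generate_selfplay_game():
        """Placeholder: one self-play game -> a list of training samples."""
        return [("state", "policy_target", random.choice([-1, 1]))
                for _ in range(50)]

    def train_step(batch):
        """Placeholder for one round of backprop on a minibatch."""
        pass

    MIN_BUFFER = 128 * 2 * 32 * 35      # = 286,720 samples, as above
    buffer = deque(maxlen=500_000)

    fill_start = time.time()            # reading (3) would not count this span
    while len(buffer) < MIN_BUFFER:     # reading (2) would skip this wait
        buffer.extend(generate_selfplay_game())

    train_start = time.time()           # backprop only begins here
    for _ in range(1000):
        train_step(random.sample(buffer, 128))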
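The komi fix Rémi suggests is a change at scoring time: with komi applied, Black stops winning nearly every self-play game, so the value target carries signal for both colors. A sketch, assuming area scoring (ties go to White here for simplicity):

    # Score a finished 7x7 game with komi 9, per the senseis.xmp.net
    # page linked above. Inputs are placeholder area counts.
    KOMI_7X7 = 9.0

    def game_result(black_area, white_area, komi=KOMI_7X7):
        """Return +1 if Black wins, -1 otherwise (area scoring)."""
        return 1 if black_area > white_area + komi else -1

    print(game_result(30, 19))   # 30 > 19 + 9 -> +1, Black wins
    print(game_result(29, 20))   # 29 > 20 + 9 is false -> -1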
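And one common way to get the opening variety Rémi mentions (an illustration of the general idea, not necessarily what Crazy Stone does) is to play the first few moves of each self-play game uniformly at random:

    # Sketch: randomize the first couple of moves so self-play games
    # diverge instead of replaying the same opening. Board handling
    # here is a toy placeholder (no captures or ko).
    import random

    BOARD_SIZE = 7

    def legal_moves(stones):
        """Placeholder: every empty point is legal in this toy model."""
        return [(r, c) for r in range(BOARD_SIZE) for c in range(BOARD_SIZE)
                if (r, c) not in stones]

    def random_opening(n_random_moves=2):
        """Play a few uniformly random moves from the empty board."""
        stones = set()
        for _ in range(n_random_moves):
            stones.add(random.choice(legal_moves(stones)))
        return stones

    start_position = random_opening()   # hand this to normal self-play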
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go