Thanks again for your thoughts and experiences Rémi and Igor.

I'm still puzzled by what is making training slower for me than for Rémi 
(although I wouldn't be surprised if Igor's results were faster when matched 
for hardware, model size, strength, etc. -- see below). Certainly komi sounds 
like it might help a lot. I'm going to have to check out David Wu's code.

It takes longer than a day for "training" to actually start with my code, 
because I first generate 128*2*32*35 = ~287k training samples before even 
running the first round of backprop. After the first day, therefore, my model 
is still entirely random. So, possibly:

(1) your and David Wu's implementations are simply faster in wall-clock time;
(2) backprop is being started before the initial training buffer is filled 
(the Wu paper used 250k, but it's not 100% clear to me whether training waited 
until that initial buffer was filled) -- see the sketch after this list; or
(3) "training" time is being counted from when backprop starts, regardless of 
how long the initial training buffer took to create.
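
To make the distinction concrete, here is a minimal sketch of how my 
fill-then-train loop is organized; play_self_play_game and train_step are 
placeholder stubs standing in for the real self-play and backprop code:

    import random

    BUFFER_SIZE = 128 * 2 * 32 * 35  # = 286,720 samples

    def play_self_play_game():
        # Stub for a self-play worker; returns (state, policy_target, outcome) tuples.
        return [(None, None, random.choice([-1.0, 1.0])) for _ in range(60)]

    def train_step(batch):
        # Stub for one minibatch of backprop.
        pass

    buffer = []

    # Interpretation (3): a "training" clock started here ticks for a long
    # time before any backprop happens.
    while len(buffer) < BUFFER_SIZE:
        buffer.extend(play_self_play_game())

    # Interpretation (2) would instead call train_step() inside the loop
    # above, while the buffer is still filling.
    for step in range(10_000):  # main training loop (truncated for the sketch)
        train_step(random.sample(buffer, 128))
        buffer.extend(play_self_play_game())
        del buffer[:-BUFFER_SIZE]  # keep only the newest BUFFER_SIZE samples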

Another thing: I'm not using any of the techniques beyond AlphaGo Zero that 
David Wu used. So, depending on whether you two are using some or all of those 
additional features and/or loss functions, it would be expected that you're 
getting much faster training than I am. I had actually started testing some of 
the ideas from that paper in my code a while back, but then coincidentally 
discovered that the models I was training weren't as bad as I had first thought.

Have either of you ever benchmarked your 7x7 (or 9x9) models against GNU Go?

By the way, all of the benchmarking against GNU Go that I've reported was done 
in single-pass mode only (i.e., I was not running any tree search on top of 
the net outputs).
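
Concretely, "single-pass" means each move was chosen from one forward pass of 
the policy net, along these lines (the names here are illustrative, not my 
actual benchmarking code):

    def single_pass_move(net, position, legal_moves):
        # One forward pass, no tree search: assumes net(position) returns an
        # indexable vector of move probabilities over board points.
        policy = net(position)
        return max(legal_moves, key=lambda m: policy[m])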

Thanks,
Cody

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, January 26, 2020 11:22 AM, Igor Polyakov 
<weiqiprogramm...@gmail.com> wrote:

> I trained using David Wu's code on 9x9 only, and after a few months it's been 
> superhuman.
>
> I'm not sure if anyone's interested, but I can release my network to the 
> world. It's around the strength of KataGo, but only on 9x9. I could do a 
> final test before releasing it into the wild.
>
> On Mon, Jan 27, 2020, 00:17 Rémi Coulom <remi.cou...@gmail.com> wrote:
>
>> Yes, using komi would help a lot. Still, I feel that something else must be 
>> wrong, because winning 100% of the games as Black without komi should be 
>> very easy on 7x7.
>>
>> I have not written anything about what I did with Crazy Stone. But my 
>> experiments and ideas were really very similar to what David Wu did:
>> https://blog.janestreet.com/accelerating-self-play-learning-in-go/
>>
>> To clarify what I wrote in my previous message: "strong from scratch in a 
>> single day" was for 7x7. I like testing new ideas with small networks on 
>> small boards, because training is very fast, and what works on small boards 
>> with small networks usually also works on large boards with big networks.
>>
>> Rémi
>>
>> On Sun, Jan 26, 2020 at 12:30 AM cody2007 <cody2...@protonmail.com> wrote:
>>
>>> Hi Rémi,
>>>
>>> Thanks for your comments! I am not using any komi and had not given much 
>>> thought to it. Though I suppose that by having Black win most games, I'm 
>>> depriving the network of its only learning signal. I will have to try with 
>>> an appropriately set komi next...
>>>
>>>>When I started to develop the Zero version of Crazy Stone, I spent a lot of 
>>>>time optimizing my method on a single (V100) GPU
>>> Any chance you've written about it somewhere? I'd be interested to learn 
>>> more but wasn't able to find anything on the Crazy Stone website.
>>>
>>> Thanks,
>>> Cody
>>>
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Saturday, January 25, 2020 5:49 PM, Rémi Coulom <remi.cou...@gmail.com> 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for sharing your experiments.
>>>>
>>>> Your match results are strange. Did you use a komi? You should use a komi 
>>>> of 9:
>>>> https://senseis.xmp.net/?7x7
>>>>
>>>> The final strength of your network looks surprisingly weak. When I started 
>>>> to develop the Zero version of Crazy Stone, I spent a lot of time 
>>>> optimizing my method on a single (V100) GPU. I could train a strong 
>>>> network from scratch in a single day. Using the wrong komi might have hurt 
>>>> you. Also, on such a small board, it is not so easy to make sure that the 
>>>> self-play games have enough variety. You'd have to find many balanced 
>>>> random initial positions in order to avoid replicating the same game again 
>>>> and again.
>>>>
>>>> Rémi
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
