30M samples, 42 planes, 19x19 chars/plane, plus database overhead comes to 490 GB.  
On a dual-boot machine that originally had Windows on it, Windows wants to keep 
half of its original partition.  I didn’t want to reinstall Windows after 
formatting, so I have 1 TB for Linux.
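
The 490 GB figure can be checked with a quick calculation. This is only a sketch: it assumes one byte (char) per board point as stated above, and treats whatever is left over as database overhead. The function name is made up for illustration.

```python
# Back-of-the-envelope size of the raw input planes, one byte (char) per point.
SAMPLES = 30_000_000      # training positions
PLANES = 42               # input planes per position
POINTS = 19 * 19          # board points per plane

def raw_input_bytes(samples=SAMPLES, planes=PLANES, points=POINTS, bytes_per_point=1):
    """Bytes needed for the planes alone, before any database overhead."""
    return samples * planes * points * bytes_per_point

raw = raw_input_bytes()
print(f"raw planes: {raw / 1e9:.0f} GB")
# Whatever remains of the reported 490 GB is attributed to database overhead.
print(f"implied overhead: {(490e9 - raw) / 1e9:.0f} GB")
```

The planes alone come to roughly 455 GB, so the reported total implies about 35 GB of overhead.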

However, AlphaGo used data augmentation (rotations and reflections), which will 
increase the input size to about 4 TB.  The input bandwidth is pretty low, and 
an external 8 TB USB drive will hold it all (about $250).  I’d rather just buy 
another drive than spend time coding and debugging another Caffe input layer to 
further compress the inputs.
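
As a rough illustration of where "about 4 TB" comes from: a square board has eight symmetries (four rotations, each with an optional mirror), so storing every variant multiplies the data by eight. The sketch below uses NumPy and a hypothetical helper name; it is not the actual Many Faces or Caffe input code.

```python
import numpy as np

def symmetries(plane):
    """Return the 8 transforms (4 rotations x optional mirror) of a square plane."""
    out = []
    for k in range(4):
        rot = np.rot90(plane, k)      # rotate by k * 90 degrees
        out.append(rot)
        out.append(np.fliplr(rot))    # mirrored copy of that rotation
    return out

# A plane with all-distinct values, so every transform is distinguishable.
board = np.arange(19 * 19).reshape(19, 19)
variants = symmetries(board)
unique = {v.tobytes() for v in variants}
print(len(variants), len(unique))

# Scaling the 490 GB database by the number of symmetries.
augmented_tb = len(variants) * 490 / 1000
print(f"augmented size: {augmented_tb:.2f} TB")
```

Eight times 490 GB is 3.92 TB, which matches the estimate above.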






From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Álvaro Begué
Sent: Wednesday, April 27, 2016 1:56 AM
To: computer-go
Subject: Re: [Computer-go] Machine for Deep Neural Net training


What are you doing that uses so much disk space? An extremely naive computation 
of required space for what you are doing is:

30M samples * (42 input planes + 1 output plane)/sample * 19*19 floats/plane * 
4 bytes/float = 1.7 TB
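
The naive estimate above can be reproduced in a few lines. This is just the same arithmetic written out; note that the 1.7 figure corresponds to binary units (TiB), while in decimal units the product is closer to 1.86 TB.

```python
SAMPLES = 30_000_000
PLANES_PER_SAMPLE = 42 + 1      # 42 input planes + 1 output plane
POINTS = 19 * 19                # values per plane
BYTES_PER_FLOAT = 4

total = SAMPLES * PLANES_PER_SAMPLE * POINTS * BYTES_PER_FLOAT
print(f"{total / 1e12:.2f} TB (decimal)")
print(f"{total / 2**40:.2f} TiB (binary)")
```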


So that's cutting it close, but I think the inputs and outputs are all binary, 
which allows a factor of 32 compression right there, and you might be using 
constant planes for some inputs, and if the output is a move it fits in 9 
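
The factor of 32 follows from storing one bit per point instead of a 4-byte float. A minimal sketch with NumPy, using randomly generated binary planes as stand-in data (the real planes would come from the position generator):

```python
import numpy as np

# Stand-in data: 42 binary planes for one position, stored as float32.
rng = np.random.default_rng(0)
planes = (rng.random((42, 19, 19)) < 0.5).astype(np.float32)

# Pack 8 binary values into each byte.
packed = np.packbits(planes.astype(np.uint8).ravel())
ratio = planes.nbytes / packed.nbytes
print(planes.nbytes, packed.nbytes, f"{ratio:.1f}x")

# Unpack again; packbits pads the last byte, so trim to the original length.
restored = (
    np.unpackbits(packed)[: planes.size]
    .reshape(planes.shape)
    .astype(np.float32)
)
```

The ratio comes out just under 32 because the last byte is padded.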





On Wed, Apr 27, 2016 at 12:55 AM, David Fotland <fotl...@smart-games.com> wrote:

I have my deep neural net training setup working, and it's working so well I
want to share.  I already had Caffe running on my desktop machine (4-core
i7) without a GPU, with inputs similar to AlphaGo generated by Many Faces
into an LMDB database.  I trained a few small nets for a day each to get
some feel for it.

I bought an Alienware Area 51 from Dell, with two GTX 980 Ti GPUs, 16 GB of
memory, and 2 TB of disk.  I set it up to dual boot Ubuntu 14.04, which made
it trivial to get the latest Caffe up and running with cuDNN.  2 TB of disk
is not enough.  I'll have to add another drive.

I expected something like 20x speedup on training, but I was shocked by what
I actually got.

On my desktop, the Caffe MNIST sample took 27 minutes to complete.  On the
new machine it took 22 seconds, about 73x faster.

My simple network has 42 input planes, and 4 layers of 48 filters each.
Training runs about 100x faster on the Alienware.  Training 100k Caffe
iterations (batches) of 50 positions takes 13 minutes, rather than almost a
full day on my desktop.
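
For reference, the timing figures above work out as follows. This is plain arithmetic on the numbers quoted in this message; the desktop time is estimated from the stated 100x ratio, not measured separately here.

```python
def speedup(slow_seconds, fast_seconds):
    """Ratio of the slow runtime to the fast runtime."""
    return slow_seconds / fast_seconds

# MNIST sample: 27 minutes on the desktop vs 22 seconds on the new machine.
mnist = speedup(27 * 60, 22)
print(f"MNIST speedup: {mnist:.1f}x")

# Go network: 100k iterations of 50 positions in 13 minutes.
positions = 100_000 * 50
gpu_seconds = 13 * 60
throughput = positions / gpu_seconds
print(f"throughput: {throughput:.0f} positions/s")

# At roughly 100x slower, the same run on the desktop would take:
desktop_hours = gpu_seconds * 100 / 3600
print(f"desktop estimate: {desktop_hours:.1f} hours")
```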


Computer-go mailing list

