In the MPI runs we use 8-core nodes, so the playouts per node are higher
than your four-core estimate assumes.  I don't ponder, since the program
isn't scaling anyway.

The number of nodes with high visit counts is smaller than that, and I only
send nodes that have changed since the last send.
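
Roughly, the delta send looks something like the sketch below.  The struct
layout, the dirty flag, and the names are hypothetical, just to make the
idea concrete:

    /* Sketch only -- hypothetical field names.  Each node carries a
     * dirty flag that a playout sets when it updates the counters;
     * before an exchange we pack just the dirty nodes. */
    #define MAX_CHILDREN 30

    typedef struct {
        int node_id;                 /* id shared by all ranks       */
        int visits[MAX_CHILDREN];    /* per-child visit counts       */
        int wins[MAX_CHILDREN];      /* per-child win counts         */
    } NodeStats;

    typedef struct {
        NodeStats stats;
        int dirty;                   /* changed since the last send? */
    } TreeNode;

    /* Copy the stats of every dirty node into 'out' and clear the
     * flags; returns how many nodes go out in this exchange. */
    static int collect_dirty(TreeNode *tree, int n_nodes, NodeStats *out)
    {
        int count = 0;
        for (int i = 0; i < n_nodes; i++) {
            if (tree[i].dirty) {
                out[count++] = tree[i].stats;
                tree[i].dirty = 0;
            }
        }
        return count;
    }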

I do progressive unpruning, so most children have zero visits.  There are at
most 30 active children in a node.
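
The widening itself is nothing fancy: children are kept sorted by a
move-ordering prior and one more is unpruned each time the parent crosses
the next visit threshold.  A sketch with a made-up schedule (not the real
numbers):

    /* Sketch of a widening schedule -- the constants are invented for
     * illustration.  Children are assumed sorted by a move-ordering
     * prior, so "the first N" are the N most promising moves. */
    #define MAX_CHILDREN 30

    static int active_children(int parent_visits)
    {
        int active = 5;          /* start with a handful of children */
        int threshold = 40;
        /* unprune one more child each time the parent's visit count
         * crosses the next, geometrically growing threshold */
        while (parent_visits >= threshold && active < MAX_CHILDREN) {
            active++;
            threshold += threshold / 2;   /* grow ~1.5x per step */
        }
        return active;
    }

Since most nodes never accumulate that many parent visits, almost all
children sit at zero visits, which is why there is so little to share.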

I don't broadcast.  I use a reduce algorithm, which doesn't save a lot with
four nodes, but saves something.
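
In MPI terms that is just a summing reduction over the packed counter
deltas.  Whether it is done with MPI_Reduce or MPI_Allreduce is an
implementation detail; here is an MPI_Allreduce version for illustration,
with a flat fixed-size buffer assumed purely for the example:

    /* Illustration only: sum everyone's counter deltas with a single
     * MPI_Allreduce so every rank ends up with the global totals and
     * no separate broadcast step is needed. */
    #include <mpi.h>

    #define MAX_SHARED_NODES 1024
    #define MAX_CHILDREN     30

    /* local[i*MAX_CHILDREN + j] holds this rank's new visits for child
     * j of shared node i since the last exchange; global[] receives the
     * sum over all ranks and is folded back into the local tree. */
    void exchange_counters(const int *local, int *global)
    {
        MPI_Allreduce(local, global,
                      MAX_SHARED_NODES * MAX_CHILDREN,
                      MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    }

With only four ranks the reduction tree is shallow, so the saving over a
gather-and-broadcast is real but modest.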

The actual bandwidth is quite small.  Also, the Microsoft cluster has
InfiniBand.
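
To put a rough number on "quite small": redoing the back-of-envelope below
with only changed nodes and at most 30 active children per node lands
orders of magnitude under the 250 MB full-refresh figure.  The
changed-nodes-per-exchange value here is a placeholder guess, not a
measurement:

    /* Back-of-envelope only.  The 25,000-node and 2x4-byte-counter
     * figures come from the estimate quoted below; the changed-nodes-
     * per-exchange value is a placeholder guess. */
    #include <stdio.h>

    int main(void)
    {
        long hot_nodes     = 25000;        /* nodes with 100+ visits  */
        long bytes_full    = 300 * 2 * 4;  /* all legal moves, 2 ctrs */
        long bytes_active  = 30 * 2 * 4;   /* only 30 active children */
        long processes     = 4;
        long changed_nodes = 2500;         /* placeholder assumption  */

        long full  = hot_nodes * processes * bytes_full;       /* ~240 MB */
        long delta = changed_nodes * processes * bytes_active; /* ~2.4 MB */

        printf("full refresh : %ld bytes\n", full);
        printf("delta update : %ld bytes\n", delta);
        return 0;
    }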

David

> -----Original Message-----
> From: computer-go-boun...@computer-go.org [mailto:computer-go-
> boun...@computer-go.org] On Behalf Of Brian Sheppard
> Sent: Friday, October 30, 2009 11:50 AM
> To: computer-go@computer-go.org
> Subject: [computer-go] MPI vs Thread-safe
> 
> Back of envelope calculation: MFG processes 5K nodes/sec/core * 4 cores
> per process = 20K nodes/sec/process. Four processes make 80K nodes/sec.
> If you think for 30 seconds (pondering + move time) then you are at 2.4
> million nodes. Figure about 25,000 nodes having 100 visits or more. UCT
> data is roughly 2500 bytes per node (2 counters of 4 bytes for 300 legal
> moves). If you have 4 nodes then the data is 25,000 * 4 * 2500 = 250
> megabytes. That is how much data the master has to broadcast for a
> complete refresh. And the data has to be sent to the master for
> aggregation, but that is a smaller number because most processes will
> only modify a small number of nodes within the refresh cycle.
> 
> Now, there are all kinds of reductions to that number. For instance,
> MPI supports process groups and there are network layer tricks that can
> (in theory) supply all listeners in a single message. And nodes don't
> have to be updated if they are not changed, and you can compress the
> counters, and so on.
> 
> I'm just saying that it looks like a lot of data. It can't be as much
> as I just calculated, because Gigabit Ethernet would saturate at
> something less than 100 megabytes/sec. But how much is it?
> 
> 
> >Does Mogo share RAVE values as well over MPI?
> 
> I would think so, because RAVE is critical to MoGo's move ordering
> policies.
> 
> 
> >It might be the MFGO bias.
> 
> This doesn't add up to me. If the problem were in something so basic
> then the serial program wouldn't play more strongly beyond 4 times as
> much clock time.
> 
> 
> >It might be due to a bug in my MPI implementation, or any number
> >of other possible bugs.
> 
> Bugs in non-MPI areas would also kill off improvement in the serial
> program beyond 4x. So if you aren't observing that then you can look
> elsewhere.
> 
> But of course it is possible for problems to exist only in scaling.
> 
> For example, suppose that a node is created in process A and gets 100
> trials. It would then have its UCT data passed to other processes, but
> those processes have not necessarily created the node, nor done
> progressive widening and so on. Such a node exists in a state that
> could not be created in a serial program, which would cause a loss in
> move ordering. This is a bug in parallelization, though not in MPI
> specifically.
> 
> And you are quite right that bugs can manifest as a limit to
> scalability.
> 
> The difficulty of debugging MPI processes is perhaps the biggest
> complaint about that model.
> 

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
