Back-of-the-envelope calculation: MFGO processes 5K nodes/sec/core * 4 cores
per process = 20K nodes/sec/process. Four processes make 80K nodes/sec. If
you think for 30 seconds (pondering + move time), you are at 2.4 million
nodes. Figure on about 25,000 nodes having 100 or more visits. UCT data is
roughly 2,500 bytes per node (two 4-byte counters for each of ~300 legal
moves). With four machines to refresh, the volume is 25,000 nodes * 4 copies
* 2,500 bytes = 250 megabytes. That is how much data the master has to
broadcast for a complete refresh. The data also has to be sent to the master
for aggregation, but that is a smaller number, because most processes will
modify only a small number of nodes within a refresh cycle.
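
Laying that arithmetic out in one place:

    5,000 nodes/sec/core * 4 cores        =     20,000 nodes/sec/process
    20,000 nodes/sec * 4 processes        =     80,000 nodes/sec
    80,000 nodes/sec * 30 sec             =  2,400,000 nodes searched
    25,000 nodes * 4 copies * 2,500 bytes ~ 250,000,000 bytes per refresh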

Now, there are all kinds of reductions to that number. For instance, MPI
supports process groups, and there are network-layer tricks that can (in
theory) reach every listener with a single message. Nodes don't have to be
resent if they haven't changed, the counters can be compressed, and so on.
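
To make the single-message point concrete, here is a minimal sketch of what
the refresh broadcast could look like. pack_dirty_nodes() and apply_refresh()
are hypothetical stand-ins for the engine's own serialization, not anyone's
actual code:

    #include <mpi.h>
    #include <stdlib.h>

    #define REFRESH_CAP (64 * 1024 * 1024)  /* working buffer, 64 MB */

    /* Hypothetical engine hooks, not a real API: */
    extern int  pack_dirty_nodes(char *buf, int cap);  /* returns bytes packed */
    extern void apply_refresh(const char *buf, int nbytes);

    /* One collective reaches every listener: MPI_Bcast is free to use a
       tree or hardware multicast underneath, so the master does not have
       to send N-1 point-to-point copies itself. */
    void refresh_uct_data(MPI_Comm comm, int master, int my_rank)
    {
        static char *buf = NULL;
        int nbytes = 0;

        if (buf == NULL)
            buf = malloc(REFRESH_CAP);

        if (my_rank == master)
            nbytes = pack_dirty_nodes(buf, REFRESH_CAP);

        /* Broadcast the payload size first, then the payload itself. */
        MPI_Bcast(&nbytes, 1, MPI_INT, master, comm);
        MPI_Bcast(buf, nbytes, MPI_BYTE, master, comm);

        if (my_rank != master)
            apply_refresh(buf, nbytes);
    }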

I'm just saying that it looks like a lot of data. It can't be as much as I
just calculated, because Gigabit Ethernet would saturate at roughly 100
megabytes/sec (125 MB/sec raw, less after protocol overhead). But how much
is it?
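
One way to answer that empirically is to wrap the collective and count what
actually goes over the wire. A sketch, assuming the refresh runs through
MPI_Bcast as in the sketch above:

    #include <mpi.h>
    #include <stdio.h>

    static long long g_bytes = 0;
    static double    g_secs  = 0.0;

    /* Drop-in replacement for MPI_Bcast that tallies bytes and wall time. */
    int counted_bcast(void *buf, int count, MPI_Datatype type,
                      int root, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = MPI_Bcast(buf, count, type, root, comm);
        int size;

        g_secs += MPI_Wtime() - t0;
        MPI_Type_size(type, &size);
        g_bytes += (long long)count * size;
        return rc;
    }

    void report_traffic(void)
    {
        printf("broadcast %.1f MB in %.2f s (%.1f MB/s)\n",
               g_bytes / 1e6, g_secs,
               g_secs > 0 ? g_bytes / 1e6 / g_secs : 0.0);
    }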


>Does Mogo share RAVE values as well over MPI?

I would think so, because RAVE is critical to MoGo's move ordering policies.
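
If it does, the earlier per-node estimate roughly doubles. A sketch of what
the per-node wire record might look like (field names and layout are my
guesses, not MoGo's actual format):

    #include <stdint.h>

    #define MAX_MOVES 362   /* 19x19 points + pass; ~300 legal in practice */

    /* Guessed wire format: adding RAVE counters doubles the payload from
       roughly 2,500 to roughly 5,000 bytes per node. */
    struct node_record {
        int32_t uct_visits[MAX_MOVES];
        int32_t uct_wins[MAX_MOVES];
        int32_t rave_visits[MAX_MOVES];
        int32_t rave_wins[MAX_MOVES];
    };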


>It might be the MFGO bias.

This doesn't add up for me. If the problem were in something that basic,
then the serial program wouldn't keep playing more strongly beyond 4 times
as much clock time.


>It might be due to a bug in my MPI implementation, or any number
>of other possible bugs.

Bugs in non-MPI areas would also kill off improvement in the serial program
beyond 4x. So if you aren't observing that, you can look elsewhere.

But of course it is possible for problems to exist only in scaling.

For example, suppose a node is created in process A and gets 100 trials. Its
UCT data would then be passed to the other processes, but those processes
have not necessarily created the node themselves, nor done progressive
widening, and so on. Such a node ends up in a state that could never arise
in a serial program, which would cause a loss in move ordering. That is a
bug in the parallelization, though not in MPI specifically.
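
A sketch of where that would have to be guarded in the merge step (the node
layout, threshold, and helper are hypothetical, just to show the hazard):

    #define EXPAND_THRESHOLD 40   /* made-up widening trigger */

    typedef struct Node {
        int visits;
        int wins;
        int expanded;             /* progressive widening done locally? */
        /* ... children, priors, etc. ... */
    } Node;

    extern void run_progressive_widening(Node *n);  /* hypothetical hook */

    /* Remote stats can arrive for a node this process never expanded.
       Applying them blindly leaves a node with 100 visits but no widening
       done, a state no serial search could reach. */
    void merge_remote_stats(Node *n, int remote_visits, int remote_wins)
    {
        n->visits += remote_visits;
        n->wins   += remote_wins;

        /* Re-run the expansion logic the serial program would have run,
           so move ordering stays consistent across processes. */
        if (!n->expanded && n->visits >= EXPAND_THRESHOLD) {
            run_progressive_widening(n);
            n->expanded = 1;
        }
    }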

And you are quite right that bugs can manifest as a limit to scalability.

The difficulty of debugging MPI processes is perhaps the biggest complaint
about that model.

