Back-of-the-envelope calculation: MFGO processes 5K nodes/sec/core * 4 cores per process = 20K nodes/sec/process. Four processes give 80K nodes/sec. If you think for 30 seconds (pondering + move time), then you are at 2.4 million nodes. Figure about 25,000 tree nodes having 100 visits or more. UCT data is roughly 2500 bytes per node (2 counters of 4 bytes each for 300 legal moves). If you have 4 cluster nodes, then bandwidth is 25,000 * 4 * 2500 = 250 megabytes. That is how much data the master has to broadcast for a complete refresh. The data also has to be sent to the master for aggregation, but that is a smaller number, because most processes will only modify a small fraction of the nodes within a refresh cycle.
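The arithmetic above can be written out as a small sketch; all of the figures are the assumptions from the paragraph (5K nodes/sec/core, 4 cores, 4 processes, 30 seconds, 25,000 hot nodes, ~2500 bytes/node), not measurements:

```python
# Back-of-the-envelope bandwidth estimate for a complete UCT refresh.
# Every constant here is an assumption taken from the discussion above.

nodes_per_sec_per_core = 5_000
cores_per_process = 4
processes = 4
think_time_sec = 30          # pondering + move time

# Total playout throughput for one move.
total_nodes = nodes_per_sec_per_core * cores_per_process * processes * think_time_sec

hot_nodes = 25_000           # tree nodes assumed to reach >= 100 visits
bytes_per_node = 2_500       # 2 counters x 4 bytes x 300 legal moves = 2400 B, rounded up

# Data the master must broadcast to refresh every listener completely.
broadcast_bytes = hot_nodes * processes * bytes_per_node

print(total_nodes)                 # 2400000  -> 2.4 million nodes
print(broadcast_bytes // 1_000_000)  # 250     -> 250 megabytes per refresh
```

Since Gigabit Ethernet tops out below ~125 MB/sec in theory, a full 250 MB refresh per move clearly cannot be what actually happens, which is the point of the paragraph that follows.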
Now, there are all kinds of reductions to that number. For instance, MPI supports process groups, and there are network-layer tricks that can (in theory) supply all listeners in a single message. Nodes don't have to be updated if they haven't changed, the counters can be compressed, and so on. I'm just saying that it looks like a lot of data. It can't be as much as I just calculated, because Gigabit Ethernet would saturate at something less than 100 megabytes/sec. But how much is it?

> Does Mogo share RAVE values as well over MPI?

I would think so, because RAVE is critical to MoGo's move-ordering policies.

> It might be the MFGO bias.

This doesn't add up to me. If the problem were in something so basic, then the serial program wouldn't play more strongly beyond 4 times as much clock time.

> It might be due to a bug in my MPI implementation, or any number
> of other possible bugs.

Bugs in non-MPI areas would also kill off improvement in the serial program beyond 4x, so if you aren't observing that, then you can look elsewhere. But of course it is possible for problems to exist only in scaling. For example, suppose that a node is created in process A and gets 100 trials. Its UCT data would then be passed to the other processes, but those processes have not necessarily created the node, nor done progressive widening, and so on. Such a node exists in a state that could never arise in a serial program, which would cause a loss in move ordering. This is a bug in parallelization, though not in MPI specifically.

And you are quite right that bugs can manifest as a limit to scalability. The difficulty of debugging MPI processes is perhaps the biggest complaint about that model.

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/