In the MPI runs each process runs on an 8-core node, so the playouts per process are higher than Brian's estimate. I don't ponder, since the program isn't scaling anyway.

The number of tree nodes with high visit counts is smaller than that estimate, and I only send nodes that have changed since the last send. I do progressive unpruning, so most children have zero visits; there are at most 30 active children in a node. I don't broadcast; I use a reduce algorithm, which doesn't save a lot with four processes, but it saves something. The actual bandwidth is quite small. Also, the Microsoft cluster has Infiniband.
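Concretely, the per-interval merge looks something like the sketch below. This is illustrative only: the names (HotNode, MAX_ACTIVE, sync_node) are made up, not the actual code, and I've used MPI_Allreduce to stand in for the reduce step.

    /* Sketch: delta-based sharing of UCT counters over MPI.
     * Playouts update only the local delta buffer; the totals change
     * only at sync time, when every process contributes the counts
     * it accumulated since the last sync. */
    #include <mpi.h>
    #include <string.h>

    #define MAX_ACTIVE 30  /* progressive unpruning: at most 30 live children */

    typedef struct {
        unsigned visits[MAX_ACTIVE];  /* per-child visit deltas since last sync */
        unsigned wins[MAX_ACTIVE];    /* per-child win deltas since last sync */
    } HotNode;

    /* Sum one hot node's deltas across all processes, fold them into
     * the local totals, and clear the deltas for the next interval. */
    static void sync_node(HotNode *delta, unsigned *visits, unsigned *wins)
    {
        unsigned sum[MAX_ACTIVE];

        MPI_Allreduce(delta->visits, sum, MAX_ACTIVE, MPI_UNSIGNED,
                      MPI_SUM, MPI_COMM_WORLD);
        for (int i = 0; i < MAX_ACTIVE; i++) visits[i] += sum[i];

        MPI_Allreduce(delta->wins, sum, MAX_ACTIVE, MPI_UNSIGNED,
                      MPI_SUM, MPI_COMM_WORLD);
        for (int i = 0; i < MAX_ACTIVE; i++) wins[i] += sum[i];

        memset(delta, 0, sizeof *delta);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        unsigned visits[MAX_ACTIVE] = {0}, wins[MAX_ACTIVE] = {0};
        HotNode delta;
        memset(&delta, 0, sizeof delta);
        /* ... playouts would accumulate into delta here ... */
        sync_node(&delta, visits, wins);
        MPI_Finalize();
        return 0;
    }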
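To put rough numbers on the comparison with the full-refresh estimate in the quoted message below (the 25,000 hot nodes and 2500 bytes/node are Brian's figures; the changed-node count per sync is a made-up illustration, not a measurement):

    /* Illustrative only: full-refresh traffic vs. the delta scheme. */
    #include <stdio.h>

    int main(void)
    {
        double hot_nodes   = 25000;      /* nodes with >= 100 visits */
        double processes   = 4;
        double full_bytes  = 2500;       /* ~300 moves * 2 counters * 4 bytes */
        double delta_bytes = 30 * 2 * 4; /* <= 30 active children: 240 bytes */
        double changed     = 5000;       /* hypothetical changed nodes/sync */

        printf("full refresh: %.0f MB\n",
               hot_nodes * processes * full_bytes / 1e6);  /* 250 MB */
        printf("delta sync:   %.1f MB\n",
               changed * processes * delta_bytes / 1e6);   /* 4.8 MB */
        return 0;
    }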
David

> -----Original Message-----
> From: computer-go-boun...@computer-go.org [mailto:computer-go-boun...@computer-go.org] On Behalf Of Brian Sheppard
> Sent: Friday, October 30, 2009 11:50 AM
> To: computer-go@computer-go.org
> Subject: [computer-go] MPI vs Thread-safe
>
> Back-of-envelope calculation: MFG processes 5K nodes/sec/core * 4 cores
> per process = 20K nodes/sec/process. Four processes makes 80K nodes/sec.
> If you think for 30 seconds (pondering + move time) then you are at 2.4
> million nodes. Figure about 25,000 nodes having 100 visits or more. UCT
> data is roughly 2500 bytes per node (2 counters of 4 bytes for 300 legal
> moves). If you have 4 nodes then the bandwidth is 25,000 * 4 * 2500 =
> 250 megabytes. That is how much data the master has to broadcast for a
> complete refresh. And the data has to be sent to the master for
> aggregation, but that is a smaller number, because most processes will
> only modify a small number of nodes within the refresh cycle.
>
> Now, there are all kinds of reductions to that number. For instance,
> MPI supports process groups, and there are network-layer tricks that
> can (in theory) supply all listeners in a single message. And nodes
> don't have to be updated if they are not changed, and you can compress
> the counters, and so on.
>
> I'm just saying that it looks like a lot of data. It can't be as much
> as I just calculated, because Gigabit Ethernet would saturate at
> something less than 100 megabytes/sec. But how much is it?
>
> > Does Mogo share RAVE values as well over MPI?
>
> I would think so, because RAVE is critical to MoGo's move-ordering
> policies.
>
> > It might be the MFGO bias.
>
> This doesn't add up to me. If the problem were in something so basic,
> then the serial program wouldn't play more strongly beyond 4 times as
> much clock time.
>
> > It might be due to a bug in my MPI implementation, or any number
> > of other possible bugs.
>
> Bugs in non-MPI areas would also kill off improvement in the serial
> program beyond 4x. So if you aren't observing that, then you can look
> elsewhere.
>
> But of course it is possible for problems to exist only in scaling.
>
> For example, suppose that a node is created in process A and gets 100
> trials. It would then have its UCT data passed to other processes, but
> the other processes have not necessarily created the node, nor done
> progressive widening, and so on. Such a node exists in a state that
> could not arise in a serial program, which would cause a loss in move
> ordering. This is a bug in parallelization, though not in MPI
> specifically.
>
> And you are quite right that bugs can manifest as a limit to
> scalability.
>
> The difficulty of debugging MPI processes is perhaps the biggest
> complaint about that model.

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/