Hi Ross,

Just out of curiosity, is Rmpi required by some package that you're using? I
only ask because, if you're mostly writing your own MPI calls, you might want
to look at pbdR/pbdMPI, if you haven't already. They also have pbdPROF, which
should be able to do some profiling of the MPI calls.
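For a flavor of it, the SPMD style in pbdMPI looks roughly like the sketch
below. This is untested and from memory of their send()/recv() generics, so
treat the details loosely:

    # Untested pbdMPI sketch; run with something like: mpiexec -np 2 Rscript demo.R
    library(pbdMPI)
    init()

    if (comm.rank() == 1) {
        result <- rnorm(1e5)              # stand-in for a simulator's ~500kB result
        send(result, rank.dest = 0L)      # pbdMPI generic; serializes R objects for you
    } else if (comm.rank() == 0) {
        result <- recv(rank.source = 1L)
        comm.cat("received", length(result), "values\n")
    }

    finalize()

Their packages are listed at: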
http://rbigdata.github.io/packages.html

I wasn't sure whether this was really on topic for the list, so I sent it
privately. Sorry for the extra noise if you've already eliminated pbdR as a
possibility.

-- bennet

On Sat, Mar 22, 2014 at 3:46 PM, Ross Boylan <r...@biostat.ucsf.edu> wrote:
> I have a bunch of simulators communicating results to a single assembler.
> The results seem to take a long time to be received, and the delay
> increases as the system runs. Here are some results:
>
>    sent  received    delay
>  70.679    94.776   24.097
>  94.677   144.906   50.229
> 122.082   238.713  116.631
> 144.785   313.101  168.316
> 167.918   350.037  182.119
> 190.709   384.342  193.633
>
> Times are wall clock times in seconds since process launch, so there may
> be some skew between sender and receiver, but it will be consistent (this
> tracks only sends from one simulator and also ignores later sends that
> never arrived -- my completion logic needs work).
>
> The results are typically 500kB. Sending is via Isend (non-blocking) and
> receiving via Recv (blocking). The simulators spend most of their time
> computing; in particular there may be significant delays, e.g., from 10
> seconds to a minute, between calls into MPI (typically a mix of Isend,
> Recv, and Testsome). All processes are on the same machine (for now).
>
> The interval between assembler receives (from all sources) is sometimes
> quite brief, under 2 seconds, and the time between receives is quite
> variable. Neither is consistent with the theory that the receiver is
> saturated receiving messages, each of which takes a long time to transmit
> (I mean the active part of the transmission, when bits are flowing). I
> infer from this that actually transmitting a message does not take long,
> and that the longer gaps between receives have some other cause.
>
> This is all from R, and the problem might lie with higher-level code.
>
> Can anyone explain what is going on, and what I might do to alleviate it?
>
> My speculation is that the necessary handshaking can only take place while
> both processes are in MPI calls, or perhaps only during certain calls. The
> assembler spends most of its time executing a receive, but the simulators
> are mostly busy with other work, so I suspect the delay is on the
> simulator side, though I'm not sure what to do about it. I could wait on
> completion from the sender, but that rather defeats the purpose of doing
> an isend.
>
> In a related thread about a similar issue, Jeff Squyres wrote
> (http://www.open-mpi.org/community/lists/users/2011/07/16928.php):
> ----------------------------------------------------
> If so, it's because Open MPI does not do background progress on
> non-blocking sends in all cases. Specifically, if you're sending over TCP
> and the message is "long", the OMPI layer in the master doesn't actually
> send the whole message immediately because it doesn't want to unexpectedly
> consume a lot of resources in the slave. So the master only sends a small
> fragment of the message and the (communicator, tag) tuple suitable for
> matching at the receiver. When the receiver posts a corresponding MPI_Recv
> (time=C), it sends back an ACK to the master, enabling the master to send
> the rest of the message.
>
> However, since OMPI doesn't support background progress in all situations,
> the master doesn't see this ACK until it goes into the MPI progression
> engine -- i.e., when you call MPI_Recv() at Time=E. Then the OMPI layer in
> the master sees the ACK and sends the rest of the message.
> ----------------------------------------------------------------
>
> I'm not sending over TCP (yet), but maybe I'm running into something
> similar.
>
> I had thought the MPI stuff was handled in a separate layer or thread that
> would magically do all the work of moving messages around; the fact that
> top shows all the CPU going to the R processes suggests that's not the
> case.
>
> Running OMPI 1.7.4.
>
> Thanks for any help.
> Ross Boylan
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
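P.S. Regarding the delay itself: if, as in the passage you quoted, Open MPI
only progresses a long Isend when the sender re-enters the library, one
workaround is to have each simulator poke the progress engine between
computation chunks by testing the outstanding request, rather than waiting on
it. A rough, untested Rmpi sketch (argument names from my reading of its docs;
result, RESULT_TAG, work_chunks, and do_one_chunk() are placeholders, and the
assembler is assumed to be rank 0):

    ## Simulator side: post the non-blocking send of the finished result.
    req <- 0                                    # slot in Rmpi's internal request table
    mpi.isend.Robj(result, dest = 0, tag = RESULT_TAG, request = req)

    for (chunk in work_chunks) {
        do_one_chunk(chunk)                     # the long computation
        ## Re-entering the MPI library here lets Open MPI see the receiver's ACK
        ## and push the rest of the ~500kB message, without blocking as
        ## mpi.wait() would.
        mpi.test(req)
    }

That keeps the simulators mostly computing while still giving the library a
chance to finish delivery every so often.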