Hmmm... you probably can't without digging down into the diagnostics. Perhaps we could help more if we had some idea how you are measuring this "latency". I ask because that is orders of magnitude worse than anything we measure, so I suspect the problem is in your app (i.e., that the time you are measuring is actually how long it takes you to get around to processing a message that was received some time ago).
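(For concreteness, here is a minimal sketch of one way to separate those two times entirely on the gatherer node, so cross-node clock skew doesn't matter. The struct, helper names, and use of MPI_Wtime are illustrative additions, not taken from the code in the quoted posts.)

/* Sketch only: measure how long a message sits in the application's input
 * queue after MPI_Test says it has arrived.  All names are illustrative. */
#include <mpi.h>
#include <stdio.h>

typedef struct {
    void   *payload;      /* copy of the received message            */
    double  t_available;  /* MPI_Wtime() when MPI_Test completed     */
} queued_msg_t;

/* Call this right after MPI_Test reports completion, when the copy is queued. */
void stamp_on_arrival(queued_msg_t *q, void *msg_copy)
{
    q->payload     = msg_copy;
    q->t_available = MPI_Wtime();
}

/* Call this when the message is finally dequeued for processing. */
void report_queue_delay(const queued_msg_t *q)
{
    double delay_ms = (MPI_Wtime() - q->t_available) * 1e3;
    /* If this number accounts for most of the ~600 ms, the delay is in the
     * application's queue rather than in the MPI library or the fabric. */
    printf("in-app queuing delay: %.3f ms\n", delay_ms);
}

Comparing the SP's send time against the GP's receive time across nodes would additionally require dealing with clock skew (MPI_Wtime is not globally synchronized unless MPI_WTIME_IS_GLOBAL is set), so measuring entirely on the GP is the easier first check.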
On Oct 3, 2012, at 11:52 AM, "Hodge, Gary C" <gary.c.ho...@lmco.com> wrote:

> How do I tell the difference between when the message was received and when the message was picked up in MPI_Test?
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, October 03, 2012 1:00 PM
> To: Open MPI Users
> Subject: EXTERNAL: Re: [OMPI users] unacceptable latency in gathering process
>
> Out of curiosity, have you logged the time when the SP called "send" and compared it to the time when the message was received, and when that message is picked up in MPI_Test? In other words, have you actually verified that the delay is in the MPI library as opposed to in your application?
>
> On Oct 3, 2012, at 9:40 AM, "Hodge, Gary C" <gary.c.ho...@lmco.com> wrote:
>
> Hi all,
> I am running on an IBM BladeCenter, using Open MPI 1.4.1 and the opensm subnet manager for InfiniBand.
>
> Our application has real-time requirements, and it has recently been proven that it does not scale to meet future requirements. Presently, I am re-organizing the application to process work in a more parallel manner than it does now.
>
> Jobs arrive at the rate of 200 per second and are sub-divided into groups of objects by a master process (MP) on its own node. The MP then assigns the object groups to 20 slave processes (SP), each running on its own node, to do the expensive computational work in parallel. The SPs then send their results to a gatherer process (GP) on its own node, which merges the results for the job and sends them onward for final processing. The highest latency for the last 1024 jobs that were processed is then written to a log file that is displayed by a GUI.
>
> Each process uses the same controller method for sending and receiving messages, as follows:
>
> For (each CPU that sends us input)
> {
>     MPI_Irecv(...)
> }
>
> While (true)
> {
>     For (each CPU that sends us input)
>     {
>         MPI_Test(...)
>         If (message was received)
>         {
>             Copy the message
>             Queue the copy to our input queue
>             MPI_Irecv(...)
>         }
>     }
>     If (there are messages on our input queue)
>     {
>         ... process the FIRST message on the queue (this may queue messages for output) ...
>
>         For (each message on our output queue)
>         {
>             MPI_Send(...)
>         }
>     }
> }
>
> My problem is that I do not meet our application's performance requirement for a job (~20 ms) until I reduce the number of SPs from 20 to 4 or fewer. I added some debug into the GP and found that there are never more than 14 messages received in the for loop that calls MPI_Test. The messages that were sent from the other 6 SPs eventually arrive at the GP in a long stream after experiencing high latency (over 600 ms).
>
> Going forward, we need to handle more objects per job and will need more than 4 SPs to keep up. My thought is that I have to obey this 4-SPs-to-1-GP ratio and create intermediate GPs to gather results from every 4 slaves.
>
> Is this a contention problem at the GP?
> Is there debugging or logging I can turn on in MPI to prove that contention is occurring?
> Can I configure MPI receive processing to improve upon the 4-to-1 ratio?
> Can I improve the controller method (listed above) to gain a performance improvement?
>
> Thanks for any suggestions.
> Gary Hodge
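(As one possible direction for the "can I improve the controller method" question in the quoted message above, here is a rough sketch that drains every completed receive with a single MPI_Testsome call per pass instead of one MPI_Test per sender. Whether this actually helps the 20-SP case is an assumption to be tested; NUM_SENDERS, MAX_MSG, sender_rank[], and the copy_and_enqueue placeholder are all illustrative stand-ins for the application's own equivalents, not code from the original posts.)

/* Sketch: batch-draining pending receives with MPI_Testsome. */
#include <mpi.h>

#define NUM_SENDERS 20
#define MAX_MSG     4096

static char        bufs[NUM_SENDERS][MAX_MSG];
static MPI_Request reqs[NUM_SENDERS];
static int         sender_rank[NUM_SENDERS];   /* rank each slot listens to */

static void copy_and_enqueue(const void *msg, int nbytes)
{
    /* Placeholder for the existing "copy the message, queue the copy" step. */
    (void)msg; (void)nbytes;
}

void post_all_receives(void)
{
    for (int i = 0; i < NUM_SENDERS; i++)
        MPI_Irecv(bufs[i], MAX_MSG, MPI_BYTE, sender_rank[i],
                  MPI_ANY_TAG, MPI_COMM_WORLD, &reqs[i]);
}

void poll_once(void)
{
    int        indices[NUM_SENDERS];
    MPI_Status stats[NUM_SENDERS];
    int        outcount = 0;

    /* Completes every receive that has already arrived, in one call. */
    MPI_Testsome(NUM_SENDERS, reqs, &outcount, indices, stats);

    for (int k = 0; k < outcount; k++) {
        int i = indices[k];
        int nbytes = 0;
        MPI_Get_count(&stats[k], MPI_BYTE, &nbytes);
        copy_and_enqueue(bufs[i], nbytes);

        /* Re-post the receive for this slot. */
        MPI_Irecv(bufs[i], MAX_MSG, MPI_BYTE, sender_rank[i],
                  MPI_ANY_TAG, MPI_COMM_WORLD, &reqs[i]);
    }
}

The idea is only to cut the number of MPI calls per pass and ensure no completed receive waits behind the fixed per-sender polling order; it does not change how Open MPI itself progresses messages, so the timestamping suggested above is still the first step to confirm where the 600 ms is actually spent.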