On Fri, Dec 17, 2010 at 5:43 PM, Sashi Balasingam <sashiba...@yahoo.com> wrote:
> Hi,
> I recently started on an MPI-based, 'real-time', pipelined-processing
> application, and the application fails due to large time-jitter in sending
> and receiving messages. Here is the related info:
>
> 1) Platform:
> a) Intel box: two hex-core Intel Xeon, 2.668 GHz (total of 12 cores)
> b) OS: SUSE Linux Enterprise Server 11 (x86_64) - Kernel \r (\l)
> c) MPI rev: (OpenRTE) 1.4 (OFED package installed)
> d) HCA: InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe
> 2.0 5GT/s] (rev a0)
>
> 2) Application detail
>
> a) Launching 7 processes for pipelined processing, where each process
> waits for a message (sizes vary between 1 KByte and 26 KBytes),
> processes the data, and outputs a message (sizes vary between 1 KByte
> and 26 KBytes) to the next process.
>
> b) MPI transport functions used: MPI_Isend, MPI_Irecv, MPI_Test.
> i) For receiving messages, I first make an MPI_Irecv call, followed by a
> busy-loop on MPI_Test, waiting for the message.
> ii) For sending messages, there is a busy-loop on MPI_Test to ensure the
> prior buffer was sent, then MPI_Isend is used.
>
> c) When the job starts, all 7 processes are put in high-priority mode
> (SCHED_FIFO policy, with a priority setting of 99).
> The job entails an input data packet stream (and a series of MPI messages)
> arriving continually at a 40 microsecond rate, for a few minutes.
>
> 3) The problem:
> Most calls to MPI_Test (which is non-blocking) take a few microseconds,
> but for around 10% of the job there is large jitter, varying from 1 to
> roughly 100 milliseconds. This causes some of the application input queues
> to fill up and leads to a failure.
>
> Any suggestions on MPI settings or OS config/issues to look at would be
> much appreciated.
>
> Thanks in advance.
> Sanji

I had a similar issue; a workaround is to avoid polling too aggressively by placing some kind of timer in your code before the MPI_Test call.
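
Roughly what I mean is something like the sketch below (C, untested here, and not from your code): instead of spinning on MPI_Test as fast as possible, back off briefly between polls so the core isn't monopolized by a SCHED_FIFO-99 process. The 5 microsecond pause is just an illustrative value you would have to tune against your 40 microsecond packet rate; recv_buf, count, source and tag stand in for whatever your application already uses.

    /* Throttled polling instead of a tight busy-loop on MPI_Test. */
    #define _POSIX_C_SOURCE 199309L
    #include <mpi.h>
    #include <time.h>

    static void wait_with_backoff(MPI_Request *req, MPI_Status *status)
    {
        int done = 0;
        /* Illustrative back-off: long enough to let other threads run,
         * short enough for the application's latency budget. */
        struct timespec pause = { 0, 5000 };   /* 5 microseconds */

        MPI_Test(req, &done, status);
        while (!done) {
            nanosleep(&pause, NULL);           /* yield instead of busy-spinning */
            MPI_Test(req, &done, status);
        }
    }

    /* Example use after posting a receive: */
    /* MPI_Irecv(recv_buf, count, MPI_BYTE, source, tag, MPI_COMM_WORLD, &req);
       wait_with_backoff(&req, &status); */

The point of the sleep is that at SCHED_FIFO priority 99 a pure busy-loop can starve kernel and progress threads on that core; giving up the CPU even briefly between polls was enough to smooth out the worst of the jitter in my case.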