Steve,

Some of these interferences are due to design choices in Open MPI, but others are due to constraints imposed by the MPI standard. For example, MPI requires FIFO ordering of message delivery on each communication channel (communicator/peer pair). If you inject messages from multiple threads for the same destination on the same communicator, internally we are forced either to take a lock very early in the send function (ensuring the messages are pushed into the network in order), or to expect the receiver to do the right thing (deliver the messages in the right order). We chose the second option, so the receiver has to reorder the messages, and this involves at least one extra memcpy (whose size is bounded by the eager message size).
George.

On Mon, Aug 29, 2016 at 1:39 PM, Stephen Ibanez <stephen.iba...@oracle.com> wrote:
> Hi All,
>
> I am trying to use the MPI_THREAD_MULTIPLE support for OpenMPI 2.0. I know
> the documentation states that multi threaded support for OpenMPI is only
> lightly tested and likely will result in poor performance. I have noticed
> that when I have many threads for a particular process that are all calling
> MPI_Recv then the receive calls appear to interfere with each other and the
> overall performance is worse than a single thread.
>
> To clarify, the experiment that I am running consists of process 1 on a
> node in an infiniband cluster generating requests which are sent to process
> 2 on a different node in an infiniband cluster. Process 2 simply receives
> the request, does a little bit of processing, and replies back to process
> 1. I noticed that as I add more threads running on different cores to
> process 2, the total number of requests/sec completed decreases. I wouldn't
> expect that adding more threads would decrease throughput, unless the
> receive calls from the different threads are interfering with each other.
>
> After looking a little bit into the OpenMPI implementation of the MPI_Recv
> function, I noticed that it looks like each call to MPI_Recv requires each
> thread to obtain a mutex so that only one thread can receive at a time. I
> thought that this would explain the decrease in performance caused by many
> threads calling MPI_Recv. To try and get around this issue, I tried to wrap
> the MPI_Recv call in a pthread_spinlock_t. The idea here being that the
> threads would try to lock the spinlock rather than the mutex, which should
> eliminate most of the contention between threads. However, I am still
> seeing that increasing the number of threads for process 2 causes the
> throughput to decrease.
>
> So my question is, are there any other sources of interference between
> threads in OpenMPI that would cause the number of requests completed/sec to
> decrease as I increase the number of threads in process 2?
>
> Thanks,
> -Steve
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users