Patrik Jonsson wrote:
Hi all,
I'm seeing performance issues I don't understand in my multithreaded
MPI code, and I was hoping someone could shed some light on this.
The code structure is as follows: A computational domain is decomposed
into MPI tasks. Each MPI task has a "master thread" that receives
messages from the other tasks and puts those into a local, concurrent
queue. The tasks then have a few "worker threads" that process the
incoming messages and, when necessary, send them to other tasks. So for
each task, there is one thread doing receives and N (typically number
of cores-1) threads doing sends. All messages are nonblocking, so the
workers just post the sends and continue with computation, and the
master repeatedly does a number of test calls to check for incoming
messages (there are different flavors of these messages so it does
several tests).
When do you call MPI_Test on the Isends? On a number of systems I have had
performance problems when I kept the Isends to different ranks in a single
queue of MPI_Requests and tested them one by one. It appears that some
messages are sent out more efficiently if you test them.
I found that either using MPI_Testsome, or keeping a map (key = rank, value =
queue of MPI_Requests) and testing the first MPI_Request for each key, resolved
this issue.
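
For what it's worth, here is a minimal sketch of the per-rank map idea. It is not taken from any real code: the class name PerRankSendQueue and its methods are made up for illustration, and the request type is left generic so the bookkeeping can be read (and tried) without an MPI installation. In an MPI program, Request would be MPI_Request and the test callable would wrap MPI_Test, as noted in the comment. One assumption beyond what I described above: when the first request for a rank turns out to be complete, the sketch keeps testing the next one for that rank instead of stopping after a single test.

```cpp
#include <cstddef>
#include <deque>
#include <iterator>
#include <map>
#include <utility>

// Hypothetical helper: one FIFO of outstanding send requests per
// destination rank. Request stands in for MPI_Request; Test is any
// callable taking Request& and returning true once it has completed.
//
// With MPI, the callable could look like this:
//   [](MPI_Request& r) {
//       int flag = 0;
//       MPI_Test(&r, &flag, MPI_STATUS_IGNORE);
//       return flag != 0;
//   }
template <typename Request, typename Test>
class PerRankSendQueue {
public:
    explicit PerRankSendQueue(Test test) : test_(std::move(test)) {}

    // Record the request returned by a nonblocking send to `rank`.
    void push(int rank, Request req) {
        pending_[rank].push_back(std::move(req));
    }

    // For every destination rank, test the oldest request; while it is
    // complete, drop it and test the next one for that rank.
    // Returns the number of requests retired in this pass.
    std::size_t progress() {
        std::size_t done = 0;
        for (auto it = pending_.begin(); it != pending_.end();) {
            std::deque<Request>& q = it->second;
            while (!q.empty() && test_(q.front())) {
                q.pop_front();
                ++done;
            }
            // Remove ranks that have nothing left in flight.
            it = q.empty() ? pending_.erase(it) : std::next(it);
        }
        return done;
    }

    // Total number of requests still waiting, over all ranks.
    std::size_t outstanding() const {
        std::size_t n = 0;
        for (const auto& kv : pending_) n += kv.second.size();
        return n;
    }

private:
    std::map<int, std::deque<Request>> pending_;
    Test test_;
};
```

A worker would call push() right after each Isend and call progress() every so often between pieces of computation, so that a slow send to one rank does not keep the requests for other ranks from being tested.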