Hello, everyone
I am struggling a bit with IB performance when sending data from a POSIX
shared memory region (/dev/shm). The memory is shared among many MPI
processes within the same compute node. Essentially, the performance is
somewhat erratic, but my code appears to run roughly twice as slow as
when I use an ordinary malloc'ed send buffer.
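
For reference, here is a minimal sketch of the kind of pattern I mean
(the region name, message size, and rank layout are just illustrative,
not my actual setup): one payload is sent from a buffer backed by
/dev/shm and one from the heap, and both transfers are timed.

/*
 * Minimal sketch (names and sizes are illustrative): send the same
 * payload once from a POSIX shared memory region and once from a
 * malloc'ed buffer, and time both.  Compile with mpicc (link -lrt on
 * older glibc); run with at least two ranks on different nodes so the
 * traffic actually crosses IB.
 */
#include <mpi.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_SIZE (64UL * 1024 * 1024)   /* 64 MiB payload */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Per-rank region name so ranks on the same node do not collide. */
    char name[64];
    snprintf(name, sizeof(name), "/ib_shm_demo_%d", rank);

    /* Send/receive buffer backed by /dev/shm. */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, BUF_SIZE);
    char *shm_buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    if (fd < 0 || shm_buf == MAP_FAILED) {
        perror("shm setup");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Ordinary heap buffer for comparison. */
    char *heap_buf = malloc(BUF_SIZE);

    if (rank == 0) {
        memset(shm_buf, 1, BUF_SIZE);
        memset(heap_buf, 1, BUF_SIZE);

        double t0 = MPI_Wtime();
        MPI_Send(shm_buf, BUF_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();
        MPI_Send(heap_buf, BUF_SIZE, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        double t2 = MPI_Wtime();

        printf("shm  send: %.3f ms\n", (t1 - t0) * 1e3);
        printf("heap send: %.3f ms\n", (t2 - t1) * 1e3);
    } else if (rank == 1) {
        MPI_Recv(shm_buf, BUF_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Recv(heap_buf, BUF_SIZE, MPI_CHAR, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    free(heap_buf);
    munmap(shm_buf, BUF_SIZE);
    shm_unlink(name);
    MPI_Finalize();
    return 0;
}

In my real code the region is shared by all ranks on the node rather
than being per-rank, but the pattern is the same: the send buffer lives
in a MAP_SHARED mapping of /dev/shm instead of on the heap.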
I was wondering - have any of you had experience with sending data from
SHM over InfiniBand? Why would the results be so much worse? Is it,
e.g., because this memory cannot be pinned and OpenMPI is reallocating
it? Or is it some OS peculiarity?
I would appreciate any hints at all. Thanks a lot!
Marcin