Dear Singularity & OpenMPI teams, Greg and Ralph,

Going back to Ralph Castain's response in this thread:
https://groups.google.com/a/lbl.gov/forum/#!topic/singularity/lQ6sWCWhIWY

In order to get portability of Singularity images containing OpenMPI
distributed applications, he suggested mixing som

Hi,

There is a known issue in ConnectX-4 that impacts RDMA_READ bandwidth with
a single QP: the overhead in the HCA of processing a single RDMA_READ
response packet is too high, due to the need to lock the QP. With a small
MTU (as is the case with Ethernet packets), the impact is magnified, because
a transfer is split into more response packets and the per-packet locking
cost is paid that many more times.
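For illustration, here is a rough sketch of the multi-QP workaround this
implies: stripe one large read across several QPs so the per-QP locking cost
is amortized. It assumes the QPs, completion queue, and memory registrations
are already set up elsewhere; post_striped_read and all of its parameters are
invented names, not anything from a real stack.

    #include <infiniband/verbs.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Sketch: split one RDMA_READ of total_len bytes into stripes, one
     * per QP, so no single QP has to absorb all the per-packet locking. */
    int post_striped_read(struct ibv_qp **qps, int nqps,
                          uint64_t laddr, uint32_t lkey,
                          uint64_t raddr, uint32_t rkey,
                          uint32_t total_len)
    {
        uint32_t chunk = total_len / nqps;   /* assumes nqps > 0 */

        for (int i = 0; i < nqps; i++) {
            struct ibv_sge sge = {
                .addr   = laddr + (uint64_t)i * chunk,
                /* the last stripe picks up any remainder */
                .length = (i == nqps - 1) ? total_len - i * chunk : chunk,
                .lkey   = lkey,
            };
            struct ibv_send_wr wr = {
                .wr_id      = i,
                .sg_list    = &sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_RDMA_READ,
                .send_flags = IBV_SEND_SIGNALED,
            };
            wr.wr.rdma.remote_addr = raddr + (uint64_t)i * chunk;
            wr.wr.rdma.rkey        = rkey;

            struct ibv_send_wr *bad = NULL;
            int rc = ibv_post_send(qps[i], &wr, &bad);
            if (rc)
                return rc;   /* caller reaps completions from the CQ */
        }
        return 0;
    }

With several QPs in flight the responses are processed independently, which
is exactly the concurrency the single-QP lock prevents.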

There's no reason to do anything special for shared memory in a
single-process job, because MPI_Win_allocate_shared(MPI_COMM_SELF) ~=
MPI_Alloc_mem(). However, it would help debugging if MPI implementations at
least offered an option to take the code path that allocates shared memory
even when np=1.
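For concreteness, a small self-contained example (mine, not from the
original post) of that np=1 equivalence: both allocations below hand back
plain local memory when the communicator has a single rank.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Shared-memory window on a single-process communicator... */
        double *win_mem;
        MPI_Win win;
        MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                                MPI_INFO_NULL, MPI_COMM_SELF,
                                &win_mem, &win);

        /* ...versus a plain allocation: with np=1 the two are
         * effectively interchangeable, which is the "~=" above. */
        double *mem;
        MPI_Alloc_mem(1024 * sizeof(double), MPI_INFO_NULL, &mem);

        win_mem[0] = mem[0] = 42.0;
        printf("win_mem[0]=%.1f mem[0]=%.1f\n", win_mem[0], mem[0]);

        MPI_Free_mem(mem);
        MPI_Win_free(&win);   /* also frees the window's memory */
        MPI_Finalize();
        return 0;
    }

Whether an implementation actually exercises its shared-memory machinery
here, or short-circuits to an ordinary allocation, is exactly what the
debugging option suggested above would let you control.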
Je