Dear all,
we are having a problem with one-sided MPI communication in Open MPI.
The scenario is the following: we have a computational model that uses
exclusively point-to-point MPI communication calls. It runs fine and fast on a
fairly new cluster with FDR InfiniBand and Intel Sandy Bridge Xeons. We have
around 15 years of experience with parallelization and get very
nice scaling.
We have now started building I/O servers that fetch the data from
buffers on the compute PEs via one-sided MPI_Get and write it out
to a parallel filesystem. This should ensure that we can cleanly overlap
computation and I/O.
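
Roughly, the pattern looks like the sketch below (heavily simplified; the
buffer sizes, the choice of the last rank as I/O server, and the passive-target
lock/unlock synchronization are illustrative only, not necessarily exactly what
our model does):

/* Compute PEs expose a buffer in an MPI window; an I/O PE pulls it
 * with a passive-target MPI_Get and would then write it to disk. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int n = 1024;                          /* elements per compute PE (illustrative) */
    double *buf = malloc(n * sizeof(double));
    for (int i = 0; i < n; ++i) buf[i] = rank;   /* stand-in for model data */

    /* Every rank exposes its buffer; only the compute PEs' data gets read. */
    MPI_Win win;
    MPI_Win_create(buf, n * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == nprocs - 1) {                    /* last rank plays the I/O server */
        double *recv = malloc(n * sizeof(double));
        for (int src = 0; src < nprocs - 1; ++src) {
            /* Passive-target epoch: unlock completes the transfer. */
            MPI_Win_lock(MPI_LOCK_SHARED, src, 0, win);
            MPI_Get(recv, n, MPI_DOUBLE, src, 0, n, MPI_DOUBLE, win);
            MPI_Win_unlock(src, win);
            /* ... here the I/O server writes 'recv' to the parallel filesystem ... */
        }
        free(recv);
    }

    MPI_Win_free(&win);                          /* collective; keeps windows valid until all are done */
    free(buf);
    MPI_Finalize();
    return 0;
}
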
This 'algorithm' works very nicely with MVAPICH2: hardly any
impact is seen on the compute PEs' communication, and the I/O servers
write the data out nicely using the described RDMA approach.
Using this with Open MPI 1.6.5 over InfiniBand leads to strange behaviour:
the first step, copying the data into the buffers and preparing the windows,
is as fast as with MVAPICH2, but then the communication of the compute PEs
starts to slow down, and apparently the MPI_Get calls do as well. The slowdown
grows over the first couple of I/O steps, and in the end the whole run is
much slower than the version that does its I/O through PE 0.
My question to all Open MPI power users and developers is: what
would be required to get this running properly?
If more information is required, please come back to me; maybe this
explanation of what we do is insufficient.
Best regards,
Luis