Adam,

first, you need to change the default send and receive socket buffer sizes:
mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...
/* note this will be the default from Open MPI 2.1.2 */
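if you do not want to pass those flags on every command line, the same
parameters can also be set once in an MCA parameters file (typically
$HOME/.openmpi/mca-params.conf, or etc/openmpi-mca-params.conf under the
install prefix), for example:
btl_tcp_sndbuf = 0
btl_tcp_rcvbuf = 0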

hopefully, that will be enough to greatly improve the bandwidth for
large messages.


generally speaking, I recommend you use the latest available version
(e.g. Open MPI 2.1.1)
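you can check which version you are currently running with
mpirun --version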

how many interfaces can be used to communicate between hosts?
if there is more than one (for example a slow one and a fast one), you should
use only the fast one.
for example, if eth0 is the fast interface, that can be achieved with:
mpirun --mca btl_tcp_if_include eth0 ...
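if you are not sure which interfaces are available on each instance, something
like this can help (the subnet below is just an example; btl_tcp_if_include
also accepts CIDR notation, which is handy when interface names differ between
hosts):
ip addr
mpirun --mca btl_tcp_if_include 10.0.0.0/16 ...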

also, you might be able to achieve better results by using more than
one socket on the fast interface.
for example, to use 4 sockets per interface:
mpirun --mca btl_tcp_links 4 ...
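putting it all together with your original command line (assuming eth0 is
indeed the fast interface, and picking 4 links just as an example), that would
look something like:
mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 \
       --mca btl_tcp_if_include eth0 --mca btl_tcp_links 4 \
       -N 1 --bind-to none --hostfile hosts.txt my_app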



Cheers,

Gilles

On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester <op8...@gmail.com> wrote:
> I am using Open MPI 2.1.0 on RHEL 7.  My application has one unavoidable
> pinch point where a large amount of data needs to be transferred (about 8 GB
> of data needs to be both sent to and received from all other ranks), and I'm
> seeing worse performance than I would expect; this step has a major impact
> on my overall runtime.  In the real application, I am using MPI_Alltoall()
> for this step, but for the purpose of a simple benchmark, I reduced it to a
> single MPI_Send() / MPI_Recv() of a 2 GB buffer between two ranks.
>
> I'm running this in AWS with instances that have 10 Gbps connectivity in the
> same availability zone (according to tracepath, there are no hops between
> them) and MTU set to 8801 bytes.  Doing a non-MPI benchmark of sending data
> directly over TCP between these two instances, I reliably get around 4 Gbps.
> Between these same two instances with MPI_Send() / MPI_Recv(), I reliably
> get around 2.4 Gbps.  This seems like a major performance degradation for a
> single MPI operation.
>
> I compiled Open MPI 2.1.0 with gcc 4.9.1 and default settings.  I'm
> connecting between instances via ssh and, I assume, using TCP for the actual
> network transfer (I'm not setting any special command-line or programmatic
> settings).  The actual command I'm running is:
> mpirun -N 1 --bind-to none --hostfile hosts.txt my_app
>
> Any advice on other things to test or compilation and/or runtime flags to
> set would be much appreciated!
> -Adam
>