Gilles,

Thanks for the fast response!

The --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 flags you recommended
made a huge difference - this got me up to 5.7 Gb/s! I wasn't aware of
these flags... after a little Googling, is
https://www.open-mpi.org/faq/?category=tcp the best place to look for this
kind of information and for any other tweaks I may want to try? (If there's
a better FAQ out there, please let me know.)
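
(It also looks like ompi_info can list the available TCP tunables and their
current defaults, e.g. something like

  ompi_info --param btl tcp --level 9

if I'm reading the man page right - handy for seeing what else there is to
tweak.)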

There is only eth0 on my machines, so there's nothing to tweak there (though
good to know for the future). I also didn't see any improvement from
specifying more sockets per interface (btl_tcp_links). But your initial
suggestion had a major impact.
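
For anyone who finds this thread later, the simplified test I describe in my
original message (quoted below) boils down to roughly the following sketch.
It's illustrative rather than my exact code: it times a single ~2 GB
MPI_Send() / MPI_Recv() between two ranks and prints the effective bandwidth.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* 2^28 doubles = 2 GiB; counting in doubles keeps the MPI count
       argument within int range */
    const int count = 1 << 28;
    const size_t nbytes = (size_t)count * sizeof(double);
    double *buf = malloc(nbytes);
    if (buf == NULL) MPI_Abort(MPI_COMM_WORLD, 1);
    memset(buf, 0, nbytes);  /* touch the pages before timing */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    if (rank == 0)
        MPI_Send(buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else
        MPI_Recv(buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    double elapsed = MPI_Wtime() - t0;
    if (rank == 1)
        printf("%zu bytes in %.3f s = %.2f Gb/s\n",
               nbytes, elapsed, 8.0 * nbytes / elapsed / 1.0e9);

    free(buf);
    MPI_Finalize();
    return 0;
}

I run it across the two hosts the same way as the command in the quoted
message, e.g. mpirun -N 1 --hostfile hosts.txt ./p2p_bench (the binary name
is just a placeholder).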

In general I try to stay relatively up to date with my Open MPI version;
I'll be extra motivated to upgrade to 2.1.2 so that I don't have to
remember to set these --mca flags on the command line. :o)
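
In the meantime, if I'm reading the MCA parameter docs correctly, these can
also be set persistently instead of on the command line, via environment
variables or a per-user config file (a sketch below, values as I understand
them, so please correct me if this is off):

  # per-shell environment variables
  export OMPI_MCA_btl_tcp_sndbuf=0
  export OMPI_MCA_btl_tcp_rcvbuf=0

  # or per-user defaults in $HOME/.openmpi/mca-params.conf
  btl_tcp_sndbuf = 0
  btl_tcp_rcvbuf = 0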

-Adam

On Sun, Jul 9, 2017 at 9:26 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:

> Adam,
>
> First, you need to change the default send and receive socket buffers:
> mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...
> /* note this will be the default from Open MPI 2.1.2 */
>
> Hopefully, that will be enough to greatly improve the bandwidth for
> large messages.
>
>
> Generally speaking, I recommend you use the latest available version
> (e.g. Open MPI 2.1.1).
>
> How many interfaces can be used to communicate between hosts?
> If there is more than one (for example, a slow one and a fast one),
> you should only use the fast one.
> For example, if eth0 is the fast interface, that can be achieved with
> mpirun --mca btl_tcp_if_include eth0 ...
>
> Also, you might be able to achieve better results by using more than
> one socket on the fast interface.
> For example, to use 4 sockets per interface:
> mpirun --mca btl_tcp_links 4 ...
>
>
>
> Cheers,
>
> Gilles
>
> On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester <op8...@gmail.com> wrote:
> > I am using Open MPI 2.1.0 on RHEL 7.  My application has one unavoidable
> > pinch point where a large amount of data needs to be transferred (about
> > 8 GB of data needs to be both sent to and received from all other ranks),
> > and I'm seeing worse performance than I would expect; this step has a
> > major impact on my overall runtime.  In the real application, I am using
> > MPI_Alltoall() for this step, but for the purpose of a simple benchmark,
> > I simplified it to a single MPI_Send() / MPI_Recv() of a 2 GB buffer
> > between two ranks.
> >
> > I'm running this in AWS with instances that have 10 Gbps connectivity
> > in the same availability zone (according to tracepath, there are no hops
> > between them) and MTU set to 8801 bytes.  Doing a non-MPI benchmark of
> > sending data directly over TCP between these two instances, I reliably
> > get around 4 Gbps.  Between these same two instances with MPI_Send() /
> > MPI_Recv(), I reliably get around 2.4 Gbps.  This seems like a major
> > performance degradation for a single MPI operation.
> >
> > I compiled Open MPI 2.1.0 with gcc 4.9.1 and default settings.  I'm
> > connecting between instances via ssh and using (I assume) TCP for the
> > actual network transfer (I'm not setting any special command-line or
> > programmatic settings).  The actual command I'm running is:
> > mpirun -N 1 --bind-to none --hostfile hosts.txt my_app
> >
> > Any advice on other things to test or compilation and/or runtime flags to
> > set would be much appreciated!
> > -Adam
> >