Adam,

Thanks for letting us know your performance issue has been resolved.


Yes, https://www.open-mpi.org/faq/?category=tcp is the best place to look for this kind of information.

I will add a reference to these parameters there. I will also ask the folks at AWS whether they have any additional or different recommendations.


Note that you have a few options until 2.1.2 (or 3.0.0) is released:


- Update your system-wide config file (/.../etc/openmpi-mca-params.conf) or your user config file
  ($HOME/.openmpi/mca-params.conf) and add the following lines:

  btl_tcp_sndbuf = 0
  btl_tcp_rcvbuf = 0


- Add the following environment variables to your environment:

  export OMPI_MCA_btl_tcp_sndbuf=0
  export OMPI_MCA_btl_tcp_rcvbuf=0


- Use Open MPI 2.0.3.


- Last but not least, you can manually download and apply the patch available at
  https://github.com/open-mpi/ompi/commit/b64fedf4f652cadc9bfc7c4693f9c1ef01dfb69f.patch


Cheers,

Gilles

On 7/9/2017 11:04 PM, Adam Sylvester wrote:
Gilles,

Thanks for the fast response!

The --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 flags you recommended made a huge difference - this got me up to 5.7 Gb/s! I wasn't aware of these flags... with a little Googling, is https://www.open-mpi.org/faq/?category=tcp the best place to look for this kind of information and any other tweaks I may want to try (or if there's a better FAQ out there, please let me know)? There is only eth0 on my machines so nothing to tweak there (though good to know for the future). I also didn't see any improvement by specifying more sockets per instance. But, your initial suggestion had a major impact. In general I try to stay relatively up to date with my Open MPI version; I'll be extra motivated to upgrade to 2.1.2 so that I don't have to remember to set these --mca flags on the command line. :o)
-Adam

On Sun, Jul 9, 2017 at 9:26 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

    Adam,

    First, you need to change the default send and receive socket
    buffer sizes:
    mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...
    /* note: this will be the default from Open MPI 2.1.2 */

    Hopefully, that will be enough to greatly improve the bandwidth for
    large messages.


    Generally speaking, I recommend you use the latest available version
    (e.g. Open MPI 2.1.1).

    How many interfaces can be used to communicate between hosts?
    If there is more than one (for example a slow one and a fast one),
    you should only use the fast one.
    For example, if eth0 is the fast interface, that can be achieved with
    mpirun --mca btl_tcp_if_include eth0 ...

    Also, you might be able to achieve better results by using more than
    one socket on the fast interface.
    For example, if you want to use 4 sockets per interface:
    mpirun --mca btl_tcp_links 4 ...



    Cheers,

    Gilles

    On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester <op8...@gmail.com> wrote:
    > I am using Open MPI 2.1.0 on RHEL 7.  My application has one
    > unavoidable pinch point where a large amount of data needs to be
    > transferred (about 8 GB of data needs to be both sent to and
    > received from all other ranks), and I'm seeing worse performance
    > than I would expect; this step has a major impact on my overall
    > runtime.  In the real application, I am using MPI_Alltoall() for
    > this step, but for the purpose of a simple benchmark, I simplified
    > it to a single MPI_Send() / MPI_Recv() of a 2 GB buffer between
    > two ranks.
    >
    > I'm running this in AWS with instances that have 10 Gbps
    > connectivity in the same availability zone (according to
    > tracepath, there are no hops between them) and MTU set to 8801
    > bytes.  Doing a non-MPI benchmark of sending data directly over
    > TCP between these two instances, I reliably get around 4 Gbps.
    > Between these same two instances with MPI_Send() / MPI_Recv(), I
    > reliably get around 2.4 Gbps.  This seems like a major performance
    > degradation for a single MPI operation.
    >
    > I compiled Open MPI 2.1.0 with gcc 4.9.1 and default settings.
    > I'm connecting between instances via ssh and, I assume, using TCP
    > for the actual network transfer (I'm not setting any special
    > command-line or programmatic settings).  The actual command I'm
    > running is:
    > mpirun -N 1 --bind-to none --hostfile hosts.txt my_app
    >
    > Any advice on other things to test or compilation and/or runtime
    > flags to set would be much appreciated!
    > -Adam
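For reference, a minimal point-to-point benchmark along the lines Adam describes above (a single large MPI_Send() / MPI_Recv() between two ranks) might look roughly like the sketch below. The 2 GB transfer is split into 256 MB chunks only so that each send count stays below INT_MAX; the sizes, chunking, and names are illustrative assumptions, not Adam's actual code.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    /* 2 GB total, sent as 8 chunks of 256 MB so that each MPI_Send()
     * element count stays below INT_MAX (illustrative sizes) */
    const size_t chunk = 256UL * 1024 * 1024;
    const int nchunks = 8;
    char *buf = malloc(chunk);
    memset(buf, 1, chunk);   /* touch the pages before timing */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (i = 0; i < nchunks; i++) {
        if (rank == 0)
            MPI_Send(buf, (int)chunk, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, (int)chunk, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    double elapsed = MPI_Wtime() - t0;
    if (rank == 1)
        printf("transferred %d x %zu bytes in %.2f s (%.2f Gb/s)\n",
               nchunks, chunk, elapsed,
               8.0 * nchunks * chunk / elapsed / 1e9);

    free(buf);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with the same kind of command line as above (e.g. mpirun -N 1 --hostfile hosts.txt ./p2p_bench), it makes it easy to compare the reported rate with and without the btl_tcp_sndbuf / btl_tcp_rcvbuf settings.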



