Gilles,

Thanks for the fast response!
The --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 flags you recommended made a huge difference - this got me up to 5.7 Gb/s! I wasn't aware of these flags... with a little Googling, is https://www.open-mpi.org/faq/?category=tcp the best place to look for this kind of information and any other tweaks I may want to try, or is there a better FAQ out there? There is only eth0 on my machines, so nothing to tweak there (though good to know for the future). I also didn't see any improvement from specifying more sockets per instance, but your initial suggestion had a major impact.

In general I try to stay relatively up to date with my Open MPI version; I'll be extra motivated to upgrade to 2.1.2 so that I don't have to remember to set these --mca flags on the command line. :o)
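In the meantime, I'll probably just make the settings persistent so they get picked up without any extra flags - if I'm reading the FAQ right, either a per-user MCA parameter file or the OMPI_MCA_* environment variables should do it:

    # $HOME/.openmpi/mca-params.conf
    btl_tcp_sndbuf = 0
    btl_tcp_rcvbuf = 0

or, equivalently, exported before the usual launch command:

    export OMPI_MCA_btl_tcp_sndbuf=0
    export OMPI_MCA_btl_tcp_rcvbuf=0
    mpirun -N 1 --bind-to none --hostfile hosts.txt my_app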
-Adam

On Sun, Jul 9, 2017 at 9:26 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Adam,
>
> First, you need to change the default send and receive socket buffers:
> mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...
> /* note this will be the default from Open MPI 2.1.2 */
>
> Hopefully, that will be enough to greatly improve the bandwidth for
> large messages.
>
> Generally speaking, I recommend you use the latest available version
> (e.g. Open MPI 2.1.1).
>
> How many interfaces can be used to communicate between hosts?
> If there is more than one (for example a slow and a fast one), you'd
> want to use only the fast one.
> For example, if eth0 is the fast interface, that can be achieved with
> mpirun --mca btl_tcp_if_include eth0 ...
>
> Also, you might be able to achieve better results by using more than
> one socket on the fast interface.
> For example, if you want to use 4 sockets per interface:
> mpirun --mca btl_tcp_links 4 ...
>
> Cheers,
>
> Gilles
>
> On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester <op8...@gmail.com> wrote:
> > I am using Open MPI 2.1.0 on RHEL 7. My application has one unavoidable
> > pinch point where a large amount of data needs to be transferred (about 8 GB
> > of data needs to be both sent to and received from all other ranks), and I'm
> > seeing worse performance than I would expect; this step has a major impact
> > on my overall runtime. In the real application, I am using MPI_Alltoall()
> > for this step, but for the purpose of a simple benchmark, I simplified it to
> > a single MPI_Send() / MPI_Recv() of a 2 GB buffer between two ranks.
> >
> > I'm running this in AWS with instances that have 10 Gbps connectivity in the
> > same availability zone (according to tracepath, there are no hops between
> > them) and MTU set to 8801 bytes. Doing a non-MPI benchmark of sending data
> > directly over TCP between these two instances, I reliably get around 4 Gbps.
> > Between these same two instances with MPI_Send() / MPI_Recv(), I reliably
> > get around 2.4 Gbps. This seems like a major performance degradation for a
> > single MPI operation.
> >
> > I compiled Open MPI 2.1.0 with gcc 4.9.1 and default settings. I'm
> > connecting between instances via ssh and using, I assume, TCP for the actual
> > network transfer (I'm not setting any special command-line or programmatic
> > settings). The actual command I'm running is:
> > mpirun -N 1 --bind-to none --hostfile hosts.txt my_app
> >
> > Any advice on other things to test or compilation and/or runtime flags to
> > set would be much appreciated!
> >
> > -Adam
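P.S. In case it's useful to anyone else chasing the same numbers, the point-to-point benchmark described above boils down to roughly the sketch below (a simplified reconstruction, not the exact code I ran - the single 2 GB send/receive and the timing are the only essentials; the binary name is arbitrary):

/* Minimal sketch of the 2 GB point-to-point test: rank 0 sends a single
 * large buffer to rank 1 and the receiver reports the observed rate.
 * Assumes exactly two ranks, one per host, e.g.:
 *   mpirun -N 1 --bind-to none --hostfile hosts.txt ./p2p_bench
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 512M floats = 2 GiB; the element count still fits in a signed int. */
    const int count = 512 * 1024 * 1024;
    const size_t bytes = (size_t)count * sizeof(float);
    float *buf = malloc(bytes);
    memset(buf, 0, bytes);          /* touch the pages before timing */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    if (rank == 0) {
        MPI_Send(buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        double secs = MPI_Wtime() - t0;
        printf("received %zu bytes in %.2f s: %.2f Gb/s\n",
               bytes, secs, bytes * 8.0 / secs / 1e9);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}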