Adam,

First, you need to change the default send and receive socket buffers:

mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...

(Note: this will become the default starting with Open MPI 2.1.2.)
Hopefully, that will be enough to greatly improve the bandwidth for large
messages. Generally speaking, I recommend you use the latest available
version (e.g. Open MPI 2.1.1).

How many interfaces can be used to communicate between the hosts? If there
is more than one (for example, a slow one and a fast one), you should only
use the fast one. For example, if eth0 is the fast interface, that can be
achieved with:

mpirun --mca btl_tcp_if_include eth0 ...

You might also be able to achieve better results by using more than one
socket on the fast interface. For example, to use 4 sockets per interface:

mpirun --mca btl_tcp_links 4 ...

Cheers,

Gilles

On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester <op8...@gmail.com> wrote:
> I am using Open MPI 2.1.0 on RHEL 7. My application has one unavoidable
> pinch point where a large amount of data needs to be transferred (about
> 8 GB of data needs to be both sent to and received from all other ranks),
> and I'm seeing worse performance than I would expect; this step has a
> major impact on my overall runtime. In the real application, I am using
> MPI_Alltoall() for this step, but for the purpose of a simple benchmark,
> I simplified it to a single MPI_Send() / MPI_Recv() of a 2 GB buffer
> between two ranks.
>
> I'm running this in AWS with instances that have 10 Gbps connectivity in
> the same availability zone (according to tracepath, there are no hops
> between them) and MTU set to 8801 bytes. In a non-MPI benchmark sending
> data directly over TCP between these two instances, I reliably get around
> 4 Gbps. Between these same two instances with MPI_Send() / MPI_Recv(), I
> reliably get around 2.4 Gbps. This seems like a major performance
> degradation for a single MPI operation.
>
> I compiled Open MPI 2.1.0 with gcc 4.9.1 and default settings. I'm
> connecting between instances via ssh and, I assume, using TCP for the
> actual network transfer (I'm not setting any special command-line or
> programmatic settings). The actual command I'm running is:
>
> mpirun -N 1 --bind-to none --hostfile hosts.txt my_app
>
> Any advice on other things to test or compilation and/or runtime flags to
> set would be much appreciated!
>
> -Adam
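For reference, a minimal version of the two-rank benchmark Adam describes
might look like the sketch below. The datatype, buffer size, tag, and timing
code are illustrative assumptions on my part, not Adam's actual code:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 256M doubles = 2 GiB; using MPI_DOUBLE keeps the element count
       well within INT_MAX */
    const int count = 256 * 1024 * 1024;
    double *buf = malloc((size_t)count * sizeof(double));
    if (buf == NULL) {
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    memset(buf, 0, (size_t)count * sizeof(double));

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    if (rank == 0) {
        /* rank 0 pushes the whole buffer to rank 1 in one call */
        MPI_Send(buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    double elapsed = MPI_Wtime() - t0;
    if (rank == 1) {
        double gbits = (double)count * sizeof(double) * 8.0 / 1e9;
        printf("transferred %.1f Gbit in %.2f s -> %.2f Gbps\n",
               gbits, elapsed, gbits / elapsed);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Run it across the two instances with the MCA parameters suggested above,
for example:

mpirun -N 1 --hostfile hosts.txt --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ./benchmark

and compare the reported rate against the raw TCP number.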