Adam,

You can also set btl_tcp_links to 2 or 3 to allow multiple TCP connections
between peers, which can yield higher aggregate bandwidth.
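
For example, combining this with the buffer settings Gilles suggested (the
value 2 here is only illustrative; tune it for your setup):
mpirun --mca btl_tcp_links 2 --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...

If you would rather not pass these on every command line, you can also set
them in $HOME/.openmpi/mca-params.conf, one "name = value" per line:
btl_tcp_sndbuf = 0
btl_tcp_rcvbuf = 0
btl_tcp_links = 2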

  George.



On Sun, Jul 9, 2017 at 10:04 AM, Adam Sylvester <op8...@gmail.com> wrote:

> Gilles,
>
> Thanks for the fast response!
>
> The --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 flags you recommended
> made a huge difference - this got me up to 5.7 Gb/s! I wasn't aware of
> these flags... with a little Googling, is
> https://www.open-mpi.org/faq/?category=tcp the best place to look for this
> kind of information and any other tweaks I may want to try (or if there's
> a better FAQ out there, please let me know)?
>
> There is only eth0 on my machines so nothing to tweak there (though good
> to know for the future). I also didn't see any improvement by specifying
> more sockets per instance. But your initial suggestion had a major impact.
>
> In general I try to stay relatively up to date with my Open MPI version;
> I'll be extra motivated to upgrade to 2.1.2 so that I don't have to
> remember to set these --mca flags on the command line. :o)
>
> -Adam
>
> On Sun, Jul 9, 2017 at 9:26 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Adam,
>>
>> First, you need to change the default send and receive socket buffers:
>> mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...
>> /* note this will be the default from Open MPI 2.1.2 */
>>
>> Hopefully, that will be enough to greatly improve the bandwidth for
>> large messages.
>>
>>
>> Generally speaking, I recommend you use the latest available version
>> (e.g. Open MPI 2.1.1).
>>
>> How many interfaces can be used to communicate between the hosts?
>> If there is more than one (for example, a slow one and a fast one),
>> you should only use the fast one.
>> For example, if eth0 is the fast interface, that can be achieved with:
>> mpirun --mca btl_tcp_if_include eth0 ...
>>
>> Also, you might be able to achieve better results by using more than
>> one socket on the fast interface.
>> For example, if you want to use 4 sockets per interface:
>> mpirun --mca btl_tcp_links 4 ...
>>
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester <op8...@gmail.com> wrote:
>> > I am using Open MPI 2.1.0 on RHEL 7.  My application has one unavoidable
>> > pinch point where a large amount of data needs to be transferred (about
>> > 8 GB of data needs to be both sent to and received from all other ranks),
>> > and I'm seeing worse performance than I would expect; this step has a
>> > major impact on my overall runtime.  In the real application, I am using
>> > MPI_Alltoall() for this step, but for the purpose of a simple benchmark,
>> > I simplified it to a single MPI_Send() / MPI_Recv() of a 2 GB buffer
>> > between two ranks.
>> >
>> > I'm running this in AWS with instances that have 10 Gbps connectivity
>> > in the same availability zone (according to tracepath, there are no hops
>> > between them) and MTU set to 8801 bytes.  Doing a non-MPI benchmark of
>> > sending data directly over TCP between these two instances, I reliably
>> > get around 4 Gbps.  Between these same two instances with MPI_Send() /
>> > MPI_Recv(), I reliably get around 2.4 Gbps.  This seems like a major
>> > performance degradation for a single MPI operation.
>> >
>> > I compiled Open MPI 2.1.0 with gcc 4.9.1 and default settings.  I'm
>> > connecting between instances via ssh and, I assume, using TCP for the
>> > actual network transfer (I'm not setting any special command-line or
>> > programmatic settings).  The actual command I'm running is:
>> > mpirun -N 1 --bind-to none --hostfile hosts.txt my_app
>> >
>> > Any advice on other things to test, or compilation and/or runtime flags
>> > to set, would be much appreciated!
>> > -Adam
>> >