Re: [OMPI users] Network performance over TCP

2019-03-23 Thread Adam Sylvester
Thanks, Gilles. Unfortunately, my understanding is that EFA is only available on C5n instances, not 'regular' C5 instances (https://aws.amazon.com/about-aws/whats-new/2018/11/introducing-elastic-fabric-adapter/). I will be using C5n instances in the future but not at this time, so I'm hoping to ge…

Re: [OMPI users] Network performance over TCP

2019-03-23 Thread Gilles Gouaillardet
Adam, FWIW, the EFA adapter is available on this AWS instance, and Open MPI can use it via libfabric (aka OFI). Here is a link to Brian’s video: https://insidehpc.com/2018/04/amazon-libfabric-case-study-flexible-hpc-infrastructure/ Cheers, Gilles. On Sunday, March 24, 2019, Adam Sylvester wrote: > D…
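A rough sketch of building and running Open MPI over libfabric on an EFA-capable instance follows; the install paths, process count, and hostfile are assumptions, and the configure flag name has varied across releases (--with-libfabric on older ones, --with-ofi on newer ones):

  # Build Open MPI against the libfabric that ships with the EFA driver stack
  # (/opt/amazon/efa is an assumed location).
  ./configure --prefix=/opt/openmpi --with-libfabric=/opt/amazon/efa
  make -j && make install

  # Run with the OFI MTL so traffic goes through libfabric rather than the TCP BTL.
  mpirun -np 72 --hostfile hosts \
      --mca pml cm --mca mtl ofi \
      ./my_app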

Re: [OMPI users] Network performance over TCP

2019-03-23 Thread Adam Sylvester
Digging up this old thread as it appears there's still an issue with btl_tcp_links. I'm now using c5.18xlarge instances in AWS, which have 25 Gbps connectivity; using iperf3 with the -P option to drive multiple ports, I achieve over 24 Gbps when communicating between two instances. When I original…
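For reference, a test along the lines described above would look roughly like the following (the peer address and the stream count are placeholders, not values from the thread):

  # On the receiving instance:
  iperf3 -s

  # On the sending instance: one stream, then several parallel streams.
  iperf3 -c 10.0.0.12          # single TCP stream
  iperf3 -c 10.0.0.12 -P 8     # 8 parallel streams; the aggregate bandwidth is reported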

Re: [OMPI users] Network performance over TCP

2017-07-13 Thread Adam Sylvester
Bummer - thanks for the info, Brian. As an FYI, I do have a real-world use case for this faster connectivity (i.e. beyond just a benchmark). While my application will happily gobble up and run on however many machines it's given, there's a resource manager that lives on top of everything that dole…

Re: [OMPI users] Network performance over TCP

2017-07-12 Thread Barrett, Brian via users
Adam - The btl_tcp_links flag does not currently work (for various reasons) in the 2.x and 3.x series. It’s on my todo list to fix, but I’m not sure it will get done before the 3.0.0 release. Part of the reason that it hasn’t been a priority is that most applications (outside of benchmarks) d…

Re: [OMPI users] Network performance over TCP

2017-07-12 Thread Adam Sylvester
I switched over to X1 instances in AWS, which have 20 Gbps connectivity. Using iperf3, I'm seeing 11.1 Gbps between them with just one port. iperf3 supports a -P option which will connect using multiple ports... Setting this to use in the range of 5-20 ports (there's some variability from run to r…

Re: [OMPI users] Network performance over TCP

2017-07-11 Thread Adam Sylvester
Thanks again, Gilles. Ahh, better yet - I wasn't familiar with the config file way to set these parameters... it'll be easy to bake this into my AMI so that I don't have to set them each time while waiting for the next Open MPI release. Out of mostly laziness I try to keep to the formal releases r…
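A minimal sketch of the config-file approach, assuming an install prefix of /opt/openmpi (the per-user alternative is ~/.openmpi/mca-params.conf):

  # Persist the MCA settings so they apply to every mpirun without extra flags,
  # e.g. when baking them into an AMI.
  cat >> /opt/openmpi/etc/openmpi-mca-params.conf <<'EOF'
  btl_tcp_sndbuf = 0
  btl_tcp_rcvbuf = 0
  EOF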

Re: [OMPI users] Network performance over TCP

2017-07-09 Thread Gilles Gouaillardet
Adam, Thanks for letting us know your performance issue has been resolved. Yes, https://www.open-mpi.org/faq/?category=tcp is the best place to look for this kind of information. I will add a reference to these parameters. I will also ask folks at AWS if they have additional/other recommen…

Re: [OMPI users] Network performance over TCP

2017-07-09 Thread George Bosilca
Adam, You can also set btl_tcp_links to 2 or 3 to allow multiple connections between peers, with a potentially higher aggregate bandwidth. George. On Sun, Jul 9, 2017 at 10:04 AM, Adam Sylvester wrote: > Gilles, > > Thanks for the fast response! > > The --mca btl_tcp_sndbuf 0 --mca btl_tcp_r…
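A sketch of what this suggestion looks like on the command line (process count, hostfile, and application name are placeholders); note that, per the 2017-07-12 reply above, the flag reportedly had no effect in the 2.x/3.x series:

  mpirun -np 16 --hostfile hosts \
      --mca btl_tcp_links 3 \
      --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 \
      ./my_app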

Re: [OMPI users] Network performance over TCP

2017-07-09 Thread Adam Sylvester
Gilles, Thanks for the fast response! The --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 flags you recommended made a huge difference - this got me up to 5.7 Gb/s! I wasn't aware of these flags... with a little Googling, is https://www.open-mpi.org/faq/?category=tcp the best place to look for this…

Re: [OMPI users] Network performance over TCP

2017-07-09 Thread Gilles Gouaillardet
Adam, as a first step, you need to change the default send and receive socket buffers: mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ... /* note this will be the default from Open MPI 2.1.2 */ Hopefully, that will be enough to greatly improve the bandwidth for large messages. Generally speakin…
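Written out as a full command line (process count, hostfile, and application name are placeholders), setting both parameters to 0 leaves the kernel's TCP buffer autotuning in effect instead of forcing a fixed buffer size:

  mpirun -np 16 --hostfile hosts \
      --mca btl_tcp_sndbuf 0 \
      --mca btl_tcp_rcvbuf 0 \
      ./my_app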

[OMPI users] Network performance over TCP

2017-07-09 Thread Adam Sylvester
I am using Open MPI 2.1.0 on RHEL 7. My application has one unavoidable pinch point where a large amount of data needs to be transferred (about 8 GB of data needs to be both sent to and received from all other ranks), and I'm seeing worse performance than I would expect; this step has a major impact on…