Thanks, Gilles. Unfortunately, my understanding is that EFA is only
available on C5n instances, not 'regular' C5 instances (
https://aws.amazon.com/about-aws/whats-new/2018/11/introducing-elastic-fabric-adapter/).
I will be using C5n instances in the future but not at this time, so I'm
hoping to ge…
Adam,
FWIW, the EFA adapter is available on this AWS instance, and Open MPI
can use it via libfabric (a.k.a. OFI).
Here is a link to Brian’s video
https://insidehpc.com/2018/04/amazon-libfabric-case-study-flexible-hpc-infrastructure/
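For reference, with a libfabric-enabled build the OFI path is typically
selected via the cm PML and ofi MTL. A minimal sketch (./my_app is a
placeholder, not from the thread):

    # assumes Open MPI was built with libfabric support
    mpirun --mca pml cm --mca mtl ofi ./my_app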
Cheers,
Gilles
On Sunday, March 24, 2019, Adam Sylvester wrote:
> D…
Digging up this old thread as it appears there's still an issue with
btl_tcp_links.
I'm now using c5.18xlarge instances in AWS, which have 25 Gbps
connectivity; using iperf3 with the -P option to drive multiple parallel
streams (see the sketch below), I achieve over 24 Gbps when communicating
between two instances.
When I original…
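A multi-stream iperf3 test like the one described might look like this
(server-hostname is a placeholder):

    # on one instance, start the server
    iperf3 -s
    # on the other, drive 8 parallel streams
    iperf3 -c server-hostname -P 8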
Bummer - thanks for the info, Brian.
As an FYI, I do have a real-world use case for this faster connectivity
(i.e., beyond just a benchmark). While my application will happily gobble
up and run on however many machines it's given, there's a resource manager
that lives on top of everything that dole…
Adam -
The btl_tcp_links flag does not currently work (for various reasons) in
the 2.x and 3.x series. It’s on my to-do list to fix, but I’m not sure it
will get done before the 3.0.0 release. Part of the reason it hasn’t been
a priority is that most applications (outside of benchmarks) d…
I switched over to X1 instances in AWS, which have 20 Gbps connectivity.
Using iperf3, I'm seeing 11.1 Gbps between them with just one port. iperf3
supports a -P option which connects using multiple parallel streams...
Setting this to somewhere in the range of 5-20 streams (there's some
variability from run to r…
Thanks again, Gilles. Ahh, better yet - I wasn't familiar with the
config-file way to set these parameters... it'll be easy to bake this into
my AMI so that I don't have to set them each time while waiting for the
next Open MPI release.
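For anyone finding this later: the config-file mechanism referred to here
is Open MPI's MCA parameter file, which is read at startup. Baking the
settings from this thread into an AMI would look like:

    # $HOME/.openmpi/mca-params.conf
    # (or <prefix>/etc/openmpi-mca-params.conf for a system-wide default)
    btl_tcp_sndbuf = 0
    btl_tcp_rcvbuf = 0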
Mostly out of laziness, I try to keep to the formal releases r…
Adam,
Thanks for letting us know your performance issue has been resolved.
Yes, https://www.open-mpi.org/faq/?category=tcp is the best place to
look for this kind of information.
I will add a reference to these parameters. I will also ask folks at AWS
if they have additional/other recommen…
Adam,
You can also set btl_tcp_links to 2 or 3 to allow multiple connections
between peers, with potentially higher aggregate bandwidth.
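For example (a sketch; ./my_app is a placeholder, and as noted elsewhere
in this thread the flag is not effective in every release series):

    mpirun --mca btl_tcp_links 3 ./my_app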
George.
On Sun, Jul 9, 2017 at 10:04 AM, Adam Sylvester wrote:
> Gilles,
>
> Thanks for the fast response!
>
> The --mca btl_tcp_sndbuf 0 --mca btl_tcp_r…
Gilles,
Thanks for the fast response!
The --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 flags you recommended
made a huge difference - this got me up to 5.7 Gb/s! I wasn't aware of
these flags... with a little Googling, is
https://www.open-mpi.org/faq/?category=tcp the best place to look for this…
Adam,
First, you need to change the default send and receive socket buffer
sizes:
mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...
/* note this will be the default from Open MPI 2.1.2 */
Setting these to 0 leaves the kernel's socket-buffer autotuning in effect
rather than pinning the buffers to a fixed size. Hopefully, that will be
enough to greatly improve the bandwidth for large messages.
Generally speakin…
I am using Open MPI 2.1.0 on RHEL 7. My application has one unavoidable
pinch point where a large amount of data needs to be transferred (about 8
GB of data needs to be both sent to and received from all other ranks),
and I'm seeing worse performance than I would expect; this step has a
major impact on…
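For reference, the tunings suggested in the replies above combine into an
invocation along these lines (the hostfile and application names are
placeholders):

    mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 \
           --mca btl_tcp_links 2 --hostfile hosts ./my_app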