By doing a parameter sweep, the best results I've gotten are with:

btl_tcp_eager_limit & btl_tcp_rndv_eager_limit = 131072 (2^17)
btl_tcp_sndbuf & btl_tcp_rcvbuf = 16777216 (2^24)
btl_tcp_endpoint_cache = 4096 (2^12)
btl_tcp_links = 2
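
For reference, a sketch of how those settings might be passed on the mpirun
command line (the host file and benchmark names are placeholders):

  mpirun -np 2 --hostfile hosts \
      --mca btl_tcp_eager_limit 131072 --mca btl_tcp_rndv_eager_limit 131072 \
      --mca btl_tcp_sndbuf 16777216 --mca btl_tcp_rcvbuf 16777216 \
      --mca btl_tcp_endpoint_cache 4096 --mca btl_tcp_links 2 \
      <your benchmark>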

Even so, performance peaks at around 7000 Mbit/s, at a message size of about
a megabyte.

For what it's worth, you can get access to AWS resources to do your own
tuning, which may be more expedient than working through me as a proxy.
Right now I'm using two c4.8xlarge instances in a placement group at
$1.675/hour each to work this out. I keep the instances around only while
I'm actively using them and terminate them when I'm done.

-- 
Gary Jackson



From:  users <users-boun...@open-mpi.org> on behalf of George Bosilca
<bosi...@icl.utk.edu>
Reply-To:  Open MPI Users <us...@open-mpi.org>
Date:  Friday, March 11, 2016 at 11:19 AM
To:  Open MPI Users <us...@open-mpi.org>
Subject:  Re: [OMPI users] Poor performance on Amazon EC2 with TCP


Gary,

The current fine-tuning of our TCP layer was done on a 1Gb network, which
might explain the performance degradation you see. There is a relationship
between the depth of the pipeline and the length of the packets, and,
together with a few other MCA parameters, it can have a drastic impact on
performance.

You should start with "ompi_info --param btl tcp -l 9".

From your performance graphs I can see that Intel MPI has an eager size of
around 128k (while ours is at 32k). Try to address this by setting
btl_tcp_eager_limit to 128k and also btl_tcp_rndv_eager_limit to the same
value.
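
Concretely (131072 bytes = 128k), that would be something like:

  mpirun --mca btl_tcp_eager_limit 131072 \
         --mca btl_tcp_rndv_eager_limit 131072 \
         <your benchmark>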

By default Open MPI assumes TCP kernel buffers of 128k. These values can
be tuned at the kernel level
(http://www.cyberciti.biz/faq/linux-tcp-tuning/) and/or you can let Open MPI
know that it can use more (by setting the MCA parameters btl_tcp_sndbuf and
btl_tcp_rcvbuf).
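
For example, raising the kernel limits on Linux might look roughly like this
(values illustrative, run as root or via sudo), after which the matching MCA
parameters tell Open MPI to actually request buffers that large:

  sysctl -w net.core.rmem_max=16777216
  sysctl -w net.core.wmem_max=16777216
  sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

  mpirun --mca btl_tcp_sndbuf 16777216 --mca btl_tcp_rcvbuf 16777216 <your benchmark>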

Then you can play with the size of the TCP endpoint caching (it should be
set to a value where the memcpy is about the same cost as a syscall).
btl_tcp_endpoint_cache is the MCA parameter you are looking for.
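
For instance, to see the current default and then try a larger value (65536
here is only an example value to sweep over):

  ompi_info --param btl tcp -l 9 | grep endpoint_cache
  mpirun --mca btl_tcp_endpoint_cache 65536 <your benchmark>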

Another trick: in case the injection rate of a single fd is too slow, you
can ask Open MPI to use multiple channels by setting btl_tcp_links to
something other than 1. On a PS4 I had to bump it up to 3-4 to get the best
performance.

Other parameters to be tuned:
- btl_tcp_max_send_size
- btl_tcp_rdma_pipeline_send_length
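
Their current defaults show up in the same ompi_info listing, e.g.:

  ompi_info --param btl tcp -l 9 | grep -E "max_send_size|rdma_pipeline_send_length"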

I don't have access to a 10Gb network to tune on. If you manage to tune it,
I would like to get the values for the different MCA parameters so that our
TCP BTL behaves optimally by default.

  Thanks,
    George.



On Mar 10, 2016, at 11:45 , Jackson, Gary L. <gary.jack...@jhuapl.edu>
wrote:

I re-ran all experiments with 1.10.2 configured the way you specified. My
results are here:

https://www.dropbox.com/s/4v4jaxe8sflgymj/collected.pdf?dl=0

Some remarks:

1. OpenMPI had poor performance relative to raw TCP and Intel MPI across all
MTUs.
2. Those issues appeared at larger message sizes.
3. Intel MPI and raw TCP were comparable across message sizes and MTUs.

With respect to some other concerns:

1. I verified that the MTU values I'm using are correct with tracepath.
2. I am using a placement group.
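
(The tracepath check is just running it between the two instances, e.g.

  tracepath <other instance's private IP>

and confirming that the reported pmtu matches the configured MTU.)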

-- 
Gary Jackson



From: users <users-boun...@open-mpi.org> on behalf of Gilles Gouaillardet
<gil...@rist.or.jp>
Reply-To: Open MPI Users <us...@open-mpi.org>
Date: Tuesday, March 8, 2016 at 11:07 PM
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Poor performance on Amazon EC2 with TCP


Jackson,

One more thing: how did you build Open MPI?

If you built from git (and without VPATH), then --enable-debug is
automatically set, and this hurts performance.
If not already done, I recommend you download the latest Open MPI tarball
(1.10.2) and run
./configure --with-platform=contrib/platform/optimized --prefix=...
Last but not least, you can run
mpirun --mca mpi_leave_pinned 1 <your benchmark>
(that being said, I am not sure this is useful with TCP networks ...)

Cheers,

Gilles



On 3/9/2016 11:34 AM, Rayson Ho wrote:


If you are using instance types that support SR-IOV (aka "enhanced
networking" in AWS), then turn it on. We saw huge differences when SR-IOV
was enabled:

http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html

http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html


Make sure you start your instances in a placement group -- otherwise, the
instances can be data centers apart!


And check that jumbo frames are enabled properly:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html
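
On the instances themselves, a quick check and override might look like this
(eth0 is just the typical interface name; 9001 is the jumbo-frame MTU EC2
uses):

  ip link show eth0            # look for "mtu 9001"
  sudo ip link set dev eth0 mtu 9001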

But still, it is interesting that Intel MPI is getting a 2X speedup with
the same setup! Can you post the raw numbers so that we can take a deeper
look??

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html





On Tue, Mar 8, 2016 at 9:08 AM, Jackson, Gary L.
<gary.jack...@jhuapl.edu> wrote:


I've built OpenMPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about
half the performance with MPI over TCP that I see with raw TCP. Before I
start digging into this more deeply, does anyone know what might cause that?

For what it's worth, I see the same issues with MPICH, but I do not see them
with Intel MPI.

-- 
Gary Jackson



