Gary,

The current fine-tuning of our TCP layer was done on a 1Gb network, and that
might explain the performance degradation you see. There is a relationship
between the depth of the pipeline and the length of the packets, and together
with a handful of other MCA parameters these can have a drastic impact on
performance.

You should start with "ompi_info --param btl tcp -l 9".

From your performance graphs I can see that Intel MPI has an eager size of 
around 128k (while ours is at 32k). Try to address this by setting 
btl_tcp_eager_limit to 128k and also btl_tcp_rndv_eager_limit to the same value.
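
As a concrete starting point (131072 bytes = 128k; ./your_benchmark is a
placeholder for your actual run):

  mpirun --mca btl_tcp_eager_limit 131072 \
         --mca btl_tcp_rndv_eager_limit 131072 \
         -np 2 ./your_benchmark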

By default Open MPI assumes TCP kernel buffers of 128k. These values can be 
tuned at the kernel level (http://www.cyberciti.biz/faq/linux-tcp-tuning/)
and/or you can let Open MPI
know that it can use more (by setting the MCA parameters btl_tcp_sndbuf and 
btl_tcp_rcvbuf).
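
A sketch of both approaches (the buffer sizes are illustrative only, not tuned
recommendations):

  # kernel level, as root (see the link above for the full set of knobs)
  sysctl -w net.core.rmem_max=16777216
  sysctl -w net.core.wmem_max=16777216

  # or per job, through the MCA parameters
  mpirun --mca btl_tcp_sndbuf 4194304 \
         --mca btl_tcp_rcvbuf 4194304 \
         -np 2 ./your_benchmark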

Then you can play with the size of the TCP endpoint cache (it should be set
to a value where a memcpy costs about the same as a syscall).
btl_tcp_endpoint_cache is the MCA parameter you are looking for.
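
For example (65536 is an arbitrary starting value; benchmark around it to find
where the memcpy and syscall costs balance on your machines):

  mpirun --mca btl_tcp_endpoint_cache 65536 -np 2 ./your_benchmark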

Another trick: if the injection rate of a single fd is too slow, you can ask
Open MPI to use multiple channels by setting btl_tcp_links to something other
than 1. On a PS4 I had to bump it up to 3-4 to get the best performance.
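
For example:

  mpirun --mca btl_tcp_links 3 -np 2 ./your_benchmark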

Other parameters to be tuned:
- btl_tcp_max_send_size
- btl_tcp_rdma_pipeline_send_length
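
Once you have settled on values, everything can go on one mpirun line, or be
persisted in $HOME/.openmpi/mca-params.conf so it applies by default (the
numbers below are placeholders, not recommendations):

  mpirun --mca btl_tcp_max_send_size 131072 \
         --mca btl_tcp_rdma_pipeline_send_length 1048576 \
         -np 2 ./your_benchmark

  # and/or persist them in $HOME/.openmpi/mca-params.conf:
  btl_tcp_max_send_size = 131072
  btl_tcp_rdma_pipeline_send_length = 1048576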

I don't have access to a 10Gb network to tune on. If you manage to tune it, I
would like to get the values for the different MCA parameters so that our TCP
BTL behaves optimally by default.

  Thanks,
    George.


> On Mar 10, 2016, at 11:45, Jackson, Gary L. <gary.jack...@jhuapl.edu> wrote:
> 
> I re-ran all experiments with 1.10.2 configured the way you specified. My 
> results are here:
> 
> https://www.dropbox.com/s/4v4jaxe8sflgymj/collected.pdf?dl=0
> 
> Some remarks:
> - OpenMPI had poor performance relative to raw TCP and IMPI across all MTUs.
> - Those issues appeared at larger message sizes.
> - Intel MPI and raw TCP were comparable across message sizes and MTUs.
> With respect to some other concerns:
> - I verified that the MTU values I'm using are correct with tracepath.
> - I am using a placement group.
> -- 
> Gary Jackson
> 
> From: users <users-boun...@open-mpi.org> on behalf of Gilles Gouaillardet
> <gil...@rist.or.jp>
> Reply-To: Open MPI Users <us...@open-mpi.org>
> Date: Tuesday, March 8, 2016 at 11:07 PM
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] Poor performance on Amazon EC2 with TCP
> 
> Jackson,
> 
> one more thing: how did you build Open MPI?
> 
> if you built from git (and without VPATH), then --enable-debug is 
> automatically set, and this is hurting performance.
> if not already done, I recommend you download the latest Open MPI tarball 
> (1.10.2) and
> ./configure --with-platform=contrib/platform/optimized --prefix=...
> last but not least, you can
> mpirun --mca mpi_leave_pinned 1 <your benchmark>
> (that being said, I am not sure this is useful with TCP networks ...)
> 
> Cheers,
> 
> Gilles
> 
> 
> 
> On 3/9/2016 11:34 AM, Rayson Ho wrote:
>> If you are using instance types that support SR-IOV (aka "enhanced 
>> networking" in AWS), then turn it on. We saw huge differences with SR-IOV 
>> enabled:
>> 
>> http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
>> http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html
>> 
>> Make sure you start your instances with a placement group -- otherwise, the 
>> instances can be data centers apart!
>> 
>> And check that jumbo frames are enabled properly:
>> 
>> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html
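>>
>> For example (assuming eth0 is the interface in question; <remote-host> is a
>> placeholder):
>>
>>   ip link show eth0 | grep mtu
>>   tracepath <remote-host>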
>> 
>> But still, it is interesting that Intel MPI is getting a 2X speedup with the 
>> same setup! Can you post the raw numbers so that we can take a deeper look?
>> 
>> Rayson
>> 
>> ==================================================
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>> 
>> 
>> 
>> 
>> On Tue, Mar 8, 2016 at 9:08 AM, Jackson, Gary L. <gary.jack...@jhuapl.edu> 
>> wrote:
>> 
>> I've built OpenMPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about 
>> half the performance with MPI over TCP that I get with raw TCP. Before I 
>> start digging into this more deeply, does anyone know what might cause that?
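>> 
>> For reference, typical NetPIPE invocations (host names are placeholders):
>> 
>>   NPtcp                                   # raw TCP, receiver side
>>   NPtcp -h <receiver-host>                # raw TCP, transmitter side
>>   mpirun -np 2 --host hostA,hostB NPmpi   # MPI over TCP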
>> 
>> For what it's worth, I see the same issues with MPICH, but I do not see 
>> them with Intel MPI.
>> 
>> -- 
>> Gary Jackson
