Gary,

The current fine-tuning of our TCP layer was done on a 1Gb network, which might be the cause of the performance degradation you see. There is a relationship between the depth of the pipeline and the length of the packets which, together with another set of MCA parameters, can have a drastic impact on performance.
You should start with "ompi_info --param btl tcp -l 9". From your performance graphs I can see that Intel MPI has an eager size of around 128k (while ours is at 32k). Try to address this by setting btl_tcp_eager_limit to 128k, and set btl_tcp_rndv_eager_limit to the same value.

By default Open MPI assumes TCP kernel buffers of 128k. These values can be tuned at the kernel level (http://www.cyberciti.biz/faq/linux-tcp-tuning/) and/or you can let Open MPI know that it can use more, by setting the MCA parameters btl_tcp_sndbuf and btl_tcp_rcvbuf.

Then you can play with the size of the TCP endpoint cache; it should be set to a value where a memcpy costs about the same as a syscall. btl_tcp_endpoint_cache is the MCA parameter you are looking for.

Another trick: if the injection rate of a single fd is too slow, you can ask Open MPI to use multiple channels by setting btl_tcp_links to something other than 1. On a PS4 I had to bump it up to 3-4 to get the best performance.

Other parameters to be tuned:
- btl_tcp_max_send_size
- btl_tcp_rdma_pipeline_send_length
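Putting the above together, something along these lines is what I would experiment with. The MCA parameter names are exactly the ones discussed above, but the specific values, the kernel sysctl limits, the hostfile, and the NetPIPE binary name are only illustrative placeholders that will need adjusting for your 10Gb setup, not values I have validated:

    # Optionally raise the kernel socket buffer limits first
    # (example values; see the linux-tcp-tuning page above for details)
    sudo sysctl -w net.core.rmem_max=16777216
    sudo sysctl -w net.core.wmem_max=16777216

    # Illustrative run over the TCP BTL with the parameters discussed above
    mpirun -np 2 --hostfile hosts \
        --mca btl tcp,sm,self \
        --mca btl_tcp_eager_limit 131072 \
        --mca btl_tcp_rndv_eager_limit 131072 \
        --mca btl_tcp_sndbuf 4194304 \
        --mca btl_tcp_rcvbuf 4194304 \
        --mca btl_tcp_endpoint_cache 65536 \
        --mca btl_tcp_links 2 \
        --mca btl_tcp_max_send_size 131072 \
        --mca btl_tcp_rdma_pipeline_send_length 262144 \
        ./NPmpi   # NetPIPE's MPI benchmark; substitute your own binary

Once you converge on good values, the same settings can go into $HOME/.openmpi/mca-params.conf (one "name = value" per line) so every run picks them up without the long command line.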
I don't have access to a 10Gb network to tune on. If you manage to tune it, I would like to get the values for the different MCA parameters, so that our TCP BTL behaves optimally by default.

Thanks,
George.

> On Mar 10, 2016, at 11:45, Jackson, Gary L. <gary.jack...@jhuapl.edu> wrote:
>
> I re-ran all experiments with 1.10.2 configured the way you specified. My results are here:
>
> https://www.dropbox.com/s/4v4jaxe8sflgymj/collected.pdf?dl=0
>
> Some remarks:
> OpenMPI had poor performance relative to raw TCP and IMPI across all MTUs.
> Those issues appeared at larger message sizes.
> Intel MPI and raw TCP were comparable across message sizes and MTUs.
>
> With respect to some other concerns:
> I verified that the MTU values I'm using are correct with tracepath.
> I am using a placement group.
>
> --
> Gary Jackson
>
> From: users <users-boun...@open-mpi.org> on behalf of Gilles Gouaillardet <gil...@rist.or.jp>
> Reply-To: Open MPI Users <us...@open-mpi.org>
> Date: Tuesday, March 8, 2016 at 11:07 PM
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] Poor performance on Amazon EC2 with TCP
>
> Jackson,
>
> One more thing: how did you build openmpi?
>
> If you built from git (and without VPATH), then --enable-debug is automatically set, and this is hurting performance.
> If not already done, I recommend you download the latest openmpi tarball (1.10.2) and
> ./configure --with-platform=contrib/platform/optimized --prefix=...
> Last but not least, you can
> mpirun --mca mpi_leave_pinned 1 <your benchmark>
> (that being said, I am not sure this is useful with TCP networks ...)
>
> Cheers,
>
> Gilles
>
> On 3/9/2016 11:34 AM, Rayson Ho wrote:
>> If you are using instance types that support SR-IOV (aka "enhanced networking" in AWS), then turn it on. We saw huge differences when SR-IOV is enabled.
>>
>> http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
>> http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html
>>
>> Make sure you start your instances with a placement group -- otherwise, the instances can be data centers apart!
>>
>> And check that jumbo frames are enabled properly:
>>
>> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html
>>
>> But still, it is interesting that Intel MPI is getting a 2X speedup with the same setup! Can you post the raw numbers so that we can take a deeper look?
>>
>> Rayson
>>
>> ==================================================
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>>
>> On Tue, Mar 8, 2016 at 9:08 AM, Jackson, Gary L. <gary.jack...@jhuapl.edu> wrote:
>>
>> I've built OpenMPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about half the performance for MPI over TCP as I do with raw TCP. Before I start digging into this more deeply, does anyone know what might cause that?
>>
>> For what it's worth, I see the same issues with MPICH, but I do not see it with Intel MPI.
>>
>> --
>> Gary Jackson