I re-ran all experiments with 1.10.2 configured the way you specified. My results are here:

https://www.dropbox.com/s/4v4jaxe8sflgymj/collected.pdf?dl=0

Some remarks:

1. Open MPI had poor performance relative to raw TCP and Intel MPI across all MTUs.
2. Those issues appeared at larger message sizes.
3. Intel MPI and raw TCP were comparable across message sizes and MTUs.

With respect to some other concerns:

1. I verified that the MTU values I'm using are correct with tracepath (see the sketch below this list).
2. I am using a placement group.
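For concreteness, the tracepath check looks roughly like this ("node2" stands in for the peer instance; EC2 jumbo frames use a 9001-byte MTU):

    # Configured MTU on the local interface (look for "mtu 9001" in the output)
    ip link show eth0

    # Path MTU toward the peer instance; the reported pmtu should match
    # the configured jumbo-frame size
    tracepath node2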
--
Gary Jackson

From: users <users-boun...@open-mpi.org> on behalf of Gilles Gouaillardet <gil...@rist.or.jp>
Reply-To: Open MPI Users <us...@open-mpi.org>
Date: Tuesday, March 8, 2016 at 11:07 PM
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Poor performance on Amazon EC2 with TCP

Jackson,

One more thing: how did you build Open MPI? If you built from git (and without VPATH), then --enable-debug is automatically set, and that hurts performance.

If not already done, I recommend you download the latest Open MPI tarball (1.10.2) and

    ./configure --with-platform=contrib/platform/optimized --prefix=...

Last but not least, you can run

    mpirun --mca mpi_leave_pinned 1 <your benchmark>

(That being said, I am not sure this is useful with TCP networks...)

Cheers,

Gilles

On 3/9/2016 11:34 AM, Rayson Ho wrote:

If you are using instance types that support SR-IOV (a.k.a. "enhanced networking" in AWS), then turn it on. We saw huge differences when SR-IOV was enabled:

http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html

Make sure you start your instances in a placement group -- otherwise, the instances can be data centers apart! And check that jumbo frames are enabled properly:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html

But still, it is interesting that Intel MPI is getting a 2X speedup with the same setup! Can you post the raw numbers so that we can take a deeper look?

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html

On Tue, Mar 8, 2016 at 9:08 AM, Jackson, Gary L. <gary.jack...@jhuapl.edu> wrote:

I've built Open MPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about half the performance for MPI over TCP as I do with raw TCP. Before I start digging into this more deeply, does anyone know what might cause that? For what it's worth, I see the same issue with MPICH, but I do not see it with Intel MPI.

--
Gary Jackson
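For anyone reproducing the NetPIPE comparison above, a minimal sketch (node1/node2 are placeholder hostnames; NPtcp and NPmpi are NetPIPE's standard front-ends, and the leave-pinned flag is Gilles' suggestion):

    # Raw TCP baseline: start the receiver first, then the transmitter
    node2$ NPtcp
    node1$ NPtcp -h node2

    # Same pair of hosts through Open MPI, forcing the TCP BTL explicitly
    node1$ mpirun -np 2 --host node1,node2 \
               --mca btl tcp,self --mca mpi_leave_pinned 1 NPmpi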
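A sketch of the optimized tarball build Gilles recommends (the download URL follows open-mpi.org's usual layout for the 1.10 series; the install prefix is arbitrary):

    wget https://www.open-mpi.org/software/ompi/v1.10/downloads/openmpi-1.10.2.tar.bz2
    tar xjf openmpi-1.10.2.tar.bz2
    cd openmpi-1.10.2
    ./configure --with-platform=contrib/platform/optimized --prefix=$HOME/openmpi-1.10.2
    make -j $(nproc) && make install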
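To verify Rayson's SR-IOV point, two quick checks -- one from inside the instance, one via the AWS CLI (the instance ID is a placeholder):

    # Inside the instance: with enhanced networking active, the driver
    # reported should be ixgbevf rather than Xen's vif driver
    ethtool -i eth0

    # From any machine with AWS CLI credentials; returns "simple" when enabled
    aws ec2 describe-instance-attribute \
        --instance-id i-12345678 --attribute sriovNetSupport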
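Finally, a minimal sketch of launching into a cluster placement group, as Rayson advises (group name, AMI, and instance type are all placeholders):

    aws ec2 create-placement-group --group-name mpi-bench --strategy cluster
    aws ec2 run-instances --image-id ami-12345678 --instance-type c4.8xlarge \
        --count 2 --placement GroupName=mpi-bench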