I re-ran all experiments with 1.10.2 configured the way you specified. My 
results are here:

https://www.dropbox.com/s/4v4jaxe8sflgymj/collected.pdf?dl=0

Some remarks:

  1.  Open MPI had poor performance relative to raw TCP and Intel MPI across all MTUs.
  2.  Those issues appeared at larger message sizes.
  3.  Intel MPI and raw TCP were comparable across message sizes and MTUs.

With respect to some other concerns:

  1.  I verified with tracepath that the MTU values I'm using are correct (see the sketch after this list).
  2.  I am using a placement group.
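
For reference, a minimal version of that tracepath check (the peer hostname is a placeholder); tracepath reports the discovered path MTU hop by hop:

  tracepath -n node2    # look for "pmtu 9001" when jumbo frames are in effect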

--
Gary Jackson

From: users <users-boun...@open-mpi.org> on behalf of Gilles Gouaillardet <gil...@rist.or.jp>
Reply-To: Open MPI Users <us...@open-mpi.org>
Date: Tuesday, March 8, 2016 at 11:07 PM
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Poor performance on Amazon EC2 with TCP

Jackson,

One more thing: how did you build Open MPI?

If you built from git (and without VPATH), then --enable-debug is set automatically, and that hurts performance.
If you have not already done so, I recommend downloading the latest Open MPI tarball (1.10.2) and building it with:

./configure --with-platform=contrib/platform/optimized --prefix=...

Last but not least, you can run:

mpirun --mca mpi_leave_pinned 1 <your benchmark>

(That said, I am not sure this helps on TCP networks.)
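
As an aside (not in the original message): one way to confirm whether an installed build was configured with debugging is to ask ompi_info, which ships with Open MPI:

  ompi_info | grep -i debug

An optimized build should report "Internal debug support: no".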

Cheers,

Gilles



On 3/9/2016 11:34 AM, Rayson Ho wrote:
If you are using instance types that support SR-IOV (a.k.a. "enhanced networking"
in AWS), turn it on. We saw huge differences when SR-IOV was enabled:

http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html
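
A quick way to confirm SR-IOV is actually in effect (a sketch; the instance ID below is a placeholder):

  # on the instance: the ixgbevf driver indicates the SR-IOV virtual function is in use
  ethtool -i eth0

  # from a machine with the AWS CLI configured:
  aws ec2 describe-instance-attribute --attribute sriovNetSupport --instance-id i-0123456789abcdef0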

Make sure you launch your instances in a placement group -- otherwise, the 
instances can be data centers apart!
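
For reference, a placement group can be created ahead of time with the AWS CLI and passed at launch (the group name is a placeholder; other run-instances options omitted):

  aws ec2 create-placement-group --group-name mpi-test --strategy cluster
  aws ec2 run-instances --placement GroupName=mpi-test ...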

And check that jumbo frames are enabled properly:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html
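
A minimal per-instance check, assuming the interface is eth0:

  ip link show eth0                     # look for "mtu 9001"
  sudo ip link set dev eth0 mtu 9001    # set it if needed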

Still, it is interesting that Intel MPI gets a 2X speedup with the same setup! 
Can you post the raw numbers so that we can take a deeper look?

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html




On Tue, Mar 8, 2016 at 9:08 AM, Jackson, Gary L. <gary.jack...@jhuapl.edu> wrote:

I've built Open MPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about half 
the throughput for MPI over TCP that I see with raw TCP. Before I start digging 
into this more deeply, does anyone know what might cause that?
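
(For concreteness, a sketch of the two kinds of runs being compared; hostnames are placeholders. NPtcp and NPmpi are the standard NetPIPE drivers.)

  # raw TCP: receiver on node1, then transmitter on node2
  NPtcp
  NPtcp -h node1

  # MPI over TCP, across the same two nodes
  mpirun -np 2 -host node1,node2 NPmpi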

For what it's worth, I see the same issue with MPICH, but I do not see it with 
Intel MPI.

--
Gary Jackson

