Peter,

Thanks for your input!
I tried some things:

*1) The app was placed/pinned differently by the two MPIs. Often this would
probably not cause such a big difference.*
I agree this is unlikely to be the cause; however, I tried various combinations
of --map-by, --bind-to, etc. anyway, and none of them had any measurable impact,
which points to this not being the cause (as you suspected).
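For reference, the variations I tried were along these lines (./solver here is
just a stand-in for our benchmark binary; --report-bindings was used to confirm
the placement):

  mpirun -np 192 --map-by core   --bind-to core   --report-bindings ./solver
  mpirun -np 192 --map-by socket --bind-to socket --report-bindings ./solver
  mpirun -np 192 --map-by node   --bind-to none   --report-bindings ./solver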


*2) Bad luck wrt collective performance. Different MPIs have different weak
spots across the parameter space of numranks, transfersize, mpi-collective.*
This is possible, but the magnitude of the runtime difference seems too large
to me. Are there any options we can pass to Open MPI to make it use different
collective algorithms so that we can test this theory? I came across the
coll_tuned MCA parameters; would something like the sketch below be the right
way to try this?
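(The parameter names below are just my reading of ompi_info output on our
3.1.3 install, so please correct me if these aren't the right knobs.)

  # list the tunable collective parameters
  ompi_info --all | grep coll_tuned

  # force a specific allreduce algorithm instead of the built-in decision rules
  mpirun -np 192 --mca coll_tuned_use_dynamic_rules 1 \
                 --mca coll_tuned_allreduce_algorithm 3 \
                 ./solver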




*3) You're not on Mellanox infiniband but Qlogic/Intel (Truescale)
infiniband. Using openib there is better than tcp but not ideal (it uses
psm for native transport).*
I double-checked: the cluster is using Mellanox InfiniBand.


*4) You changed more than the MPI. For example Intel compilers + intel-mpi
vs OpenMPI + gcc.*
This is correct; however, I also ran some additional tests:

IntelMPI + Clang:            0.3 seconds
IntelMPI + Intel compilers:  0.4 seconds
MPICH    + Clang:            1 second
OpenMPI  + Clang:            2.6 seconds

So it looks like the compiler is not the issue.

Any other ideas?
Thanks,
Cooper
Cooper Burns
Senior Research Engineer
(608) 230-1551
convergecfd.com


On Wed, Aug 28, 2019 at 2:11 AM Peter Kjellström <c...@nsc.liu.se> wrote:

> On Tue, 27 Aug 2019 14:36:54 -0500
> Cooper Burns via users <users@lists.open-mpi.org> wrote:
>
> > Hello all,
> >
> > I have been doing some MPI benchmarking on an Infiniband cluster.
> >
> > Specs are:
> > 12 cores/node
> > 2.9ghz/core
> > Infiniband interconnect (TCP also available)
> >
> > Some runtime numbers:
> > 192 cores total: (16 nodes)
> > IntelMPI:
> > 0.4 seconds
> > OpenMPI 3.1.3 (--mca btl ^tcp):
> > 2.5 seconds
> > OpenMPI 3.1.3 (--mca btl ^openib):
> > 26 seconds
>
> 5x is quite a difference...
>
> Here are a few possible reasons I can think of:
>
> 1) The app was placed/pinned differently by the two MPIs. Often this
> would probably not cause such a big difference.
>
> 2) Bad luck wrt collective performance. Different MPIs have different
> weak spots across the parameter space of
> numranks,transfersize,mpi-collective.
>
> 3) You're not on Mellanox infiniband but Qlogic/Intel (Truescale)
> infiniband. Using openib there is better than tcp but not ideal (it
> uses psm for native transport).
>
> 4) You changed more than the MPI. For example Intel compilers +
> intel-mpi vs OpenMPI + gcc.
>
> /Peter K
>