Ronald,

first, can you make sure tm support was built?
the easiest way is to
configure --with-tm ...
that way, configure will abort if tm is not found
if pbs/torque is not installed in a standard location, then you have to
configure --with-tm=<dir>
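
for example (the torque directory below is only a placeholder, use yours):
  ./configure --prefix=/home/rcohen --with-tm=/opt/torque
  make all install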

then you can omit -hostfile from your mpirun command line
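
for example, from within a torque job, just
  mpirun ./xhpl
and the tm module will get the nodes and slots from the batch manager.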

hpl is known to scale, assuming the problem size is big enough, you use an
optimized blas, and the right number of openmp threads
(e.g. if you run 8 tasks per 16 core node, then you can have up to 2 openmp
threads per task, but if you use 8 or 16 threads, then performance will be
worse)
first run xhpl on one node, and once you get 80% of the peak performance, then
you can run on two nodes.
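
for example, on your 16 core nodes, something like this (assuming xhpl was
built with openmp support; recent open mpi can bind several cores per task
via --map-by):

  export OMP_NUM_THREADS=2
  mpirun -np 8 -x OMP_NUM_THREADS --map-by slot:PE=2 ./xhpl

so each task gets 2 cores and its 2 openmp threads do not share a core.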

Cheers,

Gilles

On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:

> The configure line was simply:
>
>  ./configure --prefix=/home/rcohen
>
> when I run:
>
> mpirun --mca btl self,vader,openib ...
>
> I get the same lousy results: 1.5 GFLOPS
>
> The output of the grep is:
>
> Cpus_allowed_list:      0-7
> Cpus_allowed_list:      8-15
> Cpus_allowed_list:      0-7
> Cpus_allowed_list:      8-15
> Cpus_allowed_list:      0-7
> Cpus_allowed_list:      8-15
> Cpus_allowed_list:      0-7
> Cpus_allowed_list:      8-15
> Cpus_allowed_list:      0-7
> Cpus_allowed_list:      8-15
> Cpus_allowed_list:      0-7
> Cpus_allowed_list:      8-15
> Cpus_allowed_list:      0-7
> Cpus_allowed_list:      8-15
> Cpus_allowed_list:      0-7
> Cpus_allowed_list:      8-15
>
>
> linpack (HPL) certainly is known to scale fine.
>
> I am running a standard benchmark--HPL--linpack.
>
> I think it is not the compiler, but I could try that.
>
> Ron
>
>
>
>
> ---
> Ron Cohen
> recoh...@gmail.com
> skypename: ronaldcohen
> twitter: @recohen3
>
>
> On Wed, Mar 23, 2016 at 9:32 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
> > Ronald,
> >
> > the fix I mentioned landed into the v1.10 branch
> >
> https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62
> >
> > can you please post your configure command line?
> >
> > you can also try to
> > mpirun --mca btl self,vader,openib ...
> > to make sure your run will abort instead of falling back to tcp
> >
> > then you can
> > mpirun ... grep Cpus_allowed_list /proc/self/status
> > to confirm your tasks do not end up bound to the same cores when running
> > on two nodes.
> >
> > is your application known to scale on an infiniband network?
> > or did you naively hope it would scale?
> >
> > at first, I recommend you run a standard benchmark to make sure you get
> > the performance you expect from your infiniband network
> > (for example IMB or the OSU benchmarks)
> > and run this test in the same environment as your app (e.g. via a batch
> > manager if applicable)
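> >
> > for example, something like this (osu_bw and osu_latency are built from
> > the OSU micro benchmarks tarball, adjust the paths; --map-by node makes
> > sure the two tasks run on different nodes):
> >
> > mpirun -np 2 --map-by node --mca btl self,vader,openib ./osu_bw
> > mpirun -np 2 --map-by node --mca btl self,vader,openib ./osu_latency
> >
> > on FDR infiniband, you should see several GB/s of bandwidth and a few
> > microseconds of latency between the two nodes.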
> >
> > if you do not get the performance you expect, then I suggest you try the
> > stock gcc compiler shipped with your distro and see if it helps.
> >
> > Cheers,
> >
> > Gilles
> >
> > On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
> >>
> >> Thank  you! Here are the answers:
> >>
> >> I did not try a previous release of gcc.
> >> I built from a tarball.
> >> What should I do about the iirc issue--how should I check?
> >> Are there any flags I should be using for infiniband? Is this a
> >> problem with latency?
> >>
> >> Ron
> >>
> >>
> >> ---
> >> Ron Cohen
> >> recoh...@gmail.com
> >> skypename: ronaldcohen
> >> twitter: @recohen3
> >>
> >>
> >> On Wed, Mar 23, 2016 at 8:13 AM, Gilles Gouaillardet
> >> <gilles.gouaillar...@gmail.com> wrote:
> >> > Ronald,
> >> >
> >> > did you try to build openmpi with a previous gcc release?
> >> > if yes, what about the performance?
> >> >
> >> > did you build openmpi from a tarball or from git ?
> >> > if from git and without VPATH, then you need to
> >> > configure with --disable-debug
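> >> >
> >> > for example (keep whatever other options you use):
> >> > ./configure --disable-debug --prefix=/home/rcohen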
> >> >
> >> > iirc, one issue was identified previously
> >> > (a gcc optimization that prevents the memory wrapper from behaving as
> >> > expected) and I am not sure the fix landed in the v1.10 branch or in
> >> > master ...
> >> >
> >> > thanks for the info about gcc 6.0.0
> >> > now that this is supported by a free compiler
> >> > (cray and intel already support it, but they are commercial
> >> > compilers),
> >> > I will resume my work on supporting this
> >> >
> >> > Cheers,
> >> >
> >> > Gilles
> >> >
> >> > On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
> >> >>
> >> >> I get 100 GFLOPS for 16 cores on one node, but 1 GFLOP running 8 cores
> >> >> on two nodes. It seems that quad-infiniband should do better than
> >> >> this. I built openmpi-1.10.2g with gcc version 6.0.0 20160317. Any
> >> >> ideas of what to do to get usable performance? Thank you!
> >> >>
> >> >> bstatus
> >> >> Infiniband device 'mlx4_0' port 1 status:
> >> >>         default gid:     fe80:0000:0000:0000:0002:c903:00ec:9301
> >> >>         base lid:        0x1
> >> >>         sm lid:          0x1
> >> >>         state:           4: ACTIVE
> >> >>         phys state:      5: LinkUp
> >> >>         rate:            56 Gb/sec (4X FDR)
> >> >>         link_layer:      InfiniBand
> >> >>
> >> >> Ron
> >> >> --
> >> >>
> >> >> Professor Dr. Ronald Cohen
> >> >> Ludwig Maximilians Universität
> >> >> Theresienstrasse 41 Room 207
> >> >> Department für Geo- und Umweltwissenschaften
> >> >> München
> >> >> 80333
> >> >> Deutschland
> >> >>
> >> >>
> >> >> ronald.co...@min.uni-muenchen.de
> >> >> skype: ronaldcohen
> >> >> +49 (0) 89 74567980
> >> >> ---
> >> >> Ronald Cohen
> >> >> Geophysical Laboratory
> >> >> Carnegie Institution
> >> >> 5251 Broad Branch Rd., N.W.
> >> >> Washington, D.C. 20015
> >> >> rco...@carnegiescience.edu
> >> >> office: 202-478-8937
> >> >> skype: ronaldcohen
> >> >> https://twitter.com/recohen3
> >> >> https://www.linkedin.com/profile/view?id=163327727
> >> >>
> >> >>
> >> >> ---
> >> >> Ron Cohen
> >> >> recoh...@gmail.com
> >> >> skypename: ronaldcohen
> >> >> twitter: @recohen3