I don't have any parameters set other than the defaults--thank you! Ron
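
One quick way to confirm that only the defaults are in effect (a sketch; ompi_info is Open MPI's own introspection tool, and the file and environment variable shown are the standard places MCA parameters get set, so empty output means defaults):

    # show the btl_openib_flags parameter and its current value
    ompi_info --all --level 9 | grep btl_openib_flags
    # the usual non-default sources: a user parameter file and the environment
    cat $HOME/.openmpi/mca-params.conf 2>/dev/null
    env | grep OMPI_MCA_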
---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3


On Wed, Mar 23, 2016 at 11:07 AM, Edgar Gabriel <egabr...@central.uh.edu> wrote:

Not sure whether it is relevant in this case, but in January I spent nearly a week figuring out why the openib component was running very slowly with the new Open MPI releases (though it was the 2.x series at that time), and the culprit turned out to be the btl_openib_flags parameter. I used to set this parameter in former releases to get good performance on my cluster, but it led to absolutely disastrous performance with the new version. So if you have any parameters set, try removing them completely and see whether that makes a difference.

Edgar

On 3/23/2016 10:01 AM, Gilles Gouaillardet wrote:

Ronald,

Out of curiosity, what kind of performance do you get with tcp and two nodes? e.g.

    mpirun --mca btl tcp,vader,self ...

Before that, you can run

    mpirun uptime

to ensure all your nodes are free (e.g. no process was left running by another job).

You might also want to allocate your nodes exclusively (iirc, qsub -x) to avoid side effects.

Cheers,

Gilles

On Wednesday, March 23, 2016, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Ronald,

First, can you make sure tm support was built? The easiest way is to

    configure --with-tm ...

which will fail if tm is not found. If pbs/torque is not installed in a standard location, then you have to

    configure --with-tm=<dir>

Then you can omit -hostfile from your mpirun command line.

hpl is known to scale, assuming the data is big enough, you use an optimized blas, and the right number of openmp threads (e.g. if you run 8 tasks per node, then you can have up to 2 openmp threads each, but if you use 8 or 16 threads, performance will be worse). First run xhpl on one node, and once you get 80% of peak performance, then run on two nodes.

Cheers,

Gilles

On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:

The configure line was simply:

    ./configure --prefix=/home/rcohen

When I run:

    mpirun --mca btl self,vader,openib ...

I get the same lousy results: 1.5 GFLOPS.

The output of the grep is:

    Cpus_allowed_list:  0-7
    Cpus_allowed_list:  8-15
    Cpus_allowed_list:  0-7
    Cpus_allowed_list:  8-15
    Cpus_allowed_list:  0-7
    Cpus_allowed_list:  8-15
    Cpus_allowed_list:  0-7
    Cpus_allowed_list:  8-15
    Cpus_allowed_list:  0-7
    Cpus_allowed_list:  8-15
    Cpus_allowed_list:  0-7
    Cpus_allowed_list:  8-15
    Cpus_allowed_list:  0-7
    Cpus_allowed_list:  8-15
    Cpus_allowed_list:  0-7
    Cpus_allowed_list:  8-15

Linpack (HPL) certainly is known to scale fine.

I am running a standard benchmark -- HPL (Linpack).

I think it is not the compiler, but I could try that.

Ron

On Wed, Mar 23, 2016 at 9:32 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Ronald,

The fix I mentioned landed in the v1.10 branch:

https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62

Can you please post your configure command line?

You can also try

    mpirun --mca btl self,vader,openib ...

to make sure your run will abort instead of falling back to tcp.
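
A concrete version of this check could look like the following (a sketch only: -np 16 and xhpl are placeholders, it assumes the job runs inside the PBS/Torque allocation, and --report-bindings is mpirun's standard option for printing where each rank is bound):

    # force InfiniBand (abort rather than fall back to tcp) and show bindings
    mpirun --mca btl self,vader,openib --report-bindings -np 16 ./xhpl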
Then you can run

    mpirun ... grep Cpus_allowed_list /proc/self/status

to confirm your tasks do not end up bound to the same cores when running on two nodes.

Is your application known to scale on an InfiniBand network? Or did you naively hope it would scale?

First, I recommend you run a standard benchmark to make sure you get the performance you expect from your InfiniBand network (for example IMB or the OSU benchmarks), and run this test in the same environment as your app (e.g. via a batch manager if applicable).

If you do not get the performance you expect, then I suggest you try the stock gcc compiler shipped with your distro and see if that helps.

Cheers,

Gilles

On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:

Thank you! Here are the answers:

I did not try a previous release of gcc.
I built from a tarball.
What should I do about the iirc issue -- how should I check?
Are there any flags I should be using for infiniband? Is this a problem with latency?

Ron

On Wed, Mar 23, 2016 at 8:13 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Ronald,

Did you try to build openmpi with a previous gcc release? If yes, what about the performance?

Did you build openmpi from a tarball or from git? If from git and without VPATH, then you need to configure with --disable-debug.

iirc, one issue was identified previously (a gcc optimization that prevents the memory wrapper from behaving as expected) and I am not sure the fix landed in the v1.10 branch or master ...

Thanks for the info about gcc 6.0.0. Now that this is supported by a free compiler (Cray and Intel already support it, but those are commercial compilers), I will resume my work on supporting this.

Cheers,

Gilles

On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:

I get 100 GFLOPS for 16 cores on one node, but 1 GFLOP running 8 cores on two nodes. It seems that quad-infiniband should do better than this. I built openmpi-1.10.2g with gcc version 6.0.0 20160317. Any ideas of what to do to get usable performance? Thank you!

    bstatus
    Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:00ec:9301
        base lid:        0x1
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            56 Gb/sec (4X FDR)
        link_layer:      InfiniBand

Ron
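
The fabric check Gilles suggests above could look like this (a sketch: osu_bw and osu_latency come from the OSU micro-benchmarks suite, which has to be downloaded and built separately, and the paths are placeholders):

    # point-to-point bandwidth and latency across two nodes over InfiniBand
    mpirun -np 2 --map-by node --mca btl self,openib ./osu_bw
    mpirun -np 2 --map-by node --mca btl self,openib ./osu_latency
    # 4X FDR (56 Gb/sec) should deliver on the order of 6 GB/s and a latency
    # of a few microseconds; far less than that points at the fabric or the
    # build rather than at the application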
--
Professor Dr. Ronald Cohen
Ludwig Maximilians Universität
Theresienstrasse 41, Room 207
Department für Geo- und Umweltwissenschaften
80333 München
Deutschland
ronald.co...@min.uni-muenchen.de
skype: ronaldcohen
+49 (0) 89 74567980

---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco...@carnegiescience.edu
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727

--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
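
For reference, Gilles' earlier advice on MPI tasks versus OpenMP threads could translate into something like this on these 16-core nodes (a sketch: the thread count assumes HPL was linked against a threaded BLAS, and xhpl stands for the actual binary):

    # 8 MPI tasks per node x 2 OpenMP threads = 16 cores per node
    export OMP_NUM_THREADS=2
    mpirun -np 16 -npernode 8 --bind-to socket ./xhpl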