[OMPI users] Open MPI trunk

2016-03-23 Thread Husen R
Dear all,

What is the Open MPI trunk?
If a piece of Open MPI functionality resides in the Open MPI trunk, what
does that mean?
Is it possible to use Open MPI functionality that resides in the Open MPI
trunk?

I want to use the ompi-migrate command. According to the Open MPI archives,
it resides in the Open MPI trunk.
I need help. Thank you,


Regards,


Husen


Re: [OMPI users] Open MPI trunk

2016-03-23 Thread Gilles Gouaillardet
Husen,

trunk is an old term coming from SVN.

now you should read it as Open MPI master, i.e. the "master" branch of
https://github.com/open-mpi/ompi.git

(vs the v2.x or v1.10 branches of https://github.com/open-mpi/ompi-release.git)
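
for reference, building from master looks roughly like this (just a sketch;
the install prefix is a placeholder, and master needs the GNU autotools
since there is no pre-generated configure script):

git clone https://github.com/open-mpi/ompi.git
cd ompi
./autogen.pl
./configure --prefix=$HOME/ompi-master
make -j 4 install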

Cheers,

Gilles



[OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
I get 100 GFLOPS on 16 cores on one node, but 1 GFLOPS when running 8 cores
on two nodes. It seems that 4X FDR InfiniBand should do better than this. I
built openmpi-1.10.2g with gcc version 6.0.0 20160317. Any ideas of what to
do to get usable performance? Thank you!

bstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80::::0002:c903:00ec:9301
base lid:0x1
sm lid:  0x1
state:   4: ACTIVE
phys state:  5: LinkUp
rate:56 Gb/sec (4X FDR)
link_layer:  InfiniBand

Ron
--

Professor Dr. Ronald Cohen
Ludwig Maximilians Universität
Theresienstrasse 41 Room 207
Department für Geo- und Umweltwissenschaften
München
80333
Deutschland


ronald.co...@min.uni-muenchen.de
skype: ronaldcohen
+49 (0) 89 74567980
---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco...@carnegiescience.edu
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727


---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3


Re: [OMPI users] terrible infiniband performance for HPL, & gfortran message

2016-03-23 Thread Ronald Cohen
Attached is the output of ompi_info --all.

Note that this message:

  Fort use mpi_f08: yes
  Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
  limitations in the gfortran compiler, does not support the following:
  array subsections, direct passthru (where possible) to underlying Open
  MPI's C functionality

is no longer correct: gfortran 6.0.0 now includes array subsections. I am
not sure about direct passthru.

Ron
---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3



ompi_info.out
Description: Binary data


Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Gilles Gouaillardet
Ronald,

did you try to build openmpi with a previous gcc release ?
if yes, what was the performance like ?

did you build openmpi from a tarball or from git ?
if from git and without VPATH, then you need to configure with
--disable-debug.

iirc, one issue was identified previously (a gcc optimization that prevents
the memory wrapper from behaving as expected), and I am not sure whether the
fix landed in the v1.10 branch or in master ...

thanks for the info about gcc 6.0.0 -- now that this is supported by a free
compiler (cray and intel already support it, but they are commercial
compilers), I will resume my work on supporting it.

Cheers,

Gilles


Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
Thank you! Here are the answers:

I did not try a previous release of gcc.
I built from a tarball.
What should I do about the issue you mentioned (the memory wrapper fix) --
how should I check for it?
Are there any flags I should be using for InfiniBand? Is this a problem
with latency?

Ron


---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3




Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Joshua Ladd
Hi, Ron

Please include the command line you used in your tests. Have you run any
sanity checks, like OSU latency and bandwidth benchmarks between the nodes?
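
For example, a point-to-point sanity check between the two nodes could look
something like this (just a sketch; the path to the OSU micro-benchmark
binaries is a placeholder for wherever you built them):

mpirun -np 2 -npernode 1 -hostfile $PBS_NODEFILE $HOME/osu-micro-benchmarks/mpi/pt2pt/osu_latency
mpirun -np 2 -npernode 1 -hostfile $PBS_NODEFILE $HOME/osu-micro-benchmarks/mpi/pt2pt/osu_bw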

Josh



Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Gilles Gouaillardet
Ronald,

the fix I mentioned did land in the v1.10 branch:
https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62

can you please post your configure command line ?

you can also try

mpirun --mca btl self,vader,openib ...

to make sure your run aborts instead of silently falling back to tcp.

then you can run

mpirun ... grep Cpus_allowed_list /proc/self/status

to confirm your tasks do not end up bound to the same cores when running on
two nodes.

is your application known to scale on an infiniband network, or did you
naively hope it would scale ?

at first, I recommend you run a standard benchmark (for example IMB or the
OSU benchmarks) to make sure you get the performance you expect from your
infiniband network, and run this test in the same environment as your app
(e.g. via the batch manager, if applicable).
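
for example, with the Intel MPI Benchmarks, a two-node ping-pong would look
roughly like this (a sketch; the IMB-MPI1 path is a placeholder for wherever
you built it):

mpirun -np 2 -npernode 1 ./IMB-MPI1 PingPong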

if you do not get the performance you expect, then I suggest you try the
stock gcc compiler shipped with your distro and see if it helps.

Cheers,

Gilles



Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-23 Thread Rainer Koenig
Gilles,

I managed to get snapshots of all the /proc/<pid>/status entries for all
liggghts jobs, but the Cpus_allowed is similar no matter whether the system
was cold or warm booted.

Then I looked around in /proc and found sched_debug.

This at least shows that the liggghts processes are not spread over all
cores: some cores have just one of them, some have none, and some have many.

I agree that the processes not being spread over all cores is a consequence
and not the root cause. This means I now need to find out how the kernel
scheduler decides which core a process should run on, and why it can spread
48 tasks over 48 cores when I cold boot the machine but can't when I warm
boot it.
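
For reference, a quick way to see which core each process is currently
sitting on (the grep pattern below just stands in for the actual binary
name):

ps -eLo pid,psr,comm | grep liggghts

The psr column is the processor each thread is currently assigned to.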

So I guess I have to take this issue to the Linux kernel mailing list.
Another thing that points towards the kernel is that yesterday I installed
a newer 4.4.0 kernel on the machine; the problem is still there, but not as
bad as on the 4.2 kernel.

I also tried the mpirun -mca ... suggestion, but that didn't change anything.

Thanks for your input anyway; at least I now have a sched_debug snapshot,
which may be helpful in the further investigation.

Regards
Rainer

Am 22.03.2016 um 14:38 schrieb Gilles Gouaillardet:
> Rainer,
> 
> a first step could be to gather /proc/pid/status for your 48 tasks.
> then you can
> grep Cpus_allowed_list
> and see if you find something suspucious.
> 
> if your processes are idling, then the scheduler might assign them to
> the same core.
> in this case, your processes not being spread is a consequence and not a
> root cause.
> 
> just to make sure there are no strange side effects, could you
> mpirun --mca btl sm,self ...
> 
> Cheers,
> 
> Gilles
> 
> 
> On Tuesday, March 22, 2016, Rainer Koenig wrote:
> 
> Am 17.03.2016 um 10:40 schrieb Ralph Castain:
> > Just some thoughts offhand:
> >
> > * what version of OMPI are you using?
> 
> dpkg -l openmpi-bin says 1.6.5-8 from Ubuntu 14.04.
> >
> > * are you saying that after the warm reboot, all 48 procs are
> running on a subset of cores?
> 
> Yes. After a cold boot all 48 processses are spread over all 48 cores
> and all cores show up as almost 100% in the htop cpu meter.
> 
> After a warm boot, the 48 processes are just spread over a few cores and
> the rest of the system is idling.
> 
> > * it sounds like some of the cores have been marked as “offline”
> for some reason. Make sure you have hwloc installed on the machine,
> and run “lstopo” and see if that is the case
> 
> I tried with lstopo, but the graphics that I got look almost similar.
> The visible difference is in the sort of topology for the graphics
> adapter and the LAN cards. The path to the graphics shows 2 times the
> numbers 4,0 above the lines and the path to the eth0 shows 2 times the
> numbers 0,2 above the lines. lstopo for the warm boot looks identical,
> but those small numbers are missing now.
> 
> I also tried with hwloc-gather-topology and diff'd the 2 results. There
> is nothing special to see. Differneces in /proc/stats/ and
> /proc/cpuinfo, but nothing special, just ohter values.
> 
> Something is obviously wrong on a low level, but I'm still struggling to
> find it. :-/
> 
> Rainer
> --
> Dipl.-Inf. (FH) Rainer Koenig
> Project Manager Linux Clients
> Dept. PDG WPS R&D SW OSE
> 
> Fujitsu Technology Solutions
> Bürgermeister-Ullrich-Str. 100
> 86199 Augsburg
> Germany
> 
> Telephone: +49-821-804-3321
> Telefax:   +49-821-804-2131
> Mail:  mailto:rainer.koe...@ts.fujitsu.com 
> 
> Internet ts.fujtsu.com 
> Company Details  ts.fujitsu.com/imprint.html
> 


-- 
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux Clients
Dept. PDG WPS R&D SW OSE

Fujitsu Technology Solutions
Bürgermeister-Ullrich-Str. 100
86199 Augsburg
Germany

Telephone: +49-821-804-3321
Telefax:   +49-821-804-2131
Mail:  mailto:rainer.koe...@ts.fujitsu.com

Internet ts.fujtsu.com
Company Details  ts.fujitsu.com/imprint.html


Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-23 Thread Gilles Gouaillardet
Rainer,

what if you explicitly bind tasks to cores ?

mpirun -bind-to core ...

note this is v1.8 syntax ...
v1.6 is now obsolete (the Debian folks are working on upgrading it ...)

out of curiosity, did you try another distro such as redhat, suse, or the
like, and do you observe the same behavior ?

and btw, what does /proc/self/status say ?
bound to cores ? to a socket ? no binding at all ?
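
for example, something along these lines (a sketch; the binary name and task
count are placeholders, and --report-bindings prints the binding that was
actually applied):

mpirun -np 48 -bind-to core --report-bindings ./your_liggghts_binary ...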

Cheers,

Gilles


Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
I have tried:

mpirun --mca btl openib,self -hostfile $PBS_NODEFILE -n 16  xhpl  > xhpl.out

and

mpirun -hostfile $PBS_NODEFILE -n 16  xhpl  > xhpl.out

How do I run "sanity checks, like OSU latency and bandwidth benchmarks
between the nodes"? I am not a superuser. Thanks,

Ron

---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3




Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
The configure line was simply:

 ./configure --prefix=/home/rcohen

when I run:

mpirun --mca btl self,vader,openib ...

I get the same lousy results: 1.5 GFLOPS

The output of the grep is:

Cpus_allowed_list:  0-7
Cpus_allowed_list:  8-15
Cpus_allowed_list:  0-7
Cpus_allowed_list:  8-15
Cpus_allowed_list:  0-7
Cpus_allowed_list:  8-15
Cpus_allowed_list:  0-7
Cpus_allowed_list:  8-15
Cpus_allowed_list:  0-7
Cpus_allowed_list:  8-15
Cpus_allowed_list:  0-7
Cpus_allowed_list:  8-15
Cpus_allowed_list:  0-7
Cpus_allowed_list:  8-15
Cpus_allowed_list:  0-7
Cpus_allowed_list:  8-15


Linpack (HPL) is certainly known to scale fine; I am running a standard
benchmark, HPL (Linpack).

I think it is not the compiler, but I could try that.

Ron




---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3



[OMPI users] terrible infiniband performance for

2016-03-23 Thread Gilles Gouaillardet
Ronald,

first, can you make sure the tm support was built ?
the easiest way is to
configure --with-tm ...
which will fail if tm is not found.
if pbs/torque is not installed in a standard location, then you have to
configure --with-tm=<path to your torque installation>

then you can omit -hostfile from your mpirun command line.

hpl is known to scale, assuming the problem size is big enough, you use an
optimized blas, and the right number of openmp threads
(e.g. if you run 8 tasks per node, then you can have up to 2 openmp threads
each, but if you use 8 or 16 threads per task, performance will be worse).
first run xhpl on one node, and once you get 80% of the peak performance,
then you can run on two nodes.
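
for example, 8 tasks per node with 2 openmp threads each on your two 16-core
nodes would look roughly like this (a sketch; xhpl and the thread count are
whatever matches your build and blas):

mpirun -x OMP_NUM_THREADS=2 -np 16 -npernode 8 ./xhpl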

Cheers,

Gilles


Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
Dear Gilles,

Plain --with-tm fails, so I have now built with:

./configure --prefix=/home/rcohen --with-tm=/opt/torque
make clean
make -j 8
make install

This rebuild greatly improved performance, from 1 GF to 32 GF on 2 nodes for
a matrix of size 2000. For size 5000 it went up to 108 GF. So this sounds
pretty good.

Thank you so much! Is there a way to test and improve for latency?

Thanks!

Ron

---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3



Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Gilles Gouaillardet
Ronald,

out of curiosity, what kind of performance do you get with tcp and two
nodes ? e.g.

mpirun --mca btl tcp,vader,self ...

before that, you can run

mpirun uptime

to ensure all your nodes are free
(e.g. no process was left behind by another job).

you might also want to allocate your nodes exclusively (iirc, qsub -x) to
avoid side effects.
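
concretely, the comparison could look like this (a sketch; with tm support
built you no longer need -hostfile, and the output file names are just
placeholders):

mpirun --mca btl tcp,vader,self -np 16 ./xhpl > xhpl-tcp.out
mpirun --mca btl openib,vader,self -np 16 ./xhpl > xhpl-ib.out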

Cheers,

Gilles


Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Edgar Gabriel
Not sure whether it is relevant in this case, but in January I spent nearly
a week figuring out why the openib component was running very slowly with
the new Open MPI releases (though it was the 2.x series at that time), and
the culprit turned out to be the btl_openib_flags parameter. I used to set
this parameter in former releases to get good performance on my cluster, but
it led to absolutely disastrous performance with the new version. So if you
have any parameters set, try removing them completely and see whether that
makes a difference.
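
A quick way to check is something like the following (a sketch; the second
file is the default per-user MCA parameter file, and the last path assumes
your --prefix=/home/rcohen install):

ompi_info --all | grep btl_openib_flags
cat $HOME/.openmpi/mca-params.conf
cat /home/rcohen/etc/openmpi-mca-params.conf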


Edgar



Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
I don't have any parameters set other than the defaults--thank you!

Ron

---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3


On Wed, Mar 23, 2016 at 11:07 AM, Edgar Gabriel  wrote:
> not sure whether it is relevant in this case, but I spent in January nearly
> one week to figure out why the openib component was running very slow with
> the new Open MPI releases (though it was the 2.x series at that time), and
> the culprit turned out to be the
> btl_openib_flags parameter. I used to set this parameter in former releases
> to get good performance on my cluster, but it lead to absolutely disastrous
> performance with the new version. So if you have any parameters set, try to
> remove them completely and see whether this makes a difference.
>
> Edgar
>
>
>
> On 3/23/2016 10:01 AM, Gilles Gouaillardet wrote:
>
> Ronald,
>
> out of curiosity, what kind of performance do you get with tcp and two nodes
> ?
> e.g.
> mpirun --mca tcp,vader,self ...
>
> before that, you can
> mpirun uptime
> to ensure all your nodes are free
> (e.g. no process was left running by an other job)
>
> you might also want to allocate your nodes exclusively (iirc, qsub -x) to
> avoid side effects
>
> Cheers,
>
> Gilles
>
> On Wednesday, March 23, 2016, Gilles Gouaillardet
>  wrote:
>>
>> Ronald,
>>
>> first, can you make sure tm was built ?
>> the easiest way us to
>> configure --with-tm ...
>> it will crash if tm is not found
>> if pbs/torque is not installed in a standard location, then you have to
>> configure --with-tm=
>>
>> then you can omit -hostfile from your mpirun command line
>>
>> hpl is known to scale, assuming the data is big enough, you use an
>> optimized blas, and the right number of openmp threads
>> (e.g. if you run 8 tasks per node, the you can have up to 2 openmp
>> threads, but if you use 8 or 16 threads, then performance will be worst)
>> first run xhpl one node, and when you get 80% of the peak performance,
>> then you can run on two nodes.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Wednesday, March 23, 2016, Ronald Cohen  wrote:
>>>
>>> The configure line was simply:
>>>
>>>  ./configure --prefix=/home/rcohen
>>>
>>> when I run:
>>>
>>> mpirun --mca btl self,vader,openib ...
>>>
>>> I get the same lousy results: 1.5 GFLOPS
>>>
>>> The output of the grep is:
>>>
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>>
>>>
>>> linpack (HPL) certainly is known to scale fine.
>>>
>>> I am running a standard benchmark--HPL--linpack.
>>>
>>> I think it is not the compiler, but I could try that.
>>>
>>> Ron
>>>
>>>
>>>
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Wed, Mar 23, 2016 at 9:32 AM, Gilles Gouaillardet
>>>  wrote:
>>> > Ronald,
>>> >
>>> > the fix I mentioned landed into the v1.10 branch
>>> >
>>> > https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62
>>> >
>>> > can you please post your configure command line ?
>>> >
>>> > you can also try to
>>> > mpirun --mca btl self,vader,openib ...
>>> > to make sure your run will abort instead of falling back to tcp
>>> >
>>> > then you can
>>> > mpirun ... grep Cpus_allowed_list /proc/self/status
>>> > to confirm your tasks do not end up bound to the same cores when
>>> > running on
>>> > two nodes.
>>> >
>>> > is your application known to scale on infiniband network ?
>>> > or did you naively hope it would scale ?
>>> >
>>> > at first, I recommend you run standard benchmark to make sure you get
>>> > the
>>> > performance you expect from your infiniband network
>>> > (for example IMB or OSU benchmark)
>>> > and run this test in the same environment than your app (e.g. via a
>>> > batch
>>> > manager if applicable)
>>> >
>>> > if you do not get the performance you expect, then I suggest you try
>>> > the
>>> > stock gcc compiler shipped with your distro and see if it helps.
>>> >
>>> > Cheers,
>>> >
>>> > Gilles
>>> >
>>> > On Wednesday, March 23, 2016, Ronald Cohen  wrote:
>>> >>
>>> >> Thank  you! Here are the answers:
>>> >>
>>> >> I did not try a previous release of gcc.
>>> >> I built from a tarball.
>>> >> What should I do about the iirc issue--how should I check?
>>> >> Are there any flags I should be using for infiniband? Is this a
>>> >> problem with latency?
>>> >>
>>> >> Ron
>>> >>
>>> >>
>>> >> ---
>>> >> Ron Cohen
>>> >> recoh...@gmail.com
>>> >> skypename: ronaldcohen
>>> >> twitter: @recohen3
>>> >>
>>> >>
>>> >> On Wed, Mar 23, 2016 at 8:13 AM, Gilles Gouaillardet
>>> >>  wr

Re: [OMPI users] BLCR & openmpi

2016-03-23 Thread Meij, Henk
So I've redone this with openmpi 1.10.2 and another piece of software (lammps 
16feb16) and get the same results.



Upon cr_restart I see the openlava_wrapper process and the mpirun process 
reappearing, but no orted and no lmp_mpi processes. No obvious error anywhere. 
Using the --save-all feature from BLCR and ignoring pids.



Does BLCR and openmpi work? Anybody have any idea as to where to look?



-Henk
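
One quick place to look (a sketch; component names and the exact wording vary between versions) is whether the Open MPI build has any checkpoint/restart support compiled in at all:

ompi_info | grep -i checkpoint   # e.g. "FT Checkpoint support: no" means no C/R in this build
ompi_info | grep "MCA crs"       # a BLCR crs component would show up here if one was built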




From: Meij, Henk
Sent: Monday, March 21, 2016 12:24 PM
To: us...@open-mpi.org
Subject: RE: BLCR & openmpi


hmm, I'm not correct. cr_restart starts with no errors, launches some of the 
processes, then suspends itself. strace on mpirun on this manual invocation 
yields the behavior same as below.



-Henk



[hmeij@swallowtail kflaherty]$ ps -u hmeij
  PID TTY  TIME CMD
29481 ?00:00:00 res
29485 ?00:00:00 1458575067.384
29488 ?00:00:00 1458575067.384.
29508 ?00:00:00 cr_restart
29509 ?00:00:00 blcr_watcher
29512 ?00:00:02 lava.openmpi.wr
29514 ?00:38:35 mpirun
30313 ?00:00:01 sshd
30314 pts/100:00:00 bash
30458 ?00:00:00 sleep
30483 ?00:00:00 sleep
30650 pts/100:00:00 cr_restart
30652 pts/100:00:00 lava.openmpi.wr
30653 pts/100:00:00 mpirun
30729 pts/100:00:00 ps
[hmeij@swallowtail kflaherty]$ jobs
[1]+  Stopped cr_restart --no-restore-pid --no-restore-pgid 
--no-restore-sid --relocate /sanscratch/383=/sanscratch/000 
/sanscratch/checkpoints/383/chk.28244


From: Meij, Henk
Sent: Monday, March 21, 2016 12:04 PM
To: us...@open-mpi.org
Subject: BLCR & openmpi


openmpi 1.2 (yes, I know, old), python 2.6.1, blcr 0.8.5



When I attempt to cr_restart (having performed cr_checkpoint --save-all) I can 
restart the job manually with blcr on a node. But when I go through my openlava 
scheduler, the cr_restart launches mpirun, then nothing: no orted or the python 
processes that were running. The new scheduler job performing the restart puts 
in place the old machinefile and stderr and stdout files. Here is what I see 
in an strace of mpirun



What problem is this pointing at?

Thanks,



-Henk



poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=6, events=POLLIN}, 
{fd=11, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, 
events=POLLIN}, {fd=10, events=POLLIN}], 8, 1000) = 8 ([{fd=5, 
revents=POLLNVAL}, {fd=4, revents=POLLNVAL}, {fd=6, revents=POLLNVAL}, {fd=11, 
revents=POLLNVAL}, {fd=7, revents=POLLNVAL}, {fd=8, revents=POLLNVAL}, {fd=9, 
revents=POLLNVAL}, {fd=10, revents=POLLNVAL}])
rt_sigprocmask(SIG_BLOCK, [INT USR1 USR2 TERM CHLD], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
rt_sigaction(SIGTERM, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
rt_sigaction(SIGINT, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
rt_sigaction(SIGUSR1, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
sched_yield()   = 0
rt_sigprocmask(SIG_BLOCK, [INT USR1 USR2 TERM CHLD], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
rt_sigaction(SIGTERM, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
rt_sigaction(SIGINT, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
rt_sigaction(SIGUSR1, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0






Re: [OMPI users] BLCR & openmpi

2016-03-23 Thread Ralph Castain
I don’t believe checkpoint/restart is supported in OMPI past the 1.6 series. 
There was some attempt to restore it, but that person graduated prior to fully 
completing the work.


> On Mar 23, 2016, at 9:14 AM, Meij, Henk  wrote:
> 
> So I've redone this with openmpi 1.10.2 and another piece of software (lammps 
> 16feb16) and get same results.
>  
> Upon cr_restart I see the openlava_wrapper process, the mpirun process 
> reappearing but no orted and no lmp_mpi processes. Not obvious error 
> anywhere. Using the --save-all feature from BLCR and ignore pids.
>  
> Does BLCR and openmpi work? Anybody have any idea as to where to look?
>  
> -Henk
>  
> From: Meij, Henk
> Sent: Monday, March 21, 2016 12:24 PM
> To: us...@open-mpi.org 
> Subject: RE: BLCR & openmpi
> 
> hmm, I'm not correct. cr_restart starts with no errors, launches some of the 
> processes, then suspends itself. strace on mpirun on this manual invocation 
> yields the behavior same as below.
>  
> -Henk
>  
> [hmeij@swallowtail kflaherty]$ ps -u hmeij
>   PID TTY  TIME CMD
> 29481 ?00:00:00 res
> 29485 ?00:00:00 1458575067.384
> 29488 ?00:00:00 1458575067.384.
> 29508 ?00:00:00 cr_restart
> 29509 ?00:00:00 blcr_watcher
> 29512 ?00:00:02 lava.openmpi.wr
> 29514 ?00:38:35 mpirun
> 30313 ?00:00:01 sshd
> 30314 pts/100:00:00 bash
> 30458 ?00:00:00 sleep
> 30483 ?00:00:00 sleep
> 30650 pts/100:00:00 cr_restart
> 30652 pts/100:00:00 lava.openmpi.wr
> 30653 pts/100:00:00 mpirun
> 30729 pts/100:00:00 ps
> [hmeij@swallowtail kflaherty]$ jobs
> [1]+  Stopped cr_restart --no-restore-pid --no-restore-pgid 
> --no-restore-sid --relocate /sanscratch/383=/sanscratch/000 
> /sanscratch/checkpoints/383/chk.28244
> From: Meij, Henk
> Sent: Monday, March 21, 2016 12:04 PM
> To: us...@open-mpi.org 
> Subject: BLCR & openmpi
> 
> openmpi1.2 (yes, I know old),python 2.6.1 blcr 0.8.5
>  
> when I attempt to cr_restart (having performed cr_checkpoint --save-all) I 
> can restart the job manually with blcr on a node. but when I go through my 
> openlava scheduler, the cr_restart launches mpirun, then nothing. no orted or 
> the python processes that were running. the new scheduler job performing the 
> restart puts in place the old machinefile and stderr and stdout files. here 
> is what I view on an strace of mpirun
>  
> What problem is this pointing at?
> Thanks,
>  
> -Henk
>  
> poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=6, events=POLLIN}, 
> {fd=11, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, 
> events=POLLIN}, {fd=10, events=POLLIN}], 8, 1000) = 8 ([{fd=5, 
> revents=POLLNVAL}, {fd=4, revents=POLLNVAL}, {fd=6, revents=POLLNVAL}, 
> {fd=11, revents=POLLNVAL}, {fd=7, revents=POLLNVAL}, {fd=8, 
> revents=POLLNVAL}, {fd=9, revents=POLLNVAL}, {fd=10, revents=POLLNVAL}])
> rt_sigprocmask(SIG_BLOCK, [INT USR1 USR2 TERM CHLD], NULL, 8) = 0
> rt_sigaction(SIGCHLD, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGTERM, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGINT, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGUSR1, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGUSR2, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> sched_yield()   = 0
> rt_sigprocmask(SIG_BLOCK, [INT USR1 USR2 TERM CHLD], NULL, 8) = 0
> rt_sigaction(SIGCHLD, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGTERM, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGINT, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGUSR1, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGUSR2, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD], 
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
>  
>  
> ___
> users mailing list
> us...@open-mpi.org 
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/03/28806.php 
> 


Re: [OMPI users] BLCR & openmpi

2016-03-23 Thread George Bosilca
Both BLCR and Open MPI work just fine. Independently.

Checkpointing and restarting a parallel application is not as simple as
mixing 2 tools together (especially when we talk about a communication
library, aka. MPI), they have to cooperate in order to achieve the desired
goal of being able to continue the execution on another set of resources.
Open MPI had support for C/R but this feature has been lost.

1. It is not clear from your email what exactly you checkpoint. Are you
checkpointing the mpirun process, or are you checkpointing all the MPI
processes ?

2. What are you recovering? Assuming that you checkpoint your MPI processes
(and not the mpirun), what you can try to do during the recovery is to
spawn a new set of MPI processes (that will give you new orteds) and then
let each one of these processes call the corresponding BLCR cr_restart.

3. This will not give you a working MPI environment, as the processes will
know each other from the original execution, and will be unable to connect
to each other to resume communications. You will have to dig a little more
in the code in order to achieve what you want/need.

  George.


On Wed, Mar 23, 2016 at 12:14 PM, Meij, Henk  wrote:

> So I've redone this with openmpi 1.10.2 and another piece of software
> (lammps 16feb16) and get same results.
>
>
>
> Upon cr_restart I see the openlava_wrapper process, the mpirun process
> reappearing but no orted and no lmp_mpi processes. Not obvious error
> anywhere. Using the --save-all feature from BLCR and ignore pids.
>
>
>
> Does BLCR and openmpi work? Anybody have any idea as to where to look?
>
>
>
> -Henk
>
>
> --
> *From:* Meij, Henk
> *Sent:* Monday, March 21, 2016 12:24 PM
> *To:* us...@open-mpi.org
> *Subject:* RE: BLCR & openmpi
>
> hmm, I'm not correct. cr_restart starts with no errors, launches some
> of the processes, then suspends itself. strace on mpirun on this manual
> invocation yields the behavior same as below.
>
>
>
> -Henk
>
>
>
> [hmeij@swallowtail kflaherty]$ ps -u hmeij
>   PID TTY  TIME CMD
> 29481 ?00:00:00 res
> 29485 ?00:00:00 1458575067.384
> 29488 ?00:00:00 1458575067.384.
> 29508 ?00:00:00 cr_restart
> 29509 ?00:00:00 blcr_watcher
> 29512 ?00:00:02 lava.openmpi.wr
> 29514 ?00:38:35 mpirun
> 30313 ?00:00:01 sshd
> 30314 pts/100:00:00 bash
> 30458 ?00:00:00 sleep
> 30483 ?00:00:00 sleep
> 30650 pts/100:00:00 cr_restart
> 30652 pts/100:00:00 lava.openmpi.wr
> 30653 pts/100:00:00 mpirun
> 30729 pts/100:00:00 ps
> [hmeij@swallowtail kflaherty]$ jobs
> [1]+  Stopped cr_restart --no-restore-pid
> --no-restore-pgid --no-restore-sid --relocate
> /sanscratch/383=/sanscratch/000 /sanscratch/checkpoints/383/chk.28244
> --
> *From:* Meij, Henk
> *Sent:* Monday, March 21, 2016 12:04 PM
> *To:* us...@open-mpi.org
> *Subject:* BLCR & openmpi
>
> openmpi1.2 (yes, I know old),python 2.6.1 blcr 0.8.5
>
>
>
> when I attempt to cr_restart (having performed cr_checkpoint --save-all) I
> can restart the job manually with blcr on a node. but when I go through my
> openlava scheduler, the cr_restart launches mpirun, then nothing. no orted
> or the python processes that were running. the new scheduler job performing
> the restart puts in place the old machinefile and stderr and stdout files.
> here is what I view on an strace of mpirun
>
>
>
> What problem is this pointing at?
>
> Thanks,
>
>
>
> -Henk
>
>
>
> poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=6, events=POLLIN},
> {fd=11, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN},
> {fd=9, events=POLLIN}, {fd=10, events=POLLIN}], 8, 1000) = 8 ([{fd=5,
> revents=POLLNVAL}, {fd=4, revents=POLLNVAL}, {fd=6, revents=POLLNVAL},
> {fd=11, revents=POLLNVAL}, {fd=7, revents=POLLNVAL}, {fd=8,
> revents=POLLNVAL}, {fd=9, revents=POLLNVAL}, {fd=10, revents=POLLNVAL}])
> rt_sigprocmask(SIG_BLOCK, [INT USR1 USR2 TERM CHLD], NULL, 8) = 0
> rt_sigaction(SIGCHLD, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD],
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGTERM, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD],
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGINT, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD],
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGUSR1, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD],
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGUSR2, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD],
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> sched_yield()   = 0
> rt_sigprocmask(SIG_BLOCK, [INT USR1 USR2 TERM CHLD], NULL, 8) = 0
> rt_sigaction(SIGCHLD, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD],
> SA_RESTORER|SA_RESTART, 0x397840f790}, NULL, 8) = 0
> rt_sigaction(SIGTERM, {0x2b7ca19cb30a, [INT USR1 USR2 TERM CHLD],
> SA_RESTORER|SA_RESTART, 0x397840f790}, N

Re: [OMPI users] BLCR & openmpi

2016-03-23 Thread Meij, Henk
Thanks for responding.



#1 I am checkpointing the "wrapper" script (for the scheduler) which sets up 
the mpirun env, builds the machinefile etc., then launches mpirun, which launches 
orted, which launches lmp_mpi ... this gave me an idea to check BLCR; it states:

" The '--tree' flag to 'cr_checkpoint' requests a checkpoint of the process 
with the given pid, and all its descendants (excluding those whose parent has 
exited and thus become children of the 'init' process). " This is the default 
for blcr > 0.6.0. I explicitly added this to make sure. So everything should be 
checkpointed on down.



#2 & #3: I will have to brood over that. Maybe I can checkpoint my individual 
lmp_mpi processes directly.



Serial invocations and restarts work just fine. I'll go to the BLCR list, maybe 
they have an idea. As you can tell below, a manual invocation yields the same 
result as via the scheduler, with no messages from --kmsg-warning, as if everything 
were normal. I'll report back if I get this to work.



-Henk



[hmeij@cottontail ~]$ ssh petaltail /share/apps/blcr/0.8.5/test/bin/cr_restart 
--kmsg-warning --no-restore-pid --no-restore-pgid --no-restore-sid --relocate 
/sanscratch/612=/sanscratch/619 /sanscratch/checkpoints/612/chk.21839 &


[hmeij@cottontail sharptail]$ ssh petaltail ps -u hmeij
  PID TTY  TIME CMD
24123 ?00:00:00 sshd
24124 ?00:00:00 cr_restart
24156 ?00:00:00 lava.openmpi.wr
24157 ?00:00:28 mpirun
24176 ?00:00:00 sshd
24177 ?00:00:00 ps
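
As a rough sketch of the "checkpoint the lmp_mpi processes directly" idea (hypothetical pids and file names; and, as George points out, the restarted ranks will not be able to re-form the MPI job by themselves):

# checkpoint each MPI rank on the node instead of the wrapper/mpirun tree
for pid in $(pgrep lmp_mpi); do
    cr_checkpoint --tree --save-all --file=/sanscratch/checkpoints/$pid.chk $pid
done
# later, restart each image; the pids change, so the original MPI wire-up is lost
for chk in /sanscratch/checkpoints/*.chk; do
    cr_restart --no-restore-pid $chk &
done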


From: users [users-boun...@open-mpi.org] on behalf of George Bosilca 
[bosi...@icl.utk.edu]
Sent: Wednesday, March 23, 2016 12:27 PM
To: Open MPI Users
Subject: Re: [OMPI users] BLCR & openmpi

Both BLCR and Open MPI work just fine. Independently.

Checkpointing and restarting a parallel application is not as simple as mixing 
2 tools together (especially when we talk about a communication library, aka. 
MPI), they have to cooperate in order to achieve the desired goal of being able 
to continue the execution on another set of resources. Open MPI had support for 
C/R but this feature has been lost.

1. It is not clear from your email what exactly you checkpoint. Are you 
checkpointing the mpirun process, or are you checkpointing all the MPI 
processes ?

2. What are you recovering? Assuming that you checkpoint your MPI processes 
(and not the mpirun), what you can try to do during the recovery is to spawn a 
new set of MPI processes (that will give you new orteds) and then let each one 
of these processes call the corresponding BLCR cr_restart.

3. This will not give you a working MPI environment, as the processes will know 
each other from the original execution, and will be unable to connect to each 
other to resume communications. You will have to dig a little more in the code 
in order to achieve what you want/need.

  George.


On Wed, Mar 23, 2016 at 12:14 PM, Meij, Henk 
mailto:hm...@wesleyan.edu>> wrote:

So I've redone this with openmpi 1.10.2 and another piece of software (lammps 
16feb16) and get same results.



Upon cr_restart I see the openlava_wrapper process, the mpirun process 
reappearing but no orted and no lmp_mpi processes. Not obvious error anywhere. 
Using the --save-all feature from BLCR and ignore pids.



Does BLCR and openmpi work? Anybody have any idea as to where to look?



-Henk




From: Meij, Henk
Sent: Monday, March 21, 2016 12:24 PM
To: us...@open-mpi.org
Subject: RE: BLCR & openmpi


hmm, I'm not correct. cr_restart starts with no errors, launches some of the 
processes, then suspends itself. strace on mpirun on this manual invocation 
yields the behavior same as below.



-Henk



[hmeij@swallowtail kflaherty]$ ps -u hmeij
  PID TTY  TIME CMD
29481 ?00:00:00 res
29485 ?00:00:00 1458575067.384
29488 ?00:00:00 1458575067.384.
29508 ?00:00:00 cr_restart
29509 ?00:00:00 blcr_watcher
29512 ?00:00:02 lava.openmpi.wr
29514 ?00:38:35 mpirun
30313 ?00:00:01 sshd
30314 pts/100:00:00 bash
30458 ?00:00:00 sleep
30483 ?00:00:00 sleep
30650 pts/100:00:00 cr_restart
30652 pts/100:00:00 lava.openmpi.wr
30653 pts/100:00:00 mpirun
30729 pts/100:00:00 ps
[hmeij@swallowtail kflaherty]$ jobs
[1]+  Stopped cr_restart --no-restore-pid --no-restore-pgid 
--no-restore-sid --relocate /sanscratch/383=/sanscratch/000 
/sanscratch/checkpoints/383/chk.28244


From: Meij, Henk
Sent: Monday, March 21, 2016 12:04 PM
To: us...@open-mpi.org
Subject: BLCR & openmpi


openmpi1.2 (yes, I know old),python 2.6.1 blcr 0.8.5



when I attempt to cr_restart (having performed cr_checkpoint --save-all) I can 
restart the job manually with blcr on a node. but when I go through my openlava 
scheduler, the cr_restart launches mpirun, then nothing. no ort

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
So I want to thank you so much! My benchmark for my actual application
went from 5052 seconds to 266 seconds with this simple fix!

Ron

---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3
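
For the latency question repeated below, a minimal sketch (assuming the OSU micro-benchmarks are built somewhere like ~/osu-micro-benchmarks, and a two-node Torque allocation so no hostfile is needed) would be:

cd ~/osu-micro-benchmarks/mpi/pt2pt
mpirun -np 2 --map-by node ./osu_latency   # small-message latency between the two nodes
mpirun -np 2 --map-by node ./osu_bw        # point-to-point bandwidth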


On Wed, Mar 23, 2016 at 11:00 AM, Ronald Cohen  wrote:
> Dear Gilles,
>
> --with-tm fails. I have now built with
> ./configure --prefix=/home/rcohen --with-tm=/opt/torque
> make clean
> make -j 8
> make install
>
> This rebuild greatly improved performance, from 1 GF to 32 GF for 2
> nodes for a 2000 size matrix.  For 5000 it went up to 108. So this
> sounds pretty good.
>
> Thank you so much! Is there a way to test and improve for latency?
>
> Thanks!
>
> Ron
>
> ---
> Ron Cohen
> recoh...@gmail.com
> skypename: ronaldcohen
> twitter: @recohen3
>
>
> On Wed, Mar 23, 2016 at 10:38 AM, Gilles Gouaillardet
>  wrote:
>> Ronald,
>>
>> first, can you make sure tm was built ?
>> the easiest way is to
>> configure --with-tm ...
>> it will crash if tm is not found
>> if pbs/torque is not installed in a standard location, then you have to
>> configure --with-tm=
>>
>> then you can omit -hostfile from your mpirun command line
>>
>> hpl is known to scale, assuming the data is big enough, you use an optimized
>> blas, and the right number of openmp threads
>> (e.g. if you run 8 tasks per node, then you can have up to 2 openmp threads,
>> but if you use 8 or 16 threads, then performance will be worse)
>> first run xhpl on one node, and when you get 80% of the peak performance, then
>> you can run on two nodes.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Wednesday, March 23, 2016, Ronald Cohen  wrote:
>>>
>>> The configure line was simply:
>>>
>>>  ./configure --prefix=/home/rcohen
>>>
>>> when I run:
>>>
>>> mpirun --mca btl self,vader,openib ...
>>>
>>> I get the same lousy results: 1.5 GFLOPS
>>>
>>> The output of the grep is:
>>>
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>> Cpus_allowed_list:  0-7
>>> Cpus_allowed_list:  8-15
>>>
>>>
>>> linpack (HPL) certainly is known to scale fine.
>>>
>>> I am running a standard benchmark--HPL--linpack.
>>>
>>> I think it is not the compiler, but I could try that.
>>>
>>> Ron
>>>
>>>
>>>
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Wed, Mar 23, 2016 at 9:32 AM, Gilles Gouaillardet
>>>  wrote:
>>> > Ronald,
>>> >
>>> > the fix I mentioned landed into the v1.10 branch
>>> >
>>> > https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62
>>> >
>>> > can you please post your configure command line ?
>>> >
>>> > you can also try to
>>> > mpirun --mca btl self,vader,openib ...
>>> > to make sure your run will abort instead of falling back to tcp
>>> >
>>> > then you can
>>> > mpirun ... grep Cpus_allowed_list /proc/self/status
>>> > to confirm your tasks do not end up bound to the same cores when running
>>> > on
>>> > two nodes.
>>> >
>>> > is your application known to scale on infiniband network ?
>>> > or did you naively hope it would scale ?
>>> >
>>> > at first, I recommend you run standard benchmark to make sure you get
>>> > the
>>> > performance you expect from your infiniband network
>>> > (for example IMB or OSU benchmark)
>>> > and run this test in the same environment than your app (e.g. via a
>>> > batch
>>> > manager if applicable)
>>> >
>>> > if you do not get the performance you expect, then I suggest you try the
>>> > stock gcc compiler shipped with your distro and see if it helps.
>>> >
>>> > Cheers,
>>> >
>>> > Gilles
>>> >
>>> > On Wednesday, March 23, 2016, Ronald Cohen  wrote:
>>> >>
>>> >> Thank  you! Here are the answers:
>>> >>
>>> >> I did not try a previous release of gcc.
>>> >> I built from a tarball.
>>> >> What should I do about the iirc issue--how should I check?
>>> >> Are there any flags I should be using for infiniband? Is this a
>>> >> problem with latency?
>>> >>
>>> >> Ron
>>> >>
>>> >>
>>> >> ---
>>> >> Ron Cohen
>>> >> recoh...@gmail.com
>>> >> skypename: ronaldcohen
>>> >> twitter: @recohen3
>>> >>
>>> >>
>>> >> On Wed, Mar 23, 2016 at 8:13 AM, Gilles Gouaillardet
>>> >>  wrote:
>>> >> > Ronald,
>>> >> >
>>> >> > did you try to build openmpi with a previous gcc release ?
>>> >> > if yes, what about the performance ?
>>> >> >
>>> >> > did you build openmpi from a tarball or from git ?
>>> >> > if from git and without VPATH, then you need to
>>> >> > configure with --disable-debug
>>> >> >
>>> >> > iirc, one issue was identified previously
>>> >> > (gcc optimization that prevents the memory wrapper from behaving as
>>> >> > expected) an

Re: [OMPI users] Why do I need a C++ linker while linking in MPI C code with CUDA?

2016-03-23 Thread Sylvain Jeaugey

Hi Durga,

Sorry for the late reply and thanks for reporting that issue. As Rayson 
mentioned, CUDA is intrinsically C++ and indeed uses the host C++ 
compiler. Hence linking MPI + CUDA code may need to use mpic++.


It happens to work with mpicc on various platforms where libstdc++ 
is linked anyway, but in your case it wasn't. We fixed the Makefile on 
github to use mpic++ as the linker.


Sylvain
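
As a minimal sketch of the two workable link lines (reusing the object and library names from the Makefile discussed below; the -lstdc++ variant is an assumption on my part, not the fix that was committed):

# option 1: let the C++ wrapper pull in libstdc++, which the CUDA-generated objects need
mpic++ jacobi.o input.o host.o device.o cuda_normal_mpi.o \
       -L/usr/local/cuda/lib64 -lcudart -lm -o ../bin/jacobi_cuda_normal_mpi
# option 2: keep mpicc as the linker and add the C++ runtime explicitly
mpicc jacobi.o input.o host.o device.o cuda_normal_mpi.o \
      -L/usr/local/cuda/lib64 -lcudart -lm -lstdc++ -o ../bin/jacobi_cuda_normal_mpi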

On 03/20/2016 07:37 PM, dpchoudh . wrote:
I'd tend to agree with Gilles. I have written CUDA programs in pure C 
(i.e. neither involving MPI nor C++) and a pure C based tool chain 
builds the code successfully. So I don't see why CUDA should be 
intrinsically C++.


From the Makefile (that I had attached in my previous mail) the only 
CUDA library being linked against is this:


/usr/local/cuda/lib64/libcudart.so
and ldd on that shows this:

[durga@smallMPI lib64]$ ldd libcudart.so
linux-vdso.so.1 =>  (0x7ffe1e7f1000)
libc.so.6 => /lib64/libc.so.6 (0x7ff7e4493000)
libdl.so.2 => /lib64/libdl.so.2 (0x7ff7e428f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7ff7e4072000)
librt.so.1 => /lib64/librt.so.1 (0x7ff7e3e6a000)
/lib64/ld-linux-x86-64.so.2 (0x7ff7e4af3000)

I don't see any C++ dependency here either.

And finally, I don't think there is any version issue. This is a clean 
CUDA 7.5 install directly from NVIDIA CUDA repo (for Redhat) and all 
provided examples run fine with this installation.


I believe there are NVIDIA employees in this list; hopefully one of 
them will clarify.


Thanks
Durga

Life is complex. It has real and imaginary parts.

On Sun, Mar 20, 2016 at 10:23 PM, Gilles Gouaillardet 
mailto:gilles.gouaillar...@gmail.com>> 
wrote:


I am a bit puzzled...

if only cuda uses the c++ std libraries, then it should depend on them
(ldd libcudaxyz.so can be used to confirm that)
and then linking with cuda lib should pull the c++ libs

could there be a version issue ?
e.g. the missing symbol is not provided by the version of the c++
lib that is pulled.
that might occur if you are using cuda built for distro X on distro Y

could you please double check this ?
if everything should work, then i recommend you report this to nvidia

Cheers,

Gilles

On Monday, March 21, 2016, Damien Hocking mailto:dam...@0x544745.com>> wrote:

Durga,

The Cuda libraries use the C++ std libraries.  That's the
std::ios_base errors.. You need the C++ linker to bring those in.

Damien

On March 20, 2016 9:15:47 AM "dpchoudh ." 
wrote:


Hello all

I downloaded some code samples from here:

https://github.com/parallel-forall/code-samples/

and tried to build the subdirectory

posts/cuda-aware-mpi-example/src

in my CentOS 7 machine.

I had to make several changes to the Makefile before it would
build. The modified Makefile is attached (the make targets I
am talking about are the 3rd and 4th from the bottom). Most
of the modifications can be explained as possible platform
specific variations (such as path differences betwen Ubuntu
and CentOS), except the following:

I had to use a C++ linker (mpic++) to link in the object
files that were produced with C host compiler (mpicc) and
CUDA compiler (nvcc). If I did not do this, (i.e. I stuck to
mpicc for linking), I got the following link error:

mpicc -L/usr/local/cuda/lib64 -lcudart -lm -o
../bin/jacobi_cuda_normal_mpi jacobi.o input.o host.o
device.o  cuda_normal_mpi.o
device.o: In function
`__static_initialization_and_destruction_0(int, int)':
tmpxft_4651_-4_Device.cudafe1.cpp:(.text+0xd1e):
undefined reference to `std::ios_base::Init::Init()'
tmpxft_4651_-4_Device.cudafe1.cpp:(.text+0xd2d):
undefined reference to `std::ios_base::Init::~Init()'
collect2: error: ld returned 1 exit status

Can someone please explain why would I need a C++ linker for
object files that were generated using C compiler? Note that
if I use mpic++ both for compiling and linking, there are no
errors either.

Thanks in advance
Durga

Life is complex. It has real and imaginary parts.
___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2016/03/28760.php



___
users mailing list
us...@open-mpi.org 
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2016/03/28762.php




__

[OMPI users] Problems in compiling a code with dynamic linking

2016-03-23 Thread Elio Physics
Dear all,


I have been trying, for the last week, to compile a code (SPRKKR). The 
compilation went through ok. However, there are problems with the executable 
(kkrscf6.3MPI) not finding the MKL library links. I could not fix the 
problem. I have tried several things but in vain. I will post both the "make" 
file and the "PBS" script file. Please, can anyone help me with this? The error I 
am getting is:


 /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh: No such file or directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading shared 
libraries: libmkl_intel_lp64.so: cannot open shared object file: No such file 
or directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading shared 
libraries: libmkl_intel_lp64.so: cannot open shared object file: No such file 
or directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading shared 
libraries: libmkl_intel_lp64.so: cannot open shared object file: No such file 
or directory



make file :


###
# Here the common makefile starts which does depend on the OS  
###
#
#  FC:  compiler name and common options e.g. f77 -c
#  LINK:linker name and common options e.g. g77 -shared
#  FFLAGS:  optimization e.g. -O3
# OP0: force nooptimisation for some routiens e.g. -O0
#  VERSION: additional string for executable e.g. 6.3.0
#  LIB: library names   e.g. -L/usr/lib -latlas -lblas -llapack
#   (lapack and blas libraries are needed)
#  BUILD_TYPE:  string "debug" switches on debugging options
#   (NOTE: you may call, e.g. "make scf BUILD_TYPE=debug"
#to produce executable with debugging flags from command line)
#  BIN: directory for executables
#  INCLUDE: directory for include files
#   (NOTE: directory with mpi include files has to be properly set
#even for sequential executable)
###

BUILD_TYPE ?=
#BUILD_TYPE := debug

VERSION = 6.3

ifeq ($(BUILD_TYPE), debug)
 VERSION := $(VERSION)$(BUILD_TYPE)
endif

BIN =~/Elie/SPRKKR/bin
#BIN=~/bin
#BIN=/tmp/$(USER)



LIB = -L/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64 -lmkl_blas95_lp64 
-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_core  -lmkl_sequential


# Include mpif.h
INCLUDE =-I/usr/include/openmpi-x86_64


#FFLAGS
FFLAGS = -O2


FC   = mpif90 -c $(FFLAGS) $(INCLUDE)
LINK = mpif90   $(FFLAGS) $(INCLUDE)

MPI=MPI



PBS script:


BIN =~/Elie/SPRKKR/bin
#BIN=~/bin
#BIN=/tmp/$(USER)



LIB = -L/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64 -lmkl_blas95_lp64 
-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_core  -lmkl_sequential


# Include mpif.h
INCLUDE =-I/usr/include/openmpi-x86_64


#FFLAGS
FFLAGS = -O2


FC   = mpif90 -c $(FFLAGS) $(INCLUDE)
LINK = mpif90   $(FFLAGS) $(INCLUDE)

MPI=MPI

[emoujaes@jlborges SPRKKR]$ cd Fe
[emoujaes@jlborges Fe]$ ls
Fe.inp  Fe.pbs  Fescf.e50505  Fescf.o50505  
scf-50505.jlborges.fisica.ufmg.br.out
[emoujaes@jlborges Fe]$ more Fe.pbs
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=70:00:00
#PBS -N Fescf


# procura o nome o input baseado no nome do job (linha #PBS -N xxx acima).
INP=Fe.inp

OUT=scf-$PBS_JOBID.out

## Configura o no de calculo

source /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh

module load libraries/openmpi-1.5.4/gnu-4.4
#ormacoes do job no arquivo de saida
qstat -an -u $USER
cat $PBS_NODEFILE



#---  Inicio do trabalho - #



## executa o programa
cd $PBS_O_WORKDIR

export OMP_NUM_THREADS=1

mpirun ~/Elie/SPRKKR/bin/kkrscf6.3MPI $INP > $OUT




Re: [OMPI users] Problems in compiling a code with dynamic linking

2016-03-23 Thread Gilles Gouaillardet

Elio,

it seems /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh is only 
available on your login/frontend nodes,

but not on your compute nodes.
you might be luckier with
/opt/intel/mkl/bin/mklvars.sh

another option is to
ldd /home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI
on your login node, and explicitly set the LD_LIBRARY_PATH in your PBS 
script


if /opt/intel/composer_xe_2013_sp1/mkl/lib/intel64 is available on your 
compute nodes, you might want to append

-Wl,-rpath,/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64
to LIB
/* if you do that, keep in mind you might not automatically use the most 
up to date mkl lib when they get upgraded by your sysadmin */


Cheers,

Gilles
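
A minimal sketch of how that could look in the PBS script (the intel64 argument to mklvars.sh and the exact library directory are assumptions to verify on the compute nodes, e.g. against the ldd output from the login node):

# inside Fe.pbs, before the mpirun line
source /opt/intel/mkl/bin/mklvars.sh intel64    # only if this file exists on the compute nodes
# or set the path by hand, using the directory the executable was linked against:
export LD_LIBRARY_PATH=/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64:$LD_LIBRARY_PATH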

On 3/24/2016 11:03 AM, Elio Physics wrote:


Dear all,


I have been trying ,for the last week, compiling a code (SPRKKR). the 
compilation went through ok. however, there are problems with the 
executable (kkrscf6.3MPI) not finding the MKL library links. i could 
not fix the problem..I have tried several things but in vain..I will 
post both the "make" file and the "PBS" script file. Please can anyone 
help me in this? the error I am getting is:



 /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh: No such file or 
directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading 
shared libraries: libmkl_intel_lp64.so: cannot open shared object 
file: No such file or directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading 
shared libraries: libmkl_intel_lp64.so: cannot open shared object 
file: No such file or directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading 
shared libraries: libmkl_intel_lp64.so: cannot open shared object 
file: No such file or directory



_make file :_

_
_

###
# Here the common makefile starts which does depend on the 
OS  

###
#
#  FC:  compiler name and common options e.g. f77 -c
#  LINK:linker name and common options e.g. g77 -shared
#  FFLAGS:  optimization e.g. -O3
# OP0: force nooptimisation for some routiens e.g. -O0
#  VERSION: additional string for executable e.g. 6.3.0
#  LIB: library names   e.g. -L/usr/lib -latlas -lblas -llapack
#   (lapack and blas libraries are needed)
#  BUILD_TYPE:  string "debug" switches on debugging options
#   (NOTE: you may call, e.g. "make scf BUILD_TYPE=debug"
#to produce executable with debugging flags from 
command line)

#  BIN: directory for executables
#  INCLUDE: directory for include files
#   (NOTE: directory with mpi include files has to be 
properly set

#even for sequential executable)
###

BUILD_TYPE ?=
#BUILD_TYPE := debug

VERSION = 6.3

ifeq ($(BUILD_TYPE), debug)
 VERSION := $(VERSION)$(BUILD_TYPE)
endif

BIN =~/Elie/SPRKKR/bin
#BIN=~/bin
#BIN=/tmp/$(USER)



LIB = -L/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64 
-lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_core  
-lmkl_sequential



# Include mpif.h
INCLUDE =-I/usr/include/openmpi-x86_64


#FFLAGS
FFLAGS = -O2


FC   = mpif90 -c $(FFLAGS) $(INCLUDE)
LINK = mpif90   $(FFLAGS) $(INCLUDE)

MPI=MPI



_PBS script:_

_
_

BIN =~/Elie/SPRKKR/bin
#BIN=~/bin
#BIN=/tmp/$(USER)



LIB = -L/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64 
-lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_core  
-lmkl_sequential



# Include mpif.h
INCLUDE =-I/usr/include/openmpi-x86_64


#FFLAGS
FFLAGS = -O2


FC   = mpif90 -c $(FFLAGS) $(INCLUDE)
LINK = mpif90   $(FFLAGS) $(INCLUDE)

MPI=MPI

[emoujaes@jlborges SPRKKR]$ cd Fe
[emoujaes@jlborges Fe]$ ls
Fe.inp  Fe.pbs  Fescf.e50505  Fescf.o50505 
scf-50505.jlborges.fisica.ufmg.br.out

[emoujaes@jlborges Fe]$ more Fe.pbs
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=70:00:00
#PBS -N Fescf


# procura o nome o input baseado no nome do job (linha #PBS -N xxx acima).
INP=Fe.inp

OUT=scf-$PBS_JOBID.out

## Configura o no de calculo

source /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh

module load libraries/openmpi-1.5.4/gnu-4.4
#ormacoes do job no arquivo de saida
qstat -an -u $USER
cat $PBS_NODEFILE



#---  Inicio do trabalho - #



## executa o programa
cd $PBS_O_WORKDIR

export OMP_NUM_THREADS=1

mpirun ~/Elie/SPRKKR/bin/kkrscf6.3MPI $INP > $OUT




___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/03/28812.php




Re: [OMPI users] Problems in compiling a code with dynamic linking

2016-03-23 Thread Elio Physics
Dear Gilles,


thanks for your reply and your options. I have tried the first option, which for 
me basically is the easiest. I have compiled using "make.inc" but now setting  
LIB = -L/opt/intel/mkl/lib/intel64 -lmkl_blas95_lp64 -lmkl_lapack95_lp64 
-lmkl_intel_lp64 -lmkl_core  -lmkl_sequential


Everything went well. Then I tried the PBS script where I have added these two 
lines:


source /opt/intel/mkl/bin/mklvars.sh
export LD_LIBRARY_PATH=/opt/intel/mkl/bin/mklvars.sh


But i still get the same error:


/opt/intel/mkl/bin/mklvars.sh: No such file or directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading shared 
libraries: libmkl_intel_lp64.so: cannot open shared object file: No such file 
or directory



I just cannot understand why it is giving the same error and why it could not 
find the file /opt/intel/mkl/bin/mklvars.sh although the path is correct!


Any advice please?


Thanks





From: users  on behalf of Gilles Gouaillardet 

Sent: Thursday, March 24, 2016 12:22 AM
To: Open MPI Users
Subject: Re: [OMPI users] Problems in compiling a code with dynamic linking

Elio,

it seems /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh is only available 
on your login/frontend nodes,
but not on your compute nodes.
you might be luckier with
/opt/intel/mkl/bin/mklvars.sh

another option is to
ldd /home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI
on your login node, and explicitly set the LD_LIBRARY_PATH in your PBS script

if /opt/intel/composer_xe_2013_sp1/mkl/lib/intel64 is available on your compute 
nodes, you might want to append
-Wl,-rpath,/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64
to LIB
/* if you do that, keep in mind you might not automatically use the most up to 
date mkl lib when they get upgraded by your sysadmin */

Cheers,

Gilles

On 3/24/2016 11:03 AM, Elio Physics wrote:

Dear all,


I have been trying ,for the last week, compiling a code (SPRKKR). the 
compilation went through ok. however, there are problems with the executable 
(kkrscf6.3MPI) not finding the MKL library links. i could not fix the 
problem..I have tried several things but in vain..I will post both the "make" 
file and the "PBS" script file. Please can anyone help me in this? the error I 
am getting is:


 /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh: No such file or directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading shared 
libraries: libmkl_intel_lp64.so: cannot open shared object file: No such file 
or directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading shared 
libraries: libmkl_intel_lp64.so: cannot open shared object file: No such file 
or directory
/home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI: error while loading shared 
libraries: libmkl_intel_lp64.so: cannot open shared object file: No such file 
or directory



make file :


###
# Here the common makefile starts which does depend on the OS  
###
#
#  FC:  compiler name and common options e.g. f77 -c
#  LINK:linker name and common options e.g. g77 -shared
#  FFLAGS:  optimization e.g. -O3
# OP0: force nooptimisation for some routiens e.g. -O0
#  VERSION: additional string for executable e.g. 6.3.0
#  LIB: library names   e.g. -L/usr/lib -latlas -lblas -llapack
#   (lapack and blas libraries are needed)
#  BUILD_TYPE:  string "debug" switches on debugging options
#   (NOTE: you may call, e.g. "make scf BUILD_TYPE=debug"
#to produce executable with debugging flags from command line)
#  BIN: directory for executables
#  INCLUDE: directory for include files
#   (NOTE: directory with mpi include files has to be properly set
#even for sequential executable)
###

BUILD_TYPE ?=
#BUILD_TYPE := debug

VERSION = 6.3

ifeq ($(BUILD_TYPE), debug)
 VERSION := $(VERSION)$(BUILD_TYPE)
endif

BIN =~/Elie/SPRKKR/bin
#BIN=~/bin
#BIN=/tmp/$(USER)



LIB = -L/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64 -lmkl_blas95_lp64 
-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_core  -lmkl_sequential


# Include mpif.h
INCLUDE =-I/usr/include/openmpi-x86_64


#FFLAGS
FFLAGS = -O2


FC   = mpif90 -c $(FFLAGS) $(INCLUDE)
LINK = mpif90   $(FFLAGS) $(INCLUDE)

MPI=MPI



PBS script:


BIN =~/Elie/SPRKKR/bin
#BIN=~/bin
#BIN=/tmp/$(USER)



LIB = -L/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64 -lmkl_blas95_lp64 
-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_core  -lmkl_sequential


# Include mpif.h
INCLUDE =-I/usr/include/openmpi-x86_64


#FFLAGS
FFLAGS = -O2


FC   = mpif90 -c $(FFLAGS) $(INCLUDE)
LINK = mpif90   $(FFLAGS) $(INCLUDE)

MPI=MPI

[emoujaes@jlborges SPRKKR]$ cd Fe
[emoujaes@jlborges Fe]$ ls
Fe.inp  Fe.pbs  Fescf.e50505  Fescf.o50505