Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
> On Mar 25, 2016, at 12:53 PM, Ronald Cohen wrote: > > Should > -bind-to-core > also help? No - if you specify pe=N, then you will automatically bind to core > Does the error I get matter? Should we install libnumactl > and libnumactl-devel packages? Thanks! Yes! The warning you are gettin
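Following Ralph's "Yes!" above, installing the NUMA support libraries should address the warning. A sketch only; the exact package names vary by distribution and are assumptions here, not taken from the thread:

```shell
# RHEL/CentOS-style names (assumed) for the libnumactl packages mentioned:
sudo yum install -y numactl numactl-devel
# Debian/Ubuntu equivalents (also assumed):
# sudo apt-get install -y libnuma1 libnuma-dev
```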

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
Should -bind-to-core also help? Does the error I get matter? Should we install libnumactl and libnumactl-devel packages? Thanks! Ron --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Fri, Mar 25, 2016 at 3:43 PM, Ralph Castain wrote: > Yeah, it can really have an

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
Yeah, it can really have an impact! It is unfortunately highly application-specific, so all we can do is provide the tools. As you can see from the binding map, we are tight packing the procs on each node to maximize the use of shared memory. However, this assumes that each rank is predominantl

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
So I have been experimenting with different mappings, and performance varies a lot. The best I find is: -map-by slot:pe=2 -np 32 with 2 threads which gives [n001.cluster.com:29647] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././.][./././././././.] [n001.cluste
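The best-performing combination above (32 ranks, 2 cores and 2 threads per rank) can be sketched as a small launch wrapper. The command is printed rather than executed here, and `./app` is a placeholder for the poster's binary:

```shell
# Hybrid MPI+OpenMP launch matching the mapping the poster found fastest:
# 32 ranks, each bound to 2 cores (pe=2), one OpenMP thread per core.
export OMP_NUM_THREADS=2          # one thread per bound core (assumption)
PE=2; NP=32
CMD="mpirun -map-by slot:pe=${PE} -np ${NP} --report-bindings ./app"
echo "$CMD"                       # printed instead of run in this sketch
```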

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
So -map-by node:pe=2 -np 32 runs and gives great performance, though a little worse than -n 32. It puts the correct number of processes, but does round robin. Is there a way to do this without the round robin? Also note the error message: == ALLOCATED NODES ===

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
So it seems my -map-by core:pe=2 -n 32 should have worked. I would have 32 procs with 2 on each, giving 64 total. But it doesn't. --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Fri, Mar 25, 2016 at 2:19 PM, Ralph Castain wrote: > pe=N tells us to map N cores (we ca

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
No this doesn't work. When I try: -map-by core:pe=2 -n 32 on 4 nodes #PBS -l nodes=4:ppn=16,pmem=2gb giving a total of 64 cores, I get: A request for multiple cpus-per-proc was given, but a directive was also given to map to an object level that cannot support that directive. Please specify a mappin

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
pe=N tells us to map N cores (we call them “processing elements” because they could be HTs if you use --use-hwthreads-as-cpus) to each process. So we will bind each process to N cores. So if you want 16 procs, each with two processing elements assigned to them (which is a good choice if you are usin
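A quick way to apply the pe=N rule above is to check that ranks times cores-per-rank fits the allocation before launching. The node and core counts below match the poster's PBS request (4 nodes, ppn=16); the arithmetic is just a sketch:

```shell
# ranks x pe must not exceed total cores, or mapping will fail/oversubscribe.
NODES=4; CORES_PER_NODE=16; PE=2
MAX_NP=$(( NODES * CORES_PER_NODE / PE ))
echo "with pe=${PE}, at most ${MAX_NP} ranks fit this allocation"
```

This is why -map-by core:pe=2 -n 32 on 64 cores is the boundary case discussed in the thread: 32 ranks x 2 cores uses exactly all 64 cores.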

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
or is it mpirun -map-by core:pe=8 -n 16 ? --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Fri, Mar 25, 2016 at 2:10 PM, Ronald Cohen wrote: > Thank you--I looked on the man page and it is not clear to me what > pe=2 does. Is that the number of threads? So if I want

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
Thank you--I looked on the man page and it is not clear to me what pe=2 does. Is that the number of threads? So if I want 16 MPI procs with 2 threads each on 32 cores (two nodes), is it: mpirun -map-by core:pe=2 -n 16? Sorry if I mangled this. Ron --- Ron Cohen recoh...@gmail.com skypename: ronald

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
Okay, what I would suggest is that you use the following cmd line: mpirun -map-by core:pe=2 (or 8 or whatever number you want) This should give you the best performance as it will tight-pack the procs and assign them to the correct number of cores. See if that helps > On Mar 25, 2016, at 10:38

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
1.10.2 Ron --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Fri, Mar 25, 2016 at 1:30 PM, Ralph Castain wrote: > Hmmm…what version of OMPI are you using? > > > On Mar 25, 2016, at 10:27 AM, Ronald Cohen wrote: > > --report-bindings didn't report anything > --- > R

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
Yes -- there are 16 cores per node--I want to use 8 mpi processes with 2 threads each, or 4 mpi processes with 4 threads each, on each node. Ron --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Fri, Mar 25, 2016 at 1:29 PM, Ralph Castain wrote: > Interesting - it th

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
The -n 32 run in contrast gave: [n011.cluster.com:05847] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
Hmmm…what version of OMPI are you using? > On Mar 25, 2016, at 10:27 AM, Ronald Cohen wrote: > > --report-bindings didn't report anything > --- > Ron Cohen > recoh...@gmail.com > skypename: ronaldcohen > twitter: @recohen3 > > > On Fri, Mar 25, 2016 at 1:24 PM, Ron

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
Sorry, they are in stderr. What should I learn from: [n001.cluster.com:27958] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]:

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
Interesting - it thinks we have 16 slots/node, so if you just use “mpirun -np 16”, you should wind up with all the procs on one node > On Mar 25, 2016, at 10:24 AM, Ronald Cohen wrote: > > —display-allocation an > didn't seem to give useful information: > > == ALLOCATED

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
--report-bindings didn't report anything --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Fri, Mar 25, 2016 at 1:24 PM, Ronald Cohen wrote: > --display-allocation > didn't seem to give useful information: > > == ALLOCATED NODES

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
--display-allocation didn't seem to give useful information: == ALLOCATED NODES == n005: slots=16 max_slots=0 slots_inuse=0 state=UP n008.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP n007.cluster.com: slots=16 max_sl

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
Hmmm…add --report-bindings to your cmd line. The only thing I can think of is that the binding is different. > On Mar 25, 2016, at 10:17 AM, Ronald Cohen wrote: > > Actually there was the same number of procs per node in each case. I > verified this by logging into the nodes while they were ru
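The --report-bindings output lands on stderr, which is why the poster missed it at first. A small sketch of pulling the rank and core map out of one such line; the sample line is taken from this thread, and the parsing is illustrative, not part of Open MPI:

```shell
# Sample binding line as printed by mpirun --report-bindings (on stderr):
line='[n001.cluster.com:27958] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././.][./././././././.]'
# Extract the rank number and the compact [B/B/...] core map after the colon.
rank=$(echo "$line" | sed -n 's/.*MCW rank \([0-9]*\) bound.*/\1/p')
map=${line##*: }
echo "rank ${rank} -> ${map}"
```

In a real run you would redirect stderr to a file (`mpirun ... 2> bind.log`) and grep for `MCW rank`.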

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
Actually there was the same number of procs per node in each case. I verified this by logging into the nodes while they were running--in both cases 4 per node. Ron --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Fri, Mar 25, 2016 at 1:14 PM, Ralph Castain wrote:

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
> On Mar 25, 2016, at 9:59 AM, Ronald Cohen wrote: > > It is very strange but my program runs slower with any of these > choices than if IO simply use: > > mpirun -n 16 > with > #PBS -l > nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4 > for e

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
It is very strange but my program runs slower with any of these choices than if I simply use: mpirun -n 16 with #PBS -l nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4 for example. The timing for the latter is 165 seconds, and for #PBS -l nodes
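The per-host `ppn` request above can also be written without naming hosts, letting the scheduler pick 4 nodes with 4 slots each; mpirun then fills the slots in order with no round robin. A sketch of such a batch script; the job name and `./app` are placeholders:

```shell
#!/bin/bash
#PBS -l nodes=4:ppn=4,pmem=2gb   # 4 hosts, 4 slots each (scheduler chooses hosts)
#PBS -N ompi-map-test            # illustrative job name
cd "$PBS_O_WORKDIR"
# With 16 ranks and 16 slots, mpirun fills each host before moving on.
mpirun -n 16 ./app
```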

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
> On Mar 25, 2016, at 9:40 AM, Ronald Cohen wrote: > > Thank you! I will try it! > > > What would > -cpus-per-proc 4 -n 16 > do? This would bind each process to 4 cores, filling each node with procs until the cores on that node were exhausted, to a total of 16 processes within the allocati
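Ralph's description of -cpus-per-proc 4 -n 16 can be checked with simple arithmetic: each rank takes 4 cores, nodes fill in order, so 16-core nodes hold 4 ranks each. A sketch using the core counts from this thread:

```shell
# -cpus-per-proc 4 -n 16 on 16-core nodes: how do the ranks lay out?
CORES_PER_NODE=16; CPUS_PER_PROC=4; NP=16
RANKS_PER_NODE=$(( CORES_PER_NODE / CPUS_PER_PROC ))
NODES_USED=$(( NP / RANKS_PER_NODE ))
echo "${RANKS_PER_NODE} ranks per node, ${NODES_USED} nodes used"
```

So the 16 ranks span exactly the 4-node allocation, each bound to 4 cores.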

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
Thank you! I will try it! What would -cpus-per-proc 4 -n 16 do? Ron --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Fri, Mar 25, 2016 at 12:38 PM, Ralph Castain wrote: > Add -rank-by node to your cmd line. You’ll still get 4 procs/node, but they > will be ranke

Re: [OMPI users] loading processes per node

2016-03-25 Thread Ralph Castain
Add -rank-by node to your cmd line. You’ll still get 4 procs/node, but they will be ranked by node instead of consecutively within a node. > On Mar 25, 2016, at 9:30 AM, Ronald Cohen wrote: > > I am using > > mpirun --map-by ppr:4:node -n 16 > > and this loads the processes in round robin
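The ranking difference Ralph describes can be simulated: with 4 procs per node on 4 nodes, default ranking numbers ranks consecutively within a node, while -rank-by node strides them across nodes. A sketch (the rank-to-node formulas are my illustration of the stated behavior, not OMPI code):

```shell
# default ranking: ranks fill a node before moving on (node = rank / 4)
# -rank-by node:   ranks stride across nodes      (node = rank % 4)
default_node() { echo $(( $1 / 4 )); }
rankby_node()  { echo $(( $1 % 4 )); }
for r in 0 1 4 5; do
  printf 'rank %d: default -> node %s, rank-by node -> node %s\n' \
         "$r" "$(default_node "$r")" "$(rankby_node "$r")"
done
```

Either way the mapping (4 procs per node) is unchanged; only the rank numbering differs, which matters for codes whose communication pattern depends on rank adjacency.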

[OMPI users] loading processes per node

2016-03-25 Thread Ronald Cohen
I am using mpirun --map-by ppr:4:node -n 16 and this loads the processes in round robin fashion. This seems to be twice as slow for my code as loading them node by node, 4 processes per node. How can I not load them round robin, but node by node? Thanks! Ron --- Ron Cohen recoh...@gmail.com

Re: [OMPI users] Problems in compiling a code with dynamic linking

2016-03-25 Thread Elio Physics
First of all, I really want to thank Gilles, Thomas and Gus for their time in helping me as much as possible with my problem. The problem was resolved. Actually, I had forgotten to include the libraries in the PBS script. The cluster we have provides modules of libraries such as: compilers/intel
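The fix described above amounts to loading the same environment modules inside the batch script that were used at build time, so the dynamic linker can resolve the shared libraries at run time. A sketch; the module names are illustrative and should be matched against `module avail` on the actual cluster:

```shell
#!/bin/bash
#PBS -N my-job                 # illustrative job name
# Load build-time modules so LD_LIBRARY_PATH is set for the dynamic linker.
module load compilers/intel    # assumed module name
module load openmpi            # assumed module name
cd "$PBS_O_WORKDIR"
mpirun ./app                   # ./app is a placeholder binary
```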