So I have been experimenting with different mappings, and performance varies a lot. The best I have found is:

-map-by slot:pe=2 -np 32

with 2 threads, which gives

[n001.cluster.com:29647] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././.][./././././././.]
[n001.cluster.com:29647] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././.][./././././././.]
[n001.cluster.com:29647] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./.][./././././././.]
[n001.cluster.com:29647] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B][./././././././.]
[n001.cluster.com:29647] MCW rank 4 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././././.][B/B/./././././.]
[n001.cluster.com:29647] MCW rank 5 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././././.][././B/B/./././.]
[n001.cluster.com:29647] MCW rank 6 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././.][././././B/B/./.]
[n001.cluster.com:29647] MCW rank 7 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][././././././B/B]
[n003.cluster.com:29842] MCW rank 16 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././.][./././././././.]
[n002.cluster.com:32210] MCW ra ...
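When comparing mappings it can help to summarize the --report-bindings output instead of reading it line by line. The following is only a rough sketch, assuming the line format shown above; the function names parse_binding and ranks_by_host are made up for illustration and are not part of Open MPI. Truncated or unrelated lines are skipped.

```python
import re
from collections import defaultdict

# Pattern for one report line as it appears in this thread, e.g.
# [host:pid] MCW rank N bound to socket S[core C[hwt H]], ...: [B/B/...][...]
LINE_RE = re.compile(
    r"\[(?P<host>[^:\]]+):\d+\] MCW rank (?P<rank>\d+) bound to "
    r"(?P<where>.*): (?P<mask>\[.*\])$"
)
CORE_RE = re.compile(r"core (\d+)\[")


def parse_binding(line):
    """Return (host, rank, [core ids]) for one report line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    cores = [int(c) for c in CORE_RE.findall(m.group("where"))]
    return m.group("host"), int(m.group("rank")), cores


def ranks_by_host(lines):
    """Group the MPI ranks found in the report by host name."""
    hosts = defaultdict(list)
    for line in lines:
        parsed = parse_binding(line)
        if parsed is not None:
            host, rank, _cores = parsed
            hosts[host].append(rank)
    return {host: sorted(ranks) for host, ranks in hosts.items()}
```

Feeding it the saved mpirun output (for example open("bindings.log").read().splitlines(), where bindings.log is a hypothetical file name) shows at a glance whether consecutive ranks ended up on the same host or were spread round robin.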
---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3

On Fri, Mar 25, 2016 at 3:13 PM, Ronald Cohen <recoh...@gmail.com> wrote:
> So
> -map-by node:pe=2 -np 32
> runs and gives great performance, though a little worse than -n 32
> it puts the correct number of processes, but does do round robin. Is
> there a way to do this without the round robin? Also note the error
> message:
>
>
> ====================== ALLOCATED NODES ======================
> n001: slots=16 max_slots=0 slots_inuse=0 state=UP
> n004.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
> n003.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
> n002.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
> =================================================================
> --------------------------------------------------------------------------
> WARNING: a request was made to bind a process. While the system
> supports binding the process itself, at least one node does NOT
> support binding memory to the process location.
>
> Node: n001
>
> This usually is due to not having the required NUMA support installed
> on the node. In some Linux distributions, the required support is
> contained in the libnumactl and libnumactl-devel packages.
> This is a warning only; your job will continue, though performance may
> be degraded.
> --------------------------------------------------------------------------
> [n001.cluster.com:29316] MCW rank 0 bound to socket 0[core 0[hwt 0]],
> socket 0[core 1[hwt 0]]: [B/B/./././././.][./././././././.]
> [n001.cluster.com:29316] MCW rank 4 bound to socket 0[core 2[hwt 0]],
> socket 0[core 3[hwt 0]]: [././B/B/./././.][./././././././.]
> [n001.cluster.com:29316] MCW rank 8 bound to socket 0[core 4[hwt 0]],
> socket 0[core 5[hwt 0]]: [././././B/B/./.][./././././././.]
> [n001.cluster.com:29316] MCW rank 12 bound to socket 0[core 6[hwt 0]],
> socket 0[core 7[hwt 0]]: [././././././B/B][./././././././.]
> [n001.cluster.com:29316] MCW rank 16 bound to socket 1[core 8[hwt 0]],
> socket 1[core 9[hwt 0]]: [./././././././.][B/B/./././././.]
> [n001.cluster.com:29316] MCW rank 20 bound to socket 1[core 10[hwt
> 0]], socket 1[core 11[hwt 0]]: [./././././././.][././B/B/./././.]
> [n001.cluster.com:29316] MCW rank 24 bound to socket 1[core 12[hwt
> 0]], socket 1[core 13[hwt 0]]: [./././././././.][././././B/B/./.]
> [n001.cluster.com:29316] MCW rank 28 bound to socket 1[core 14[hwt
> 0]], socket 1[core 15[hwt 0]]: [./././././././.][././././././B/B]
> [n003.cluster.com:29704] MCW rank 22 bound to socket 1[core 10[hwt
> 0]], socket 1[core 11[hwt 0]]: [./././././././.][././B/B/./././.]
> ---
> Ron Cohen
> recoh...@gmail.com
> skypename: ronaldcohen
> twitter: @recohen3
>
>
> On Fri, Mar 25, 2016 at 2:32 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>> So it seems my
>> -map-by core:pe=2 -n 32
>> should have worked. I would have 32 procs with 2 on each, giving 64 total.
>> But it doesn't
>> ---
>> Ron Cohen
>> recoh...@gmail.com
>> skypename: ronaldcohen
>> twitter: @recohen3
>>
>>
>> On Fri, Mar 25, 2016 at 2:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> pe=N tells us to map N cores (we call them “processing elements” because
>>> they could be HTs if you --use-hwthreads-as-cpus) to each process. So we will
>>> bind each process to N cores.
>>>
>>> So if you want 16 procs, each with two processing elements assigned to them
>>> (which is a good choice if you are using 2 threads/process), then you would
>>> use:
>>>
>>> mpirun -map-by core:pe=2 -np 16
>>>
>>> If you add -report-bindings, you’ll see each process bound to two cores,
>>> with the procs tightly packed on each node until that node’s cores are fully
>>> utilized. We do handle the unlikely event that you asked for a non-integer
>>> multiple of cores - i.e., if you have 32 cores on a node, and you ask for
>>> pe=6, we will wind up leaving two cores idle.
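The pe=N arithmetic described above can be worked through in a few lines. This is a minimal sketch, not Open MPI code: it assumes processes are packed in core order, that hardware threads are not used as cpus, and that pe divides the cores per socket evenly. The helper names procs_per_node and binding_mask are hypothetical, and the 2 sockets x 8 cores default is taken from the binding maps quoted in this thread.

```python
def procs_per_node(cores_per_node, pe):
    """Return (processes that fit on one node, cores left idle) when each process gets `pe` cores."""
    if pe <= 0:
        raise ValueError("pe must be positive")
    return cores_per_node // pe, cores_per_node % pe


def binding_mask(local_rank, pe, sockets=2, cores_per_socket=8):
    """Build a mask string, in the style of the report lines in this thread,
    for the local_rank-th process placed on a node (assumes tight packing)."""
    first = local_rank * pe
    bound = set(range(first, first + pe))
    parts = []
    for s in range(sockets):
        cores = range(s * cores_per_socket, (s + 1) * cores_per_socket)
        parts.append("[" + "/".join("B" if c in bound else "." for c in cores) + "]")
    return "".join(parts)


# The example from the message above: 32 cores per node, pe=6.
print(procs_per_node(32, 6))   # 5 processes fit, 2 cores stay idle
# The nodes in this thread: 16 cores per node, pe=2.
print(procs_per_node(16, 2))   # 8 processes fit, no idle cores
print(binding_mask(0, 2))      # mask for the first process on a node
```

With 16 cores per node and pe=2, eight processes fit on a node, so -np 32 fills four such nodes, which matches the four-node allocation reported earlier in the thread.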
>>>
>>> HTH
>>> Ralph
>>>
>>> On Mar 25, 2016, at 11:11 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> or is it mpirun -map-by core:pe=8 -n 16 ?
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Fri, Mar 25, 2016 at 2:10 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> Thank you--I looked on the man page and it is not clear to me what
>>> pe=2 does. Is that the number of threads? So if I want 16 mpi procs
>>> with 2 threads is it for 32 cores (two nodes)
>>>
>>> mpirun -map-by core:pe=2 -n 16
>>>
>>> ?
>>>
>>> Sorry if I mangled this.
>>>
>>>
>>> Ron
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Fri, Mar 25, 2016 at 2:03 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> Okay, what I would suggest is that you use the following cmd line:
>>>
>>> mpirun -map-by core:pe=2 (or 8 or whatever number you want)
>>>
>>> This should give you the best performance as it will tight-pack the procs
>>> and assign them to the correct number of cores. See if that helps
>>>
>>> On Mar 25, 2016, at 10:38 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> 1.10.2
>>>
>>> Ron
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Fri, Mar 25, 2016 at 1:30 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> Hmmm…what version of OMPI are you using?
>>>
>>>
>>> On Mar 25, 2016, at 10:27 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> --report-bindings didn't report anything
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Fri, Mar 25, 2016 at 1:24 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> --display-allocation
>>> didn't seem to give useful information:
>>>
>>> ====================== ALLOCATED NODES ======================
>>> n005: slots=16 max_slots=0 slots_inuse=0 state=UP
>>> n008.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>>> n007.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>>> n006.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>>> =================================================================
>>>
>>> for
>>> mpirun -display-allocation --map-by ppr:8:node -n 32
>>>
>>> Ron
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Fri, Mar 25, 2016 at 1:17 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> Actually there was the same number of procs per node in each case. I
>>> verified this by logging into the nodes while they were running--in
>>> both cases 4 per node.
>>>
>>> Ron
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Fri, Mar 25, 2016 at 1:14 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>
>>> On Mar 25, 2016, at 9:59 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> It is very strange but my program runs slower with any of these
>>> choices than if I simply use:
>>>
>>> mpirun -n 16
>>> with
>>> #PBS -l
>>> nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4
>>> for example.
>>>
>>>
>>> This command will tightly pack as many procs as possible on a node - note
>>> that we may well not see the PBS directives regarding number of ppn. Add
>>> --display-allocation and let’s see how many slots we think were assigned on
>>> each node
>>>
>>>
>>> The timing for the latter is 165 seconds, and for
>>> #PBS -l nodes=4:ppn=16,pmem=1gb
>>> mpirun --map-by ppr:4:node -n 16
>>> it is 368 seconds.
>>>
>>>
>>> It will typically be faster if you pack more procs/node as they can use
>>> shared memory for communication.
>>>
>>>
>>> Ron
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Fri, Mar 25, 2016 at 12:43 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>
>>> On Mar 25, 2016, at 9:40 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> Thank you! I will try it!
>>>
>>>
>>> What would
>>> -cpus-per-proc 4 -n 16
>>> do?
>>>
>>>
>>> This would bind each process to 4 cores, filling each node with procs until
>>> the cores on that node were exhausted, to a total of 16 processes within the
>>> allocation.
>>>
>>>
>>> Ron
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Fri, Mar 25, 2016 at 12:38 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> Add -rank-by node to your cmd line. You’ll still get 4 procs/node, but they
>>> will be ranked by node instead of consecutively within a node.
>>>
>>>
>>>
>>> On Mar 25, 2016, at 9:30 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> I am using
>>>
>>> mpirun --map-by ppr:4:node -n 16
>>>
>>> and this loads the processes in round robin fashion. This seems to be
>>> twice as slow for my code as loading them node by node, 4 processes
>>> per node.
>>>
>>> How can I not load them round robin, but node by node?
>>>
>>> Thanks!
>>>
>>> Ron
>>>
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>> ---
>>> Ronald Cohen
>>> Geophysical Laboratory
>>> Carnegie Institution
>>> 5251 Broad Branch Rd., N.W.
>>> Washington, D.C. 20015
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2016/03/28828.php
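For reference, the two rank layouts reported in this thread (consecutive ranks packed onto one node with -map-by slot:pe=2, versus ranks dealt out round robin with -map-by node:pe=2) can be written down as simple arithmetic. This is only an illustrative sketch: node_of_rank and ranks_on are hypothetical helpers, the node order is copied from the ALLOCATED NODES listing quoted earlier, and it is an assumption that both runs used that same allocation in that order.

```python
# Node order as printed in the ALLOCATED NODES listing in this thread (assumed
# to be the order the mapper walks through; short host names used for brevity).
NODES = ["n001", "n004", "n003", "n002"]


def node_of_rank(rank, nodes, procs_per_node, round_robin):
    """Return the node hosting `rank` under the two layouts seen in this thread."""
    if round_robin:
        # Ranks are dealt out one per node in turn: 0 -> first node, 1 -> second, ...
        return nodes[rank % len(nodes)]
    # Ranks fill one node completely before moving on to the next.
    return nodes[rank // procs_per_node]


def ranks_on(node, nodes, nprocs, procs_per_node, round_robin):
    """List every rank that lands on `node`."""
    return [r for r in range(nprocs)
            if node_of_rank(r, nodes, procs_per_node, round_robin) == node]


print(ranks_on("n001", NODES, 32, 8, round_robin=False))
print(ranks_on("n001", NODES, 32, 8, round_robin=True))
```

Under these assumptions the packed layout puts rank 16 on n003 and the round-robin layout puts rank 22 on n003, which agrees with the binding lines quoted in the thread.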