Should -bind-to-core also help? Does the warning I get matter? Should we
install the libnumactl and libnumactl-devel packages? Thanks!
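On the package question, the usual fix for that NUMA warning is installing the distribution's numactl support on every compute node. A hedged sketch, not verified against your cluster: the libnumactl/libnumactl-devel names come from the warning text itself (SUSE-style naming), while RHEL/CentOS-family systems call the same packages numactl/numactl-devel. Run as root (or ask your admin):

```shell
# SUSE-family nodes (package names as given in the Open MPI warning):
zypper install libnumactl libnumactl-devel

# RHEL/CentOS-family nodes (same support, different package names):
yum install numactl numactl-devel

# Check whether libnuma is already present on a node:
ldconfig -p | grep libnuma
```

If libnuma shows up in the ldconfig output on every node, the warning is more likely a detection issue than genuinely missing NUMA support.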
Ron

---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3

On Fri, Mar 25, 2016 at 3:43 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Yeah, it can really have an impact! It is unfortunately highly
> application-specific, so all we can do is provide the tools.
>
> As you can see from the binding map, we are tight-packing the procs on
> each node to maximize the use of shared memory. However, this assumes
> that each rank is predominantly going to “talk” to rank+/-1 - i.e., the
> pattern involves nearest-neighbor ranks. If that isn’t true (e.g., the
> lowest-ranked process on one node talks to the lowest-ranked process on
> the next node, etc.), then this would be a bad mapping for performance.
>
> In that case, you can use the “rank-by” option to maintain the location
> and binding, but change the assigned MCW ranks to align with your
> communication pattern.
>
> HTH
> Ralph
>
> On Mar 25, 2016, at 12:28 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> So I have been experimenting with different mappings, and performance
> varies a lot. The best I find is
>   -map-by slot:pe=2 -np 32
> with 2 threads, which gives:
>
> [n001.cluster.com:29647] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././.][./././././././.]
> [n001.cluster.com:29647] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././.][./././././././.]
> [n001.cluster.com:29647] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./.][./././././././.]
> [n001.cluster.com:29647] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B][./././././././.]
> [n001.cluster.com:29647] MCW rank 4 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././././.][B/B/./././././.]
> [n001.cluster.com:29647] MCW rank 5 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././././.][././B/B/./././.]
> [n001.cluster.com:29647] MCW rank 6 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././.][././././B/B/./.]
> [n001.cluster.com:29647] MCW rank 7 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][././././././B/B]
> [n003.cluster.com:29842] MCW rank 16 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././.][./././././././.]
> [n002.cluster.com:32210] MCW ra
> ...
>
> On Fri, Mar 25, 2016 at 3:13 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> So
>   -map-by node:pe=2 -np 32
> runs and gives great performance, though a little worse than -n 32.
> It starts the correct number of processes, but it places them
> round-robin. Is there a way to do this without the round robin? Also
> note the warning message:
>
> ======================   ALLOCATED NODES   ======================
>   n001: slots=16 max_slots=0 slots_inuse=0 state=UP
>   n004.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>   n003.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>   n002.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
> =================================================================
> --------------------------------------------------------------------------
> WARNING: a request was made to bind a process. While the system
> supports binding the process itself, at least one node does NOT
> support binding memory to the process location.
>
>   Node:  n001
>
> This usually is due to not having the required NUMA support installed
> on the node. In some Linux distributions, the required support is
> contained in the libnumactl and libnumactl-devel packages.
> This is a warning only; your job will continue, though performance may
> be degraded.
> --------------------------------------------------------------------------
> [n001.cluster.com:29316] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././.][./././././././.]
> [n001.cluster.com:29316] MCW rank 4 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././.][./././././././.]
> [n001.cluster.com:29316] MCW rank 8 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./.][./././././././.]
> [n001.cluster.com:29316] MCW rank 12 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B][./././././././.]
> [n001.cluster.com:29316] MCW rank 16 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././././.][B/B/./././././.]
> [n001.cluster.com:29316] MCW rank 20 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././././.][././B/B/./././.]
> [n001.cluster.com:29316] MCW rank 24 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././.][././././B/B/./.]
> [n001.cluster.com:29316] MCW rank 28 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][././././././B/B]
> [n003.cluster.com:29704] MCW rank 22 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././././.][././B/B/./././.]
>
> On Fri, Mar 25, 2016 at 2:32 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> So it seems my
>   -map-by core:pe=2 -n 32
> should have worked: I would have 32 procs with 2 cores each, giving 64
> cores total. But it doesn't.
>
> On Fri, Mar 25, 2016 at 2:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> pe=N tells us to map N cores (we call them “processing elements” because
> they could be HTs if you --use-hwthreads-as-cpus) to each process. So we
> will bind each process to N cores.
> So if you want 16 procs, each with two processing elements assigned to
> them (which is a good choice if you are using 2 threads/process), then
> you would use:
>
>   mpirun -map-by core:pe=2 -np 16
>
> If you add -report-bindings, you’ll see each process bound to two
> cores, with the procs tightly packed on each node until that node’s
> cores are fully utilized. We do handle the unlikely event that you
> asked for a non-integer multiple of cores - i.e., if you have 32 cores
> on a node, and you ask for pe=6, we will wind up leaving two cores
> idle.
>
> HTH
> Ralph
>
> On Mar 25, 2016, at 11:11 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> or is it mpirun -map-by core:pe=8 -n 16 ?
>
> On Fri, Mar 25, 2016 at 2:10 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> Thank you -- I looked at the man page and it is not clear to me what
> pe=2 does. Is that the number of threads? So if I want 16 MPI procs
> with 2 threads each, on 32 cores (two nodes), is it
>
>   mpirun -map-by core:pe=2 -n 16
>
> ? Sorry if I mangled this.
>
> Ron
>
> On Fri, Mar 25, 2016 at 2:03 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> Okay, what I would suggest is that you use the following cmd line:
>
>   mpirun -map-by core:pe=2   (or 8 or whatever number you want)
>
> This should give you the best performance, as it will tight-pack the
> procs and assign them to the correct number of cores. See if that
> helps.
>
> On Mar 25, 2016, at 10:38 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> 1.10.2
>
> Ron
>
> On Fri, Mar 25, 2016 at 1:30 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> Hmmm… what version of OMPI are you using?
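The pe=N placement arithmetic Ralph describes above can be sketched in a few lines (my own illustration, not Open MPI code): with C cores per node and pe=N, mpirun can fit floor(C / N) processes on a node, and any remainder cores sit idle.

```python
def procs_per_node(cores, pe):
    """Return (processes that fit on one node, cores left idle)
    when each process is bound to `pe` cores."""
    fit = cores // pe          # whole processes that fit on the node
    idle = cores - fit * pe    # leftover cores that stay unbound
    return fit, idle

# Ralph's example: 32 cores with pe=6 leaves two cores idle.
print(procs_per_node(32, 6))   # -> (5, 2)
# The poster's nodes: 16 cores with pe=2 packs perfectly.
print(procs_per_node(16, 2))   # -> (8, 0)
```

This also shows why -map-by core:pe=2 -n 32 failed on 16-core nodes earlier in the thread: 32 procs x 2 cores needs 64 cores, but the four-node allocation has exactly 64, so any node falling short of 8 procs leaves no slack.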
> On Mar 25, 2016, at 10:27 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> --report-bindings didn't report anything.
>
> On Fri, Mar 25, 2016 at 1:24 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> --display-allocation didn't seem to give useful information:
>
> ======================   ALLOCATED NODES   ======================
>   n005: slots=16 max_slots=0 slots_inuse=0 state=UP
>   n008.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>   n007.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>   n006.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
> =================================================================
>
> for
>   mpirun -display-allocation --map-by ppr:8:node -n 32
>
> Ron
>
> On Fri, Mar 25, 2016 at 1:17 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> Actually there was the same number of procs per node in each case. I
> verified this by logging into the nodes while they were running -- in
> both cases 4 per node.
>
> Ron
>
> On Fri, Mar 25, 2016 at 1:14 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Mar 25, 2016, at 9:59 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> It is very strange, but my program runs slower with any of these
> choices than if I simply use
>   mpirun -n 16
> with, for example,
>   #PBS -l nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4
>
> This command will tightly pack as many procs as possible on a node -
> note that we may well not see the PBS directives regarding number of
> ppn.
> Add --display-allocation and let’s see how many slots we think were
> assigned on each node.
>
> The timing for the latter is 165 seconds, and for
>   #PBS -l nodes=4:ppn=16,pmem=1gb
>   mpirun --map-by ppr:4:node -n 16
> it is 368 seconds.
>
> It will typically be faster if you pack more procs/node, as they can
> use shared memory for communication.
>
> Ron
>
> On Fri, Mar 25, 2016 at 12:43 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Mar 25, 2016, at 9:40 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> Thank you! I will try it!
>
> What would
>   -cpus-per-proc 4 -n 16
> do?
>
> This would bind each process to 4 cores, filling each node with procs
> until the cores on that node were exhausted, to a total of 16
> processes within the allocation.
>
> Ron
>
> On Fri, Mar 25, 2016 at 12:38 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> Add -rank-by node to your cmd line. You’ll still get 4 procs/node, but
> they will be ranked by node instead of consecutively within a node.
>
> On Mar 25, 2016, at 9:30 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> I am using
>
>   mpirun --map-by ppr:4:node -n 16
>
> and this loads the processes in round-robin fashion. This seems to be
> twice as slow for my code as loading them node by node, 4 processes
> per node.
>
> How can I load them not round robin, but node by node?
>
> Thanks!
>
> Ron
>
> ---
> Ronald Cohen
> Geophysical Laboratory
> Carnegie Institution
> 5251 Broad Branch Rd., N.W.
> Washington, D.C.
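The suggestions scattered through the thread can be collected into a few command-line sketches (the executable name ./my_app and the process counts are placeholders; all flags are the ones quoted above):

```shell
# Tight-pack procs, 2 cores per process, and show the resulting bindings:
mpirun -map-by slot:pe=2 -report-bindings -np 32 ./my_app

# Same placement and binding, but number MCW ranks across nodes instead
# of consecutively within a node (useful when rank i mostly talks to
# ranks on other nodes, per Ralph's rank-by advice):
mpirun -map-by slot:pe=2 -rank-by node -report-bindings -np 32 ./my_app

# Fixed procs-per-node placement, with the allocation printed for
# debugging slot counts:
mpirun --map-by ppr:4:node -display-allocation -n 16 ./my_app
```

Which of these wins is application-specific, as Ralph notes: tight packing favors heavy rank+/-1 traffic via shared memory, while rank-by node favors communication between same-position ranks on different nodes.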
20015

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2016/03/28828.php