The -n 32 run in contrast gave:

[n011.cluster.com:05847] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./././././././.]
[n011.cluster.com:05847] MCW rank 1 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]], socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][B/B/B/B/B/B/B/B]
[n011.cluster.com:05847] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./././././././.]
[n011.cluster.com:05847] MCW rank 3 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]], socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][B/B/B/B/B/B/B/B]
[n011.cluster.com:05847] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./././././././.]
[n011.cluster.com:05847] MCW rank 5 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]], socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][B/B/B/B/B/B/B/B]
[n011.cluster.com:05847] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./././././././.]
[n011.cluster.com:05847] MCW rank 7 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]], socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][B/B/B/B/B/B/B/B]
[n019.cluster.com:02562] MCW rank 24 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./././././././.]

---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3
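Each line of that --report-bindings output is one rank's binding mask, with one bracket group per socket and B marking a bound core. Every rank here is bound to a full eight-core socket, so on n011 ranks 0, 2, 4, and 6 all share the same eight cores of socket 0. A minimal sketch of binding each rank to its own core instead, assuming Open MPI 1.8-era options and a placeholder executable ./app:

# Sketch: 8 ranks per node, each bound to a single core, so
# --report-bindings shows one B per rank instead of a whole socket.
mpirun --map-by ppr:8:node --bind-to core --report-bindings -n 32 ./app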
On Fri, Mar 25, 2016 at 1:29 PM, Ronald Cohen <recoh...@gmail.com> wrote:
> Sorry, they are in stderr.
>
> What should I learn from:
>
> [n001.cluster.com:27958] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./././././././.]
> [n001.cluster.com:27958] MCW rank 1 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]], socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][B/B/B/B/B/B/B/B]
> [n001.cluster.com:27958] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./././././././.]
> [n001.cluster.com:27958] MCW rank 3 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]], socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][B/B/B/B/B/B/B/B]
> [n001.cluster.com:27958] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./././././././.]
> [n001.cluster.com:27958] MCW rank 5 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]], socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][B/B/B/B/B/B/B/B]
> [n001.cluster.com:27958] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [B/B/B/B/B/B/B/B][./././././././.]
> [n001.cluster.com:27958] MCW rank 7 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]], socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][B/B/B/B/B/B/B/B]
> [n002
>
> etc?
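A quick way to digest output like the above is to count binding lines per node daemon. A sketch, assuming the stderr was captured to a file (bindings.err is a placeholder name):

# Each binding line starts with [hostname:pid]; counting lines per
# prefix gives the number of ranks launched on each node.
grep -o '^\[[^]]*\]' bindings.err | sort | uniq -c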
> On Fri, Mar 25, 2016 at 1:27 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>> --report-bindings didn't report anything.
>>
>> On Fri, Mar 25, 2016 at 1:24 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>>> --display-allocation didn't seem to give useful information:
>>>
>>> ====================== ALLOCATED NODES ======================
>>> n005: slots=16 max_slots=0 slots_inuse=0 state=UP
>>> n008.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>>> n007.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>>> n006.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>>> =================================================================
>>>
>>> for
>>>
>>> mpirun -display-allocation --map-by ppr:8:node -n 32
>>>
>>> Ron
>>>
>>> On Fri, Mar 25, 2016 at 1:17 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>> Actually, there was the same number of procs per node in each case. I
>>>> verified this by logging into the nodes while they were running: in
>>>> both cases, 4 per node.
>>>>
>>>> Ron
>>>>
>>>> On Fri, Mar 25, 2016 at 1:14 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>> On Mar 25, 2016, at 9:59 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>>>
>>>>>> It is very strange, but my program runs slower with any of these
>>>>>> choices than if I simply use:
>>>>>>
>>>>>> mpirun -n 16
>>>>>>
>>>>>> with
>>>>>>
>>>>>> #PBS -l nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4
>>>>>>
>>>>>> for example.
>>>>>
>>>>> This command will tightly pack as many procs as possible on a node - note
>>>>> that we may well not see the PBS directives regarding number of ppn. Add
>>>>> --display-allocation and let's see how many slots we think were assigned
>>>>> on each node.
>>>>>
>>>>>> The timing for the latter is 165 seconds, and for
>>>>>>
>>>>>> #PBS -l nodes=4:ppn=16,pmem=1gb
>>>>>> mpirun --map-by ppr:4:node -n 16
>>>>>>
>>>>>> it is 368 seconds.
>>>>>
>>>>> It will typically be faster if you pack more procs/node, as they can use
>>>>> shared memory for communication.
>>>>>
>>>>>> Ron
>>>>>>
>>>>>> On Fri, Mar 25, 2016 at 12:43 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>> On Mar 25, 2016, at 9:40 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thank you! I will try it!
>>>>>>>>
>>>>>>>> What would
>>>>>>>>
>>>>>>>> -cpus-per-proc 4 -n 16
>>>>>>>>
>>>>>>>> do?
>>>>>>>
>>>>>>> This would bind each process to 4 cores, filling each node with procs
>>>>>>> until the cores on that node were exhausted, to a total of 16 processes
>>>>>>> within the allocation.
>>>>>>>
>>>>>>>> Ron
>>>>>>>>
>>>>>>>> On Fri, Mar 25, 2016 at 12:38 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>> Add -rank-by node to your cmd line. You'll still get 4 procs/node,
>>>>>>>>> but they will be ranked by node instead of consecutively within a
>>>>>>>>> node.
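Mapping (where procs are placed) and ranking (how they are numbered) are separate controls. To make the options discussed above concrete, here is a sketch of the layouts each invocation should produce on a four-node, 16-core-per-node allocation, assuming Open MPI 1.8-era semantics and placeholder names (./app, n013-n016):

# Tightly packed: fills n013's 16 slots before touching n014.
mpirun -n 16 ./app

# 4 procs per node, ranked consecutively within each node by default:
#   n013: ranks 0-3, n014: ranks 4-7, ...
mpirun --map-by ppr:4:node -n 16 ./app

# Same placement, but ranked by node, i.e. round-robin numbering:
#   n013: ranks 0,4,8,12; n014: ranks 1,5,9,13; ...
mpirun --map-by ppr:4:node -rank-by node -n 16 ./app

# Each proc bound to 4 cores, nodes filled in order, 16 procs total:
mpirun -cpus-per-proc 4 -n 16 ./app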
>>>>>>>>>
>>>>>>>>>> On Mar 25, 2016, at 9:30 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I am using
>>>>>>>>>>
>>>>>>>>>> mpirun --map-by ppr:4:node -n 16
>>>>>>>>>>
>>>>>>>>>> and this loads the processes in round-robin fashion. This seems to be
>>>>>>>>>> twice as slow for my code as loading them node by node, 4 processes
>>>>>>>>>> per node.
>>>>>>>>>>
>>>>>>>>>> How can I load them node by node instead of round-robin?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Ron
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> Ronald Cohen
>>>>>>>>>> Geophysical Laboratory
>>>>>>>>>> Carnegie Institution
>>>>>>>>>> 5251 Broad Branch Rd., N.W.
>>>>>>>>>> Washington, D.C. 20015
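Pulling the thread together, a sketch of a full submission that combines the resource request with the diagnostics suggested above, assuming PBS and a placeholder executable ./app:

#!/bin/bash
#PBS -l nodes=4:ppn=16,pmem=1gb
cd $PBS_O_WORKDIR
# 4 procs per node; print the allocation and each rank's binding to
# stderr so the placement can be checked against the timings.
mpirun --map-by ppr:4:node -display-allocation --report-bindings -n 16 ./app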