pe=N tells us to map N cores (we call them “processing elements” because they could be HTs if you --use-hwthreads-as-cpus) to each process. So we will bind each process to N cores.

So if you want 16 procs, each with two processing elements assigned to it (which is a good choice if you are using 2 threads/process), then you would use:

mpirun -map-by core:pe=2 -np 16

If you add -report-bindings, you’ll see each process bound to two cores, with the procs tightly packed on each node until that node’s cores are fully utilized. We do handle the unlikely event that pe doesn’t divide evenly into the node’s core count - i.e., if you have 32 cores on a node and you ask for pe=6, we will wind up leaving two cores idle.
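If the 2 threads/process come from OpenMP, a minimal sketch of the full launch might look like the line below; ./my_app is only a placeholder for your executable, and exporting OMP_NUM_THREADS assumes OpenMP is what supplies the threads:

# ./my_app is a placeholder name; -x exports OMP_NUM_THREADS to every rank
mpirun -x OMP_NUM_THREADS=2 -map-by core:pe=2 -np 16 -report-bindings ./my_app

Each rank then has two dedicated cores for its two threads, and -report-bindings lets you confirm the layout before a long run.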
HTH

Ralph

> On Mar 25, 2016, at 11:11 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>
> or is it mpirun -map-by core:pe=8 -n 16 ?
>
> On Fri, Mar 25, 2016 at 2:10 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>> Thank you -- I looked on the man page and it is not clear to me what
>> pe=2 does. Is that the number of threads? So if I want 16 MPI procs
>> with 2 threads each, for 32 cores (two nodes), is it
>>
>> mpirun -map-by core:pe=2 -n 16
>>
>> ?
>>
>> Sorry if I mangled this.
>>
>> Ron
>>
>> On Fri, Mar 25, 2016 at 2:03 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Okay, what I would suggest is that you use the following cmd line:
>>>
>>> mpirun -map-by core:pe=2 (or 8, or whatever number you want)
>>>
>>> This should give you the best performance, as it will tight-pack the procs
>>> and assign them to the correct number of cores. See if that helps.
>>>
>>>> On Mar 25, 2016, at 10:38 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>
>>>> 1.10.2
>>>>
>>>> Ron
>>>>
>>>> On Fri, Mar 25, 2016 at 1:30 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Hmmm… what version of OMPI are you using?
>>>>>
>>>>> On Mar 25, 2016, at 10:27 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>>
>>>>> --report-bindings didn't report anything.
>>>>>
>>>>> On Fri, Mar 25, 2016 at 1:24 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>>
>>>>> --display-allocation didn't seem to give useful information:
>>>>>
>>>>> ====================== ALLOCATED NODES ======================
>>>>> n005: slots=16 max_slots=0 slots_inuse=0 state=UP
>>>>> n008.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>>>>> n007.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>>>>> n006.cluster.com: slots=16 max_slots=0 slots_inuse=0 state=UP
>>>>> =================================================================
>>>>>
>>>>> for
>>>>>
>>>>> mpirun -display-allocation --map-by ppr:8:node -n 32
>>>>>
>>>>> Ron
>>>>>
>>>>> On Fri, Mar 25, 2016 at 1:17 PM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>>
>>>>> Actually there was the same number of procs per node in each case. I
>>>>> verified this by logging into the nodes while they were running -- in
>>>>> both cases 4 per node.
>>>>>
>>>>> Ron
>>>>>
>>>>> On Fri, Mar 25, 2016 at 1:14 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>> On Mar 25, 2016, at 9:59 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>>
>>>>> It is very strange, but my program runs slower with any of these
>>>>> choices than if I simply use:
>>>>>
>>>>> mpirun -n 16
>>>>>
>>>>> with
>>>>>
>>>>> #PBS -l nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4
>>>>>
>>>>> for example.
>>>>>
>>>>> This command will tightly pack as many procs as possible on a node - note
>>>>> that we may well not see the PBS directives regarding number of ppn. Add
>>>>> --display-allocation and let's see how many slots we think were assigned
>>>>> on each node.
>>>>>
>>>>> The timing for the latter is 165 seconds, and for
>>>>>
>>>>> #PBS -l nodes=4:ppn=16,pmem=1gb
>>>>> mpirun --map-by ppr:4:node -n 16
>>>>>
>>>>> it is 368 seconds.
>>>>>
>>>>> It will typically be faster if you pack more procs/node, as they can use
>>>>> shared memory for communication.
>>>>>
>>>>> Ron
>>>>>
>>>>> On Fri, Mar 25, 2016 at 12:43 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>> On Mar 25, 2016, at 9:40 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>>
>>>>> Thank you! I will try it!
>>>>>
>>>>> What would
>>>>>
>>>>> -cpus-per-proc 4 -n 16
>>>>>
>>>>> do?
>>>>>
>>>>> This would bind each process to 4 cores, filling each node with procs until
>>>>> the cores on that node were exhausted, to a total of 16 processes within the
>>>>> allocation.
>>>>>
>>>>> Ron
>>>>>
>>>>> On Fri, Mar 25, 2016 at 12:38 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>> Add -rank-by node to your cmd line. You'll still get 4 procs/node, but they
>>>>> will be ranked by node instead of consecutively within a node.
>>>>>
>>>>> On Mar 25, 2016, at 9:30 AM, Ronald Cohen <recoh...@gmail.com> wrote:
>>>>>
>>>>> I am using
>>>>>
>>>>> mpirun --map-by ppr:4:node -n 16
>>>>>
>>>>> and this loads the processes in round-robin fashion. This seems to be
>>>>> twice as slow for my code as loading them node by node, 4 processes
>>>>> per node.
>>>>>
>>>>> How can I not load them round robin, but node by node?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Ron
>>>>>
>>>>> ---
>>>>> Ronald Cohen
>>>>> Geophysical Laboratory
>>>>> Carnegie Institution
>>>>> 5251 Broad Branch Rd., N.W.
>>>>> Washington, D.C. 20015