On Jan 22, 2014, at 8:08 PM, tmish...@jcity.maeda.co.jp wrote:

> Thanks, Ralph.
>
> I have one more question. I'm sorry to ask you many things ...
Not a problem

> Could you tell me the difference between "map-by slot" and "map-by core"?
> From my understanding, slot is a synonym of core.

Not really - see below

> But those behaviors using openmpi-1.7.4rc2 with the cpus-per-proc option
> are quite different, as shown below. I tried to browse the source code but
> I could not make it clear so far.

It is a little subtle, I fear. When you tell us "map-by slot", we assign each
process to an allocated slot without associating it to any specific cpu or
core. When we then bind to core (as we do by default), we balance the binding
across the sockets to improve performance.

When you tell us "map-by core", we directly associate each process with a
specific core. So when we bind, we bind you to that core. This causes us to
fully use all the cores on the first socket before we move to the next.

I'm a little puzzled by your output, as it appears that cpus-per-proc was
ignored, so that's something I'd have to look at more carefully. Best guess
is that we aren't skipping cores to account for the cpus-per-proc setting,
and thus the procs are being mapped to consecutive cores - which wouldn't be
very good if we then bound them to multiple neighboring cores, as they'd fall
on top of each other.

> Regards,
> Tetsuya Mishima
>
> [un-managed environment] (node05 and node06 have 8 cores each)
>
> [mishima@manage work]$ cat pbs_hosts
> node05
> node05
> node05
> node05
> node05
> node05
> node05
> node05
> node06
> node06
> node06
> node06
> node06
> node06
> node06
> node06
>
> [mishima@manage work]$ mpirun -np 4 -hostfile pbs_hosts -report-bindings -cpus-per-proc 4 -map-by slot ~/mis/openmpi/demos/myprog
> [node05.cluster:23949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
> [node05.cluster:23949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
> [node06.cluster:22139] MCW rank 3 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
> [node06.cluster:22139] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
> Hello world from process 0 of 4
> Hello world from process 1 of 4
> Hello world from process 3 of 4
> Hello world from process 2 of 4
>
> [mishima@manage work]$ mpirun -np 4 -hostfile pbs_hosts -report-bindings -cpus-per-proc 4 -map-by core ~/mis/openmpi/demos/myprog
> [node05.cluster:23985] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
> [node05.cluster:23985] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
> [node06.cluster:22175] MCW rank 3 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
> [node06.cluster:22175] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
> Hello world from process 2 of 4
> Hello world from process 3 of 4
> Hello world from process 0 of 4
> Hello world from process 1 of 4
>
> (note) I see the same behavior in the managed environment under Torque

>> Seems like a reasonable, minimal risk request - will do
>>
>> On Jan 22, 2014, at 4:28 PM, tmish...@jcity.maeda.co.jp wrote:
>>
>>> Hi Ralph, I want to ask you one more thing about the default setting of
>>> num_procs when we don't specify the -np option and we set cpus-per-proc > 1.
>>>
>>> In this case, the round_robin_mapper sets num_procs = num_slots as below:
>>>
>>> rmaps_rr.c:
>>> 130     if (0 == app->num_procs) {
>>> 131         /* set the num_procs to equal the number of slots on these mapped nodes */
>>> 132         app->num_procs = num_slots;
>>> 133     }
>>>
>>> However, because of cpus_per_rank > 1, this num_procs will be refused at
>>> line 61 in rmaps_rr_mappers.c as below, unless we switch on the
>>> oversubscribe directive.
>>>
>>> rmaps_rr_mappers.c:
>>> 61      if (num_slots < ((int)app->num_procs * orte_rmaps_base.cpus_per_rank)) {
>>> 62          if (ORTE_MAPPING_NO_OVERSUBSCRIBE & ORTE_GET_MAPPING_DIRECTIVE(jdata->map->mapping)) {
>>> 63              orte_show_help("help-orte-rmaps-base.txt", "orte-rmaps-base:alloc-error",
>>> 64                             true, app->num_procs, app->app);
>>> 65              return ORTE_ERR_SILENT;
>>> 66          }
>>> 67      }
>>>
>>> Therefore, I think the default num_procs should be equal to num_slots
>>> divided by cpus_per_rank:
>>>
>>> app->num_procs = num_slots / orte_rmaps_base.cpus_per_rank;
>>>
>>> This would be more convenient for most people who want to use the
>>> -cpus-per-proc option. I already confirmed it worked well. Please consider
>>> applying this fix to 1.7.4.
>>>
>>> Regards,
>>> Tetsuya Mishima
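
[Editor's note] For readers following the thread, here is a minimal sketch of
the change Tetsuya is proposing, expressed against the rmaps_rr.c fragment
quoted above. It assumes the same surrounding variables (app, num_slots,
orte_rmaps_base.cpus_per_rank) and that cpus_per_rank is at least 1; it
illustrates the suggestion only, and is not necessarily the patch that was
actually committed:

    if (0 == app->num_procs) {
        /* default to as many procs as will fit: divide the mapped slots by
         * the cpus assigned to each rank, so the oversubscription check at
         * line 61 of rmaps_rr_mappers.c (quoted above) is satisfied */
        app->num_procs = num_slots / orte_rmaps_base.cpus_per_rank;
    }

With the 16-slot hostfile shown earlier and -cpus-per-proc 4, this default
would launch 4 processes (16 / 4) when -np is omitted, rather than 16.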