Thanks, Ralph.
I have one more question. I'm sorry to keep asking you so many things, but could you tell me the difference between "map-by slot" and "map-by core"? From my understanding, slot is a synonym for core, but their behaviors under openmpi-1.7.4rc2 with the cpus-per-proc option are quite different, as shown below. I tried to browse the source code, but I have not been able to figure it out so far.

Regards,
Tetsuya Mishima

[un-managed environment] (node05 and node06 have 8 cores each)

[mishima@manage work]$ cat pbs_hosts
node05
node05
node05
node05
node05
node05
node05
node05
node06
node06
node06
node06
node06
node06
node06
node06

[mishima@manage work]$ mpirun -np 4 -hostfile pbs_hosts -report-bindings -cpus-per-proc 4 -map-by slot ~/mis/openmpi/demos/myprog
[node05.cluster:23949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
[node05.cluster:23949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
[node06.cluster:22139] MCW rank 3 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
[node06.cluster:22139] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 3 of 4
Hello world from process 2 of 4

[mishima@manage work]$ mpirun -np 4 -hostfile pbs_hosts -report-bindings -cpus-per-proc 4 -map-by core ~/mis/openmpi/demos/myprog
[node05.cluster:23985] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
[node05.cluster:23985] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
[node06.cluster:22175] MCW rank 3 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
[node06.cluster:22175] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
Hello world from process 2 of 4
Hello world from process 3 of 4
Hello world from process 0 of 4
Hello world from process 1 of 4

(Note: I see the same behavior in the managed environment under Torque.)

> Seems like a reasonable, minimal risk request - will do
>
> On Jan 22, 2014, at 4:28 PM, tmish...@jcity.maeda.co.jp wrote:
>
> > Hi Ralph, I want to ask you one more thing, about the default setting of
> > num_procs when we don't specify the -np option and we set cpus-per-proc > 1.
> >
> > In this case, the round_robin_mapper sets num_procs = num_slots, as below:
> >
> > rmaps_rr.c:
> > 130      if (0 == app->num_procs) {
> > 131          /* set the num_procs to equal the number of slots on these mapped nodes */
> > 132          app->num_procs = num_slots;
> > 133      }
> >
> > However, because cpus_per_rank > 1, this num_procs will be refused at
> > line 61 of rmaps_rr_mappers.c, shown below, unless we switch on the
> > oversubscribe directive.
> >
> > rmaps_rr_mappers.c:
> > 61      if (num_slots < ((int)app->num_procs * orte_rmaps_base.cpus_per_rank)) {
> > 62          if (ORTE_MAPPING_NO_OVERSUBSCRIBE & ORTE_GET_MAPPING_DIRECTIVE(jdata->map->mapping)) {
> > 63              orte_show_help("help-orte-rmaps-base.txt", "orte-rmaps-base:alloc-error",
> > 64                             true, app->num_procs, app->app);
> > 65              return ORTE_ERR_SILENT;
> > 66          }
> > 67      }
> >
> > Therefore, I think the default num_procs should be set to num_slots divided by cpus/rank:
> >
> >     app->num_procs = num_slots / orte_rmaps_base.cpus_per_rank;
> >
> > This would be more convenient for most people who want to use the
> > -cpus-per-proc option. I have already confirmed that it works well.
> > Please consider applying this fix to 1.7.4.
> >
> > Regards,
> > Tetsuya Mishima
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
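
P.S. To make the arithmetic behind the proposed default concrete, here is a small standalone sketch (not the actual ORTE code, just an illustration that compiles on its own; the variable names only mirror the ORTE fields for readability). It uses the same numbers as the example above: two 8-core nodes, so num_slots = 16, and -cpus-per-proc 4.

    /* sketch.c - illustrates the current vs. proposed default for num_procs
     * when -np is omitted and cpus-per-proc > 1 */
    #include <stdio.h>

    int main(void)
    {
        int num_slots     = 16;  /* slots on the mapped nodes (2 nodes x 8 cores) */
        int cpus_per_rank = 4;   /* value passed via -cpus-per-proc */

        /* current default in rmaps_rr.c: one process per slot */
        int current_default  = num_slots;

        /* proposed default: divide by cpus/rank so the job fits the allocation */
        int proposed_default = num_slots / cpus_per_rank;

        printf("current  default: num_procs = %d, cpus needed = %d (> %d slots, refused)\n",
               current_default, current_default * cpus_per_rank, num_slots);
        printf("proposed default: num_procs = %d, cpus needed = %d (= %d slots, fits)\n",
               proposed_default, proposed_default * cpus_per_rank, num_slots);
        return 0;
    }

With the current default the mapper asks for 16 processes, which needs 64 cpus and trips the check at line 61 unless oversubscription is allowed; with the division it asks for 4 processes, which exactly fills the 16 slots.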