Hi Ralph,

I did an overall verification of rr_mapper and found another problem with "map-by node". As far as I checked, "map-by obj" with any object other than node worked fine. I do not use "map-by node" myself, but I'd like to report it to improve the reliability of 1.7.5. It seems too difficult for me to resolve, so I hope you could take a look.

The problem occurs when I mix two kinds of nodes, even though I add "-hetero-nodes" to the command line:

[mishima@manage work]$ cat pbs_hosts
node04 slots=8
node05 slots=2
node06 slots=2
[mishima@manage work]$ mpirun -np 12 -machinefile pbs_hosts -map-by node -report-bindings -hetero-nodes /home/mishima/mis/openmpi/demos/myprog
[manage.cluster:13113] [[15682,0],0] ORTE_ERROR_LOG: Fatal in file rmaps_rr.c at line 241
[manage.cluster:13113] [[15682,0],0] ORTE_ERROR_LOG: Fatal in file base/rmaps_base_map_job.c at line 285

With "-np 11", it works, but rank 10 is bound to the wrong core (one that is already used by rank 0). I guess something is wrong with the handling of different topologies when "map-by node" is specified. In addition, the calculation that assigns procs to each node has some problems; for example, node05 has slots=2 but receives four ranks (1, 4, 7, 9) in the output below:

[mishima@manage work]$ mpirun -np 11 -machinefile pbs_hosts -map-by node -report-bindings -hetero-nodes /home/mishima/mis/openmpi/demos/myprog
[node04.cluster:13384] MCW rank 3 bound to socket 0[core 1[hwt 0]]: [./B/./././././.][./././././././.][./././././././.][./././././././.]
[node04.cluster:13384] MCW rank 6 bound to socket 0[core 2[hwt 0]]: [././B/././././.][./././././././.][./././././././.][./././././././.]
[node04.cluster:13384] MCW rank 8 bound to socket 0[core 3[hwt 0]]: [./././B/./././.][./././././././.][./././././././.][./././././././.]
[node04.cluster:13384] MCW rank 10 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.][./././././././.][./././././././.]
[node04.cluster:13384] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.][./././././././.][./././././././.]
[node06.cluster:24192] MCW rank 5 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
[node06.cluster:24192] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
[node05.cluster:25655] MCW rank 9 bound to socket 0[core 3[hwt 0]]: [./././B][./././.]
[node05.cluster:25655] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
[node05.cluster:25655] MCW rank 4 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
[node05.cluster:25655] MCW rank 7 bound to socket 0[core 2[hwt 0]]: [././B/.][./././.]
Hello world from process 4 of 11
Hello world from process 7 of 11
Hello world from process 6 of 11
Hello world from process 3 of 11
Hello world from process 0 of 11
Hello world from process 8 of 11
Hello world from process 2 of 11
Hello world from process 5 of 11
Hello world from process 9 of 11
Hello world from process 1 of 11
Hello world from process 10 of 11

Regards,
Tetsuya Mishima
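
P.S. To make the expected placement concrete, here is a minimal Python sketch. It is not what rmaps_rr.c actually does; it only assumes that "map-by node" should cycle over the nodes in hostfile order and skip a node once its slots are full. The function name map_by_node and that skipping rule are hypothetical, chosen for illustration, and the slot counts are the ones from pbs_hosts above.

```python
def map_by_node(slots, nprocs):
    """Assign ranks to nodes round-robin, skipping nodes with no free slot.

    `slots` maps node name -> slot count, in hostfile order.
    This is an illustrative model, not Open MPI's implementation.
    """
    if nprocs > sum(slots.values()):
        raise ValueError("not enough slots for the requested number of procs")

    placement = {node: [] for node in slots}
    rank = 0
    while rank < nprocs:
        # One pass over the nodes places at most one rank per node.
        for node, limit in slots.items():
            if rank == nprocs:
                break
            if len(placement[node]) < limit:
                placement[node].append(rank)
                rank += 1
    return placement


# Slot counts taken from the pbs_hosts file in this report.
slots = {"node04": 8, "node05": 2, "node06": 2}
print(map_by_node(slots, 11))
# {'node04': [0, 3, 6, 7, 8, 9, 10], 'node05': [1, 4], 'node06': [2, 5]}
```

Under this assumption, node05 and node06 would each stop at two ranks and the remaining ranks would go to node04, and "-np 12" would fit exactly instead of failing.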