"restore" in what sense? On Feb 27, 2014, at 4:10 PM, tmish...@jcity.maeda.co.jp wrote:
> Hi Ralph, this is just for your information.
>
> I tried to restore the previous orte_rmaps_rr_byobj. Then I get the result
> below with this command line:
>
> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
> -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
> Data for JOB [31184,1] offset 0
>
> ======================== JOB MAP ========================
>
> Data for node: node05   Num slots: 1   Max slots: 0   Num procs: 7
>         Process OMPI jobid: [31184,1] App: 0 Process rank: 0
>         Process OMPI jobid: [31184,1] App: 0 Process rank: 2
>         Process OMPI jobid: [31184,1] App: 0 Process rank: 4
>         Process OMPI jobid: [31184,1] App: 0 Process rank: 6
>         Process OMPI jobid: [31184,1] App: 0 Process rank: 1
>         Process OMPI jobid: [31184,1] App: 0 Process rank: 3
>         Process OMPI jobid: [31184,1] App: 0 Process rank: 5
>
> Data for node: node06   Num slots: 1   Max slots: 0   Num procs: 1
>         Process OMPI jobid: [31184,1] App: 0 Process rank: 7
>
> =============================================================
> [node06.cluster:18857] MCW rank 7 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node05.cluster:21399] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> [node05.cluster:21399] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node05.cluster:21399] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> [node05.cluster:21399] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> [node05.cluster:21399] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node05.cluster:21399] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> [node05.cluster:21399] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> ....
> Then I add "-hostfile pbs_hosts" and the result is:
>
> [mishima@manage work]$ cat pbs_hosts
> node05 slots=8
> node06 slots=8
> [mishima@manage work]$ mpirun -np 8 -hostfile ~/work/pbs_hosts
> -report-bindings -map-by socket:pe=2 -display-map
> ~/mis/openmpi/demos/myprog
> Data for JOB [30254,1] offset 0
>
> ======================== JOB MAP ========================
>
> Data for node: node05   Num slots: 8   Max slots: 0   Num procs: 4
>         Process OMPI jobid: [30254,1] App: 0 Process rank: 0
>         Process OMPI jobid: [30254,1] App: 0 Process rank: 2
>         Process OMPI jobid: [30254,1] App: 0 Process rank: 1
>         Process OMPI jobid: [30254,1] App: 0 Process rank: 3
>
> Data for node: node06   Num slots: 8   Max slots: 0   Num procs: 4
>         Process OMPI jobid: [30254,1] App: 0 Process rank: 4
>         Process OMPI jobid: [30254,1] App: 0 Process rank: 6
>         Process OMPI jobid: [30254,1] App: 0 Process rank: 5
>         Process OMPI jobid: [30254,1] App: 0 Process rank: 7
>
> =============================================================
> [node05.cluster:21501] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> [node05.cluster:21501] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> [node05.cluster:21501] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node05.cluster:21501] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> [node06.cluster:18935] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> [node06.cluster:18935] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> [node06.cluster:18935] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node06.cluster:18935] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> ....
> I think the previous version's behavior would be close to what I expect.
>
> Tetsuya
>
>> They have 4 cores/socket and 2 sockets, 4 x 2 = 8 cores in total, each.
>>
>> Here is the output of lstopo.
>>
>> [mishima@manage round_robin]$ rsh node05
>> Last login: Tue Feb 18 15:10:15 from manage
>> [mishima@node05 ~]$ lstopo
>> Machine (32GB)
>>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (6144KB)
>>     L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
>>     L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
>>     L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
>>     L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
>>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (6144KB)
>>     L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
>>     L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
>>     L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
>>     L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
>> ....
>>
>> I focused on byobj_span and bynode. I didn't notice byobj was modified,
>> sorry.
>>
>> Tetsuya
>>
>>> Hmmm... what does your node look like again (sockets and cores)?
>>>
>>> On Feb 27, 2014, at 3:19 PM, tmish...@jcity.maeda.co.jp wrote:
>>>
>>>> Hi Ralph, I'm afraid to say your new "map-by obj" causes another
>>>> problem.
>>>>
>>>> I get an overload message with this command line, as shown below:
>>>>
>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
>>>> -display-map ~/mis/openmpi/demos/myprog
>>>> --------------------------------------------------------------------------
>>>> A request was made to bind to that would result in binding more
>>>> processes than cpus on a resource:
>>>>
>>>>    Bind to:     CORE
>>>>    Node:        node05
>>>>    #processes:  2
>>>>    #cpus:       1
>>>>
>>>> You can override this protection by adding the "overload-allowed"
>>>> option to your binding directive.
>>>> --------------------------------------------------------------------------
>>>>
>>>> Then, I add "-bind-to core:overload-allowed" to see what happens.
>>>>
>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
>>>> -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
>>>> Data for JOB [14398,1] offset 0
>>>>
>>>> ======================== JOB MAP ========================
>>>>
>>>> Data for node: node05   Num slots: 1   Max slots: 0   Num procs: 4
>>>>         Process OMPI jobid: [14398,1] App: 0 Process rank: 0
>>>>         Process OMPI jobid: [14398,1] App: 0 Process rank: 1
>>>>         Process OMPI jobid: [14398,1] App: 0 Process rank: 2
>>>>         Process OMPI jobid: [14398,1] App: 0 Process rank: 3
>>>>
>>>> Data for node: node06   Num slots: 1   Max slots: 0   Num procs: 4
>>>>         Process OMPI jobid: [14398,1] App: 0 Process rank: 4
>>>>         Process OMPI jobid: [14398,1] App: 0 Process rank: 5
>>>>         Process OMPI jobid: [14398,1] App: 0 Process rank: 6
>>>>         Process OMPI jobid: [14398,1] App: 0 Process rank: 7
>>>>
>>>> =============================================================
>>>> [node06.cluster:18443] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>> [node05.cluster:20901] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>> [node06.cluster:18443] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>> [node05.cluster:20901] MCW rank 3 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>> [node06.cluster:18443] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>> [node05.cluster:20901] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>> [node06.cluster:18443] MCW rank 5 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>> [node05.cluster:20901] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>> Hello world from process 4 of 8
>>>> Hello world from process 2 of 8
>>>> Hello world from process 6 of 8
>>>> Hello world from process 0 of 8
>>>> Hello world from process 5 of 8
>>>> Hello world from process 1 of 8
>>>> Hello world from process 7 of 8
>>>> Hello world from process 3 of 8
>>>>
>>>> When I add "map-by obj:span", it works fine:
>>>>
>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2,span
>>>> -display-map ~/mis/openmpi/demos/myprog
>>>> Data for JOB [14703,1] offset 0
>>>>
>>>> ======================== JOB MAP ========================
>>>>
>>>> Data for node: node05   Num slots: 1   Max slots: 0   Num procs: 4
>>>>         Process OMPI jobid: [14703,1] App: 0 Process rank: 0
>>>>         Process OMPI jobid: [14703,1] App: 0 Process rank: 2
>>>>         Process OMPI jobid: [14703,1] App: 0 Process rank: 1
>>>>         Process OMPI jobid: [14703,1] App: 0 Process rank: 3
>>>>
>>>> Data for node: node06   Num slots: 1   Max slots: 0   Num procs: 4
>>>>         Process OMPI jobid: [14703,1] App: 0 Process rank: 4
>>>>         Process OMPI jobid: [14703,1] App: 0 Process rank: 6
>>>>         Process OMPI jobid: [14703,1] App: 0 Process rank: 5
>>>>         Process OMPI jobid: [14703,1] App: 0 Process rank: 7
>>>>
>>>> =============================================================
>>>> [node06.cluster:18491] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>> [node05.cluster:20949] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>> [node06.cluster:18491] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
>>>> [node05.cluster:20949] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
>>>> [node06.cluster:18491] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>> [node05.cluster:20949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>> [node06.cluster:18491] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
>>>> [node05.cluster:20949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
>>>> ....
>>>>
>>>> So, byobj_span would be okay. Of course, bynode and byslot should be okay.
>>>> Could you take a look at orte_rmaps_rr_byobj again?
>>>>
>>>> Regards,
>>>> Tetsuya Mishima
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
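[Editor's note: for readers following the arithmetic in the thread above: with the reported topology (2 sockets x 4 cores per node) and "-map-by socket:pe=2", each rank is bound to 2 cores, so each socket holds 2 ranks and each node holds 4. The sketch below is plain Python illustrating only that capacity math; the helper name is hypothetical and this is not Open MPI code.]

```python
# Capacity math for -map-by socket:pe=2 on the nodes described in the
# thread (2 sockets x 4 cores per node). Hypothetical helper, not OMPI code.

def procs_that_fit(nodes, sockets_per_node, cores_per_socket, pe):
    """How many ranks fit when each rank is bound to `pe` cores."""
    procs_per_socket = cores_per_socket // pe
    return nodes * sockets_per_node * procs_per_socket

# node05/node06: pe=2 on 4-core sockets -> 2 ranks/socket, 4 ranks/node
assert procs_that_fit(1, 2, 4, pe=2) == 4
# 8 ranks fit exactly across both nodes, matching the balanced 4+4 maps
assert procs_that_fit(2, 2, 4, pe=2) == 8
```

Note that "-host node05,node06" advertises only one slot per node ("Num slots: 1" in the job maps), so mapping 8 ranks oversubscribes the slot count regardless of core capacity, which is why the overload protection fires without "overload-allowed".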