Hi,

thank you very much for your answer. I have compiled your program and
get different behaviours for openmpi-1.6.4rc3 and openmpi-1.9.
> On 02/05/13 00:30, Siegmar Gross wrote:
> >
> > now I can use all our machines once more. I have a problem on
> > Solaris 10 x86_64, because the mapping of processes doesn't
> > correspond to the rankfile. I removed the output from "hostfile"
> > and wrapped around long lines.
> >
> > tyr rankfiles 114 cat rf_ex_sunpc
> > # mpiexec -report-bindings -rf rf_ex_sunpc hostname
> >
> > rank 0=sunpc0 slot=0:0-1,1:0-1
> > rank 1=sunpc1 slot=0:0-1
> > rank 2=sunpc1 slot=1:0
> > rank 3=sunpc1 slot=1:1
> >
> >
> > tyr rankfiles 115 mpiexec -report-bindings -rf rf_ex_sunpc hostname
> > [sunpc0:17920] MCW rank 0 bound to socket 0[core 0-1]
> >   socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
> > [sunpc1:11265] MCW rank 1 bound to socket 0[core 0-1]:
> >   [B B][. .] (slot list 0:0-1)
> > [sunpc1:11265] MCW rank 2 bound to socket 0[core 0-1]
> >   socket 1[core 0-1]: [B B][B B] (slot list 1:0)
> > [sunpc1:11265] MCW rank 3 bound to socket 0[core 0-1]
> >   socket 1[core 0-1]: [B B][B B] (slot list 1:1)
>
> A few comments.
>
> First of all, the heterogeneous environment had nothing to do
> with this (as you have just confirmed). You can reproduce the
> problem so:
>
> % cat myrankfile
> rank 0=mynode slot=0:1
> % mpirun --report-bindings --rankfile myrankfile hostname
> [mynode:5150] MCW rank 0 bound to socket 0[core 0-3]:
>   [B B B B] (slot list 0:1)
>
> Anyhow, that's water under the bridge at this point.
>
> Next, and you might already know this, you can't bind arbitrarily
> on Solaris. You have to bind to a locality group (lgroup) or an
> individual core. Sorry if that's repeating something you already
> knew. Anyhow, your problem cases are when binding to a single
> core. So, you're all right (and OMPI isn't).
>
> Finally, you can check the actual binding so:
>
> % cat check.c
> #include <sys/types.h>
> #include <sys/processor.h>
> #include <sys/procset.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
>   processorid_t obind;
>   if ( processor_bind(P_PID, P_MYID, PBIND_QUERY, &obind) != 0 ) {
>     printf("ERROR\n");
>   } else {
>     if ( obind == PBIND_NONE ) printf("unbound\n");
>     else                       printf("bind to %d\n", obind);
>   }
>   return 0;
> }
> % cc check.c
> % mpirun --report-bindings --rankfile myrankfile ./a.out
>
> I can reproduce your problem on my Solaris 11 machine (the rankfile
> specifies a particular core but --report-bindings shows binding to
> the entire node), but the test program shows binding to the core I
> specified.
>
> So, the problem is in --report-bindings? I'll poke around some.
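Perhaps it would also help to see the home locality group of each
process, since arbitrary bindings are not possible on Solaris. The
following is only a rough, untested sketch that extends your check
program with lgrp_home(); the file name check_lgrp.c is my own choice,
and it assumes that liblgrp is available, i.e. it would have to be
compiled with "cc check_lgrp.c -llgrp".

/* check_lgrp.c -- variant of the check program that additionally
 * prints the home locality group (lgroup) of the process.
 * Untested sketch. */
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#include <sys/lgrp_user.h>
#include <stdio.h>

int main(int argc, char **argv) {
  processorid_t obind;
  lgrp_id_t     home;

  /* query the processor binding, exactly as in your check.c */
  if ( processor_bind(P_PID, P_MYID, PBIND_QUERY, &obind) != 0 ) {
    printf("ERROR\n");
  } else {
    if ( obind == PBIND_NONE ) printf("unbound\n");
    else                       printf("bind to %d\n", obind);
  }

  /* additionally report the home lgroup of this process */
  home = lgrp_home(P_PID, P_MYID);
  if ( home == -1 ) printf("lgrp_home ERROR\n");
  else              printf("home lgroup %ld\n", (long) home);

  return 0;
}

Anyway, here are my rankfiles and the output that I get on sunpc1 with
your check program (./a.out).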
sunpc1 rankfiles 103 cat myrankfile
rank 0=sunpc1 slot=0:1
sunpc1 rankfiles 104 cat myrankfile_0
rank 0=sunpc1 slot=0:0

I get the following output for openmpi-1.6.4rc3 (more or less the
same for both rankfiles).

sunpc1 rankfiles 105 ompi_info | grep "MPI:"
                Open MPI: 1.6.4rc3r27923
sunpc1 rankfiles 106 mpirun --report-bindings \
  --rankfile myrankfile ./a.out
bind to 1
[sunpc1:26472] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:1)
sunpc1 rankfiles 107 mpirun --report-bindings \
  --rankfile myrankfile_0 ./a.out
[sunpc1:26484] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:0)
bind to 0

I get the following output for openmpi-1.9 (different outputs!).

sunpc1 rankfiles 103 ompi_info | grep "MPI:"
                Open MPI: 1.9a1r28035
sunpc1 rankfiles 104 mpirun --report-bindings \
  --rankfile myrankfile ./a.out
[sunpc1:26554] MCW rank 0 bound to socket 0[core 0[hwt 0]],
  socket 0[core 1[hwt 0]]: [B/B][./.]
unbound
sunpc1 rankfiles 105 mpirun --report-bindings \
  --rankfile myrankfile_0 ./a.out
[sunpc1:26557] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
bind to 0

sunpc1 rankfiles 107 cd /usr/local/hwloc-1.6.1/bin/
sunpc1 bin 108 ./lstopo
Machine (8191MB)
  NUMANode L#0 (P#1 4095MB) + Socket L#0
    Core L#0 + PU L#0 (P#0)
    Core L#1 + PU L#1 (P#1)
  NUMANode L#1 (P#2 4096MB) + Socket L#1
    Core L#2 + PU L#2 (P#2)
    Core L#3 + PU L#3 (P#3)

Thank you very much for any help in advance.

Kind regards

Siegmar
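P.S.: Since hwloc-1.6.1 is installed on the machine anyway, the actual
binding could probably also be queried through the hwloc API instead of
processor_bind(). The following is only a sketch that I have not tried
on Solaris; the file name check_hwloc.c is my own choice, and it would
have to be compiled against that hwloc installation, e.g.
"cc check_hwloc.c -I/usr/local/hwloc-1.6.1/include
-L/usr/local/hwloc-1.6.1/lib -lhwloc".

/* check_hwloc.c -- print the CPU set this process is bound to,
 * using the hwloc API (untested sketch). */
#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  hwloc_topology_t topology;
  hwloc_bitmap_t   cpuset = hwloc_bitmap_alloc();
  char            *str    = NULL;

  hwloc_topology_init(&topology);
  hwloc_topology_load(topology);

  /* query the CPU set the process is currently bound to */
  if ( hwloc_get_cpubind(topology, cpuset, HWLOC_CPUBIND_PROCESS) != 0 ) {
    printf("hwloc_get_cpubind ERROR\n");
  } else {
    hwloc_bitmap_asprintf(&str, cpuset);
    printf("cpubind: %s\n", str);   /* e.g. 0x00000002 for PU P#1 only */
    free(str);
  }

  hwloc_bitmap_free(cpuset);
  hwloc_topology_destroy(topology);
  return 0;
}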