Hi,

I installed openmpi-1.6.2 on our heterogeneous platform (Solaris 10 Sparc, Solaris 10 x86_64, and Linux x86_64).

tyr small_prog 125 mpiexec -report-bindings -np 4 -host sunpc0,sunpc1 \
  -bysocket -bind-to-core date
Mon Oct 1 07:53:15 CEST 2012
[sunpc0:02084] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[sunpc0:02084] MCW rank 2 bound to socket 1[core 0]: [. .][B .]
Mon Oct 1 07:53:15 CEST 2012
Mon Oct 1 07:53:15 CEST 2012
[sunpc1:21881] MCW rank 1 bound to socket 0[core 0]: [B .][. .]
Mon Oct 1 07:53:15 CEST 2012
[sunpc1:21881] MCW rank 3 bound to socket 1[core 0]: [. .][B .]

Now I try to do the same thing with the following rankfile.

rank 0=sunpc0.informatik.hs-fulda.de slot=0:0
rank 1=sunpc1.informatik.hs-fulda.de slot=0:0
rank 2=sunpc0.informatik.hs-fulda.de slot=1:0
rank 3=sunpc1.informatik.hs-fulda.de slot=1:0

tyr small_prog 126 mpiexec -report-bindings -rf rf_date_1.openmpi date
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

I can also run the following commands successfully, but fail with
the same error message when I use an equivalent rankfile.

mpiexec -report-bindings -np 4 -host sunpc0,sunpc1 -bycore \
  -bind-to-socket date
mpiexec -report-bindings -np 10 -host linpc0,linpc1,sunpc0,sunpc1,tyr \
  -byslot -bind-to-core date

Do you have any ideas why it doesn't work with a rankfile? Can I
provide more information so that you can track down and solve the
problem?

I still have problems with our Sun M4000 server (two hardware threads
per core so that I should use "-bind-to hwthread").

tyr small_prog 133 mpiexec -report-bindings -np 2 -host rs0 -byslot \
  -bind-to-core date
--------------------------------------------------------------------------
An attempt to set processor affinity has failed - please check to
ensure that your system supports such functionality. If so, then
this is probably something that should be reported to the OMPI developers.
--------------------------------------------------------------------------
[rs0....:23147] MCW rank 0 bound to socket 0[core 0]: [B . . .][. . . .]
--------------------------------------------------------------------------
mpiexec was unable to start the specified application as it encountered an error:

Error name: Resource temporarily unavailable
Node: rs0

when attempting to start process rank 0.
--------------------------------------------------------------------------
2 total processes failed to start

I would be grateful if there is some kind of solution for this machine
as well in the (near) future. Thank you very much for any help in
advance.

Kind regards

Siegmar