Hi,

I installed openmpi-1.9a1r29097 on "openSuSE Linux 12.1", "Solaris 10 x86_64", and "Solaris 10 sparc" with "Sun C 5.12" in 64-bit mode. Unfortunately, I still have problems with rankfiles, which I already reported in May. I show the problems on Linux, but I have the same problems on all Solaris machines as well.
linpc1 rankfiles 99 cat rf_linpc1
# mpiexec -report-bindings -rf rf_linpc1 hostname
rank 0=linpc1 slot=0:0-1,1:0-1

linpc1 rankfiles 100 mpiexec -report-bindings -rf rf_linpc1 hostname
[linpc1:23413] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
linpc1

Rank 0 is bound only to cores 0 and 1 of socket 0, although the rankfile also requests cores 0 and 1 of socket 1.

linpc1 rankfiles 101 cat rf_ex_linpc
# mpiexec -report-bindings -rf rf_ex_linpc hostname
rank 0=linpc0 slot=0:0-1,1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=linpc1 slot=1:0
rank 3=linpc1 slot=1:1

linpc1 rankfiles 102 mpiexec -report-bindings -rf rf_ex_linpc hostname
--------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots. Please review your rank-slot
assignments and your host allocation to ensure a proper match. Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").

  Host: linpc0
--------------------------------------------------------------------------
linpc1 rankfiles 103

I don't have these problems with openmpi-1.6.5a1r28554.

linpc1 rankfiles 95 ompi_info | grep "Open MPI:"
                Open MPI: 1.6.5a1r28554

linpc1 rankfiles 95 mpiexec -report-bindings -rf rf_linpc1 hostname
[linpc1:23583] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
linpc1

linpc1 rankfiles 96 mpiexec -report-bindings -rf rf_ex_linpc hostname
[linpc1:23585] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .] (slot list 0:0-1)
[linpc1:23585] MCW rank 2 bound to socket 1[core 0]: [. .][B .] (slot list 1:0)
[linpc1:23585] MCW rank 3 bound to socket 1[core 1]: [. .][. B] (slot list 1:1)
linpc1
linpc1
linpc1
[linpc0:10422] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
linpc0

I would be grateful if somebody could fix the problem. Thank you very much in advance for any help.

Kind regards

Siegmar