Hi,

yesterday I installed openmpi-1.8.2rc4r32485 on my machines (Solaris 10 Sparc (tyr), Solaris 10 x86_64 (sunpc0, sunpc1), openSUSE Linux 12.1 x86_64 (linpc0, linpc1)) with Sun C 5.12. Today I experimented a bit more with rankfiles and found the following, which may help in tracking down the error. I use variations of the following rankfile (I remove a line and adapt the ranks). Many rankfiles work fine and a few break.

tyr openmpi_1.7.x_or_newer 180 cat x-linpc0_linpc1_sunpc1_tyr
rank 0=linpc0 slot=0:0-1;1:0-1
rank 1=linpc1 slot=1:0
rank 2=sunpc1 slot=1:0
rank 3=tyr slot=1:0

The above rankfile still breaks.

tyr openmpi_1.7.x_or_newer 186 mpiexec --report-bindings -np 4 -rf x-linpc0_linpc1_sunpc1_tyr hostname
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.
Variable : hwloc_base_report_bindings
Value : 1,1
Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
[tyr.informatik.hs-fulda.de:21651] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
tyr.informatik.hs-fulda.de
[linpc0:21338] MCW rank 0 is not bound (or bound to all available processors)
[linpc1:16906] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
linpc0
linpc1
sunpc1
tyr openmpi_1.7.x_or_newer 187

tyr openmpi_1.7.x_or_newer 191 mpiexec --report-bindings -np 3 -rf x-linpc0_linpc1_tyr hostname
[tyr.informatik.hs-fulda.de:21685] MCW rank 2 bound to socket 1[core 1[hwt 0]]: [.][B]
tyr.informatik.hs-fulda.de
[linpc0:21607] MCW rank 0 is not bound (or bound to all available processors)
linpc0
[linpc1:17168] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
linpc1
tyr openmpi_1.7.x_or_newer 192

tyr openmpi_1.7.x_or_newer 193 mpiexec --report-bindings -np 3 -rf x-linpc0_sunpc1_tyr hostname
[tyr.informatik.hs-fulda.de:21695] MCW rank 2 bound to socket 1[core 1[hwt 0]]: [.][B]
tyr.informatik.hs-fulda.de
[linpc0:21673] MCW rank 0 is not bound (or bound to all available processors)
linpc0
[sunpc1:25457] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
sunpc1
tyr openmpi_1.7.x_or_newer 194

tyr openmpi_1.7.x_or_newer 195 mpiexec --report-bindings -np 3 -rf x-linpc0_linpc1_sunpc1 hostname
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.
Variable : hwloc_base_report_bindings
Value : 1,1
Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
[linpc0:21743] MCW rank 0 is not bound (or bound to all available processors)
[linpc1:17240] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
linpc1
linpc0
sunpc1
tyr openmpi_1.7.x_or_newer 196

tyr openmpi_1.7.x_or_newer 197 mpiexec --report-bindings -np 2 -rf x-linpc0_sunpc1 hostname
[linpc0:21836] MCW rank 0 is not bound (or bound to all available processors)
linpc0
[sunpc1:25521] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
sunpc1
tyr openmpi_1.7.x_or_newer 198

tyr openmpi_1.7.x_or_newer 199 mpiexec --report-bindings -np 2 -rf x-linpc1_sunpc1 hostname
[linpc1:17335] MCW rank 0 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
linpc1
[sunpc1:25583] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
sunpc1
tyr openmpi_1.7.x_or_newer 200

I would be grateful if somebody could fix the problem. Can I provide anything else? Thank you very much in advance for any help.

Kind regards

Siegmar
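
P.S. The rankfile variations mentioned above (remove a line and renumber the ranks) could be generated with something like the following Python sketch. It is only an illustration, not the procedure actually used: the helper names `renumber` and `variations` are hypothetical, and it assumes the slot specification of each remaining host stays unchanged when another host is dropped.

```python
from itertools import combinations

# Full rankfile from the report, one "rank N=host slot=..." entry per line.
FULL_RANKFILE = [
    "rank 0=linpc0 slot=0:0-1;1:0-1",
    "rank 1=linpc1 slot=1:0",
    "rank 2=sunpc1 slot=1:0",
    "rank 3=tyr slot=1:0",
]


def renumber(entries):
    """Renumber the kept entries so that ranks run 0..n-1 without gaps."""
    result = []
    for new_rank, entry in enumerate(entries):
        _, rest = entry.split("=", 1)  # drop the old "rank N" prefix
        result.append(f"rank {new_rank}={rest}")
    return result


def variations(entries, min_size=2):
    """Yield (file name, lines) for every subset with at least min_size hosts.

    Host order is preserved; file names follow the x-host1_host2 pattern
    used in the report.
    """
    for size in range(min_size, len(entries) + 1):
        for subset in combinations(entries, size):
            hosts = [e.split("=", 1)[1].split()[0] for e in subset]
            yield "x-" + "_".join(hosts), renumber(subset)


if __name__ == "__main__":
    for name, lines in variations(FULL_RANKFILE):
        print(name)
        for line in lines:
            print("  " + line)
```

Each generated file could then be passed to mpiexec with -rf, as in the runs above, to see which host combinations trigger the error.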