Hi,

yesterday I installed openmpi-1.7a1r27358. It has an improved error message compared to openmpi-1.6.2, but it doesn't show process bindings and has some other problems as well.

"sunpc0" and "linpc0" each have two dual-core processors and run Solaris 10 x86_64 and Linux x86_64, respectively. "tyr" is a dual-processor machine running Solaris 10 Sparc.

tyr fd1026 105 mpiexec -np 2 -host sunpc0 -report-bindings \
  -map-by core -bind-to-core date
Sun Sep 23 11:46:36 CEST 2012
Sun Sep 23 11:46:36 CEST 2012

tyr fd1026 106 mpicc -showme
cc -I/usr/local/openmpi-1.7_64_cc/include -mt -m64 -L/usr/local/openmpi-1.7_64_cc/lib64 -lmpi -lpicl -lm -lkstat -llgrp -lsocket -lnsl -lrt -lm

openmpi-1.6.2 shows process bindings.

tyr fd1026 103 mpiexec -np 2 -host sunpc0 -report-bindings \
  -bycore -bind-to-core date
Sun Sep 23 12:09:06 CEST 2012
[sunpc0:13197] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[sunpc0:13197] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
Sun Sep 23 12:09:06 CEST 2012

tyr fd1026 104 mpicc -showme
cc -I/usr/local/openmpi-1.6.2_64_cc/include -mt -m64 -L/usr/local/openmpi-1.6.2_64_cc/lib64 -lmpi -lm -lkstat -llgrp -lsocket -lnsl -lrt -lm

On my Linux machine I get a warning.

tyr fd1026 113 mpiexec -np 2 -host linpc0 -report-bindings \
  -map-by core -bind-to-core date
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  linpc0

This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
Sun Sep 23 11:56:04 CEST 2012
Sun Sep 23 11:56:04 CEST 2012

Everything works fine with openmpi-1.6.2.

tyr fd1026 106 mpiexec -np 2 -host linpc0 -report-bindings \
  -bycore -bind-to-core date
[linpc0:15808] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[linpc0:15808] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
Sun Sep 23 12:11:47 CEST 2012
Sun Sep 23 12:11:47 CEST 2012

On my Solaris Sparc machine I get the following errors.

tyr fd1026 121 mpiexec -np 2 -report-bindings -map-by core -bind-to-core date
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414

tyr fd1026 122 mpiexec -np 2 -host tyr -report-bindings -map-by core -bind-to core date
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

Once more everything works fine with openmpi-1.6.2.

tyr fd1026 109 mpiexec -np 2 -report-bindings -bycore -bind-to-core date
[tyr.informatik.hs-fulda.de:23869] MCW rank 0 bound to socket 0[core 0]: [B][.]
[tyr.informatik.hs-fulda.de:23869] MCW rank 1 bound to socket 1[core 0]: [.][B]
Sun Sep 23 12:14:09 CEST 2012
Sun Sep 23 12:14:09 CEST 2012

tyr fd1026 110 mpiexec -np 2 -host tyr -report-bindings -bycore -bind-to-core date
[tyr.informatik.hs-fulda.de:23877] MCW rank 0 bound to socket 0[core 0]: [B][.]
[tyr.informatik.hs-fulda.de:23877] MCW rank 1 bound to socket 1[core 0]: [.][B]
Sun Sep 23 12:16:05 CEST 2012
Sun Sep 23 12:16:05 CEST 2012


Kind regards

Siegmar