Hi,

Yesterday I installed openmpi-1.7a1r27358. It has an improved
error message compared to openmpi-1.6.2, but it doesn't show process
bindings and has some other problems as well.


"sunpc0" and "linpc0" are each equipped with two dual-core processors and
run Solaris 10 x86_64 and Linux x86_64, respectively. "tyr" is a
dual-processor machine running Solaris 10 Sparc.

tyr fd1026 105 mpiexec -np 2 -host sunpc0 -report-bindings \
  -map-by core -bind-to-core date
Sun Sep 23 11:46:36 CEST 2012
Sun Sep 23 11:46:36 CEST 2012

tyr fd1026 106 mpicc -showme
cc -I/usr/local/openmpi-1.7_64_cc/include -mt -m64 
  -L/usr/local/openmpi-1.7_64_cc/lib64 -lmpi -lpicl -lm -lkstat -llgrp
  -lsocket -lnsl -lrt -lm


openmpi-1.6.2 shows process bindings.

tyr fd1026 103 mpiexec -np 2 -host sunpc0 -report-bindings \
  -bycore -bind-to-core date
Sun Sep 23 12:09:06 CEST 2012
[sunpc0:13197] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[sunpc0:13197] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
Sun Sep 23 12:09:06 CEST 2012


tyr fd1026 104 mpicc -showme
cc -I/usr/local/openmpi-1.6.2_64_cc/include -mt -m64
  -L/usr/local/openmpi-1.6.2_64_cc/lib64 -lmpi -lm -lkstat -llgrp
  -lsocket -lnsl -lrt -lm
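
To compare the two versions without reading the output by eye, the
binding lines can be checked with a short script. This is only a minimal
sketch: the function names are hypothetical, and the pattern assumes the
"MCW rank N bound to socket S[core C]: ..." format exactly as
openmpi-1.6.2 prints it above. Other versions may format these lines
differently.

```python
import re

# Assumed line format (taken from the openmpi-1.6.2 output in this message):
# [sunpc0:13197] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
BINDING_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] MCW rank (?P<rank>\d+) "
    r"bound to socket (?P<socket>\d+)\[core (?P<core>\d+)\]: (?P<mask>.+)$"
)


def parse_bindings(output):
    """Return {rank: (host, socket, core, mask)} for every binding line found."""
    bindings = {}
    for line in output.splitlines():
        match = BINDING_RE.match(line.strip())
        if match:
            bindings[int(match["rank"])] = (
                match["host"],
                int(match["socket"]),
                int(match["core"]),
                match["mask"],
            )
    return bindings


def missing_ranks(output, nprocs):
    """Return the ranks in 0..nprocs-1 for which no binding line was reported."""
    found = parse_bindings(output)
    return [rank for rank in range(nprocs) if rank not in found]
```

Feeding it the captured output of "mpiexec -np 2 ... -report-bindings"
with nprocs=2 returns an empty list when both ranks report a binding, and
the list of unreported ranks otherwise.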


On my Linux machine openmpi-1.7a1r27358 prints a warning and again shows
no process bindings.

tyr fd1026 113 mpiexec -np 2 -host linpc0 -report-bindings \
  -map-by core -bind-to-core date
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  linpc0

This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
Sun Sep 23 11:56:04 CEST 2012
Sun Sep 23 11:56:04 CEST 2012



Everything works fine with openmpi-1.6.2.

tyr fd1026 106 mpiexec -np 2 -host linpc0 -report-bindings \
  -bycore -bind-to-core date
[linpc0:15808] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[linpc0:15808] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
Sun Sep 23 12:11:47 CEST 2012
Sun Sep 23 12:11:47 CEST 2012




On my Solaris Sparc machine I get the following errors.


tyr fd1026 121 mpiexec -np 2 -report-bindings -map-by core -bind-to-core date
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414



tyr fd1026 122 mpiexec -np 2 -host tyr -report-bindings -map-by core -bind-to core date
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------


Once more everything works fine with openmpi-1.6.2.

tyr fd1026 109 mpiexec -np 2 -report-bindings -bycore -bind-to-core date
[tyr.informatik.hs-fulda.de:23869] MCW rank 0 bound to socket 0[core 0]: [B][.]
[tyr.informatik.hs-fulda.de:23869] MCW rank 1 bound to socket 1[core 0]: [.][B]
Sun Sep 23 12:14:09 CEST 2012
Sun Sep 23 12:14:09 CEST 2012

tyr fd1026 110 mpiexec -np 2 -host tyr -report-bindings -bycore -bind-to-core date
[tyr.informatik.hs-fulda.de:23877] MCW rank 0 bound to socket 0[core 0]: [B][.]
[tyr.informatik.hs-fulda.de:23877] MCW rank 1 bound to socket 1[core 0]: [.][B]
Sun Sep 23 12:16:05 CEST 2012
Sun Sep 23 12:16:05 CEST 2012


Kind regards

Siegmar
