The 1.7 series handles node topology in a completely different way than the 
1.6 series did. It provides some enhanced features, but it does have some 
drawbacks when the topology info isn't correct. I fear you are running into 
this problem (again).

All the commands you show here work fine for me on a Linux x86_64 box using 
1.7r27361 on a Westmere 6-core single-socket machine with hyperthreads enabled. 
I cannot replicate any of the reported problems, so there isn't much I can do 
at this point.

As I've said before, the root problem here appears to be some hwloc-related 
issue with your setup. Until that gets resolved so we get correct topology 
info, I'm not sure what can be done to resolve what you are seeing. I'll raise 
the question of possibly providing some alternative support for setups like 
yours that just can't get topology info, but that would definitely be a 
long-term question.


On Sep 23, 2012, at 3:20 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
> 
> yesterday I installed openmpi-1.7a1r27358 and it has an improved
> error message compared to openmpi-1.6.2, but doesn't show process bindings
> and has some other problems as well.
> 
> 
> "sunpc0" and "linpc0" are equipped with two dual-core processors running
> Solaris 10 x86_64 and Linux x86_64 resp. "tyr" is a dual-processor machine
> running Solaris 10 Sparc.
> 
> tyr fd1026 105 mpiexec -np 2 -host sunpc0 -report-bindings \
>  -map-by core -bind-to-core date
> Sun Sep 23 11:46:36 CEST 2012
> Sun Sep 23 11:46:36 CEST 2012
> 
> tyr fd1026 106 mpicc -showme
> cc -I/usr/local/openmpi-1.7_64_cc/include -mt -m64 
>  -L/usr/local/openmpi-1.7_64_cc/lib64 -lmpi -lpicl -lm -lkstat -llgrp
>  -lsocket -lnsl -lrt -lm
> 
> 
> openmpi-1.6.2 shows process bindings.
> 
> tyr fd1026 103 mpiexec -np 2 -host sunpc0 -report-bindings \
>  -bycore -bind-to-core date
> Sun Sep 23 12:09:06 CEST 2012
> [sunpc0:13197] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
> [sunpc0:13197] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
> Sun Sep 23 12:09:06 CEST 2012
> 
> 
> tyr fd1026 104 mpicc -showme
> cc -I/usr/local/openmpi-1.6.2_64_cc/include -mt -m64
>  -L/usr/local/openmpi-1.6.2_64_cc/lib64 -lmpi -lm -lkstat -llgrp
>  -lsocket -lnsl -lrt -lm
> 
> 
> On my Linux machine I get a warning.
> 
> tyr fd1026 113 mpiexec -np 2 -host linpc0 -report-bindings \
>  -map-by core -bind-to-core date
> --------------------------------------------------------------------------
> WARNING: a request was made to bind a process. While the system
> supports binding the process itself, at least one node does NOT
> support binding memory to the process location.
> 
>  Node:  linpc0
> 
> This is a warning only; your job will continue, though performance may
> be degraded.
> --------------------------------------------------------------------------
> Sun Sep 23 11:56:04 CEST 2012
> Sun Sep 23 11:56:04 CEST 2012
> 
> 
> 
> Everything works fine with openmpi-1.6.2.
> 
> tyr fd1026 106 mpiexec -np 2 -host linpc0 -report-bindings \
>  -bycore -bind-to-core date
> [linpc0:15808] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
> [linpc0:15808] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
> Sun Sep 23 12:11:47 CEST 2012
> Sun Sep 23 12:11:47 CEST 2012
> 
> 
> 
> 
> On my Solaris Sparc machine I get the following errors.
> 
> 
> tyr fd1026 121 mpiexec -np 2 -report-bindings -map-by core -bind-to-core date
> [tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of 
> bounds in file 
> ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at 
> line 847
> [tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of 
> bounds in file 
> ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at 
> line 1414
> [tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of 
> bounds in file 
> ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at 
> line 847
> [tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of 
> bounds in file 
> ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at 
> line 1414
> 
> 
> 
> tyr fd1026 122 mpiexec -np 2 -host tyr -report-bindings -map-by core -bind-to 
> core date
> --------------------------------------------------------------------------
> All nodes which are allocated for this job are already filled.
> --------------------------------------------------------------------------
> 
> 
> Once more everything works fine with openmpi-1.6.2.
> 
> tyr fd1026 109 mpiexec -np 2 -report-bindings -bycore -bind-to-core date
> [tyr.informatik.hs-fulda.de:23869] MCW rank 0 bound to socket 0[core 0]: 
> [B][.]
> [tyr.informatik.hs-fulda.de:23869] MCW rank 1 bound to socket 1[core 0]: 
> [.][B]
> Sun Sep 23 12:14:09 CEST 2012
> Sun Sep 23 12:14:09 CEST 2012
> 
> tyr fd1026 110 mpiexec -np 2 -host tyr -report-bindings -bycore -bind-to-core 
> date
> [tyr.informatik.hs-fulda.de:23877] MCW rank 0 bound to socket 0[core 0]: 
> [B][.]
> [tyr.informatik.hs-fulda.de:23877] MCW rank 1 bound to socket 1[core 0]: 
> [.][B]
> Sun Sep 23 12:16:05 CEST 2012
> Sun Sep 23 12:16:05 CEST 2012
> 
> 
> Kind regards
> 
> Siegmar
> 

