I saw your earlier note about this too. I'm a little busy right now, but I hope
to look at it soon.

Your rankfile looks fine, so undoubtedly a bug has crept into this rarely-used 
code path.
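
For anyone who wants to sanity-check a rankfile like the one quoted below independently of mpiexec, here is a minimal sketch in Python. It only handles the "socket:core" and "socket:lo-hi" slot forms that appear in this post, and the names parse_rankfile and parse_slot_spec are made up for illustration; this is not Open MPI's own parser.

```python
import re

# Matches lines of the form "rank <N>=<host> slot=<spec>", as in the rankfile
# quoted in this thread. Illustrative only; not Open MPI's parser.
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot=(\S+)$")


def parse_slot_spec(spec):
    """Expand 'socket:core' or 'socket:lo-hi' entries into (socket, core) pairs."""
    pairs = []
    for entry in spec.split(","):
        socket, _, cores = entry.partition(":")
        lo, _, hi = cores.partition("-")
        hi = hi or lo  # a single core such as "1:0" has no upper bound
        for core in range(int(lo), int(hi) + 1):
            pairs.append((int(socket), core))
    return pairs


def parse_rankfile(text):
    """Return {rank: (host, [(socket, core), ...])} for each non-empty line."""
    ranks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized rankfile line: {line!r}")
        rank, host, spec = match.groups()
        ranks[int(rank)] = (host, parse_slot_spec(spec))
    return ranks


RANKFILE = """\
rank 0=sunpc0 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
rank 2=sunpc1 slot=1:0
rank 3=sunpc1 slot=1:1
"""

for rank, (host, cores) in sorted(parse_rankfile(RANKFILE).items()):
    print(rank, host, cores)
```

For the rankfile in this thread, the sketch yields four distinct socket:core pairs on each host with no overlap between ranks, which is consistent with dual-processor dual-core machines.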


On Oct 3, 2012, at 3:03 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
> 
> I want to test process bindings with a rankfile in openmpi-1.6.2. Both
> machines are dual-processor dual-core machines running Solaris 10 x86_64.
> 
> tyr fd1026 138 cat host_sunpc0_1 
> sunpc0 slots=4
> sunpc1 slots=4
> 
> tyr fd1026 139 cat rankfile 
> rank 0=sunpc0 slot=0:0-1,1:0-1
> rank 1=sunpc1 slot=0:0-1
> rank 2=sunpc1 slot=1:0
> rank 3=sunpc1 slot=1:1
> 
> tyr fd1026 140 mpiexec -rf rankfile hostname
> --------------------------------------------------------------------------
> All nodes which are allocated for this job are already filled.
> --------------------------------------------------------------------------
> 
> Is something wrong with my rankfile, must I add a hostfile, or is it a
> bug? I get the following error when I add a hostfile. 
> 
> 
> tyr fd1026 141 mpiexec -hostfile host_sunpc0_1 -rf rankfile hostname
> [tyr.informatik.hs-fulda.de:20227] [[27927,0],0] ORTE_ERROR_LOG:
>  Data unpack would read past end of buffer in file
>  ../../../../openmpi-1.6.2/orte/mca/odls/base/odls_base_default_fns.c
>  at line 927
> ^Cmpiexec: abort is already in progress...hit ctrl-c again to forcibly
>  terminate
> 
> 
> I get the following output when I use Linux instead of Solaris
> (same hardware).
> 
> tyr fd1026 146 mpiexec -rf rankfile_linux hostname
> --------------------------------------------------------------------------
> All nodes which are allocated for this job are already filled.
> --------------------------------------------------------------------------
> 
> tyr fd1026 147 mpiexec -hostfile host_linpc0_1 -rf rankfile_linux hostname
> [tyr.informatik.hs-fulda.de:20260] [[27952,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../openmpi-1.6.2/orte/mca/odls/base/odls_base_default_fns.c at line 927
> [tyr:20260] *** Process received signal ***
> [tyr:20260] Signal: Bus Error (10)
> [tyr:20260] Signal code: Invalid address alignment (1)
> [tyr:20260] Failing at address: 7463703a2f2f3129
> /export2/prog/SunOS_sparc/openmpi-1.6.2_64_cc/lib64/libopen-rte.so.4.0.0:opal_backtrace_print+0x14
> /export2/prog/SunOS_sparc/openmpi-1.6.2_64_cc/lib64/libopen-rte.so.4.0.0:0x335b48
> /lib/sparcv9/libc.so.1:0xd88a4
> /lib/sparcv9/libc.so.1:0xcc418
> /lib/sparcv9/libc.so.1:0xcc624
> /lib/sparcv9/libc.so.1:0x64394 [ Signal 2131043744 (?)]
> /lib/sparcv9/libc.so.1:free+0x30
> /export2/prog/SunOS_sparc/openmpi-1.6.2_64_cc/lib64/libopen-rte.so.4.0.0:orte_odls_base_default_construct_child_list+0x20b8
> /export2/prog/SunOS_sparc/openmpi-1.6.2_64_cc/lib64/openmpi/mca_odls_default.so:0x11c80
> ...
> 
> "tyr" is a Sparc machine running Solaris 10. I get a similar error if
> I run the command on a Linux machine.
> 
> tyr fd1026 148 ssh linpc4
> linpc4 fd1026 100  mpiexec -rf rankfile_linux hostname
> --------------------------------------------------------------------------
> All nodes which are allocated for this job are already filled.
> --------------------------------------------------------------------------
> 
> linpc4 fd1026 101 mpiexec -hostfile host_linpc0_1 -rf rankfile_linux hostname
> [linpc4:08079] [[49559,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../openmpi-1.6.2/orte/mca/odls/base/odls_base_default_fns.c at line 927
> [linpc4:08079] *** Process received signal ***
> [linpc4:08079] Signal: Segmentation fault (11)
> [linpc4:08079] Signal code: Address not mapped (1)
> [linpc4:08079] Failing at address: 0x900306368
> [linpc4:08079] [ 0] /lib64/libpthread.so.0(+0xfd00) [0x7fbe174bcd00]
> [linpc4:08079] [ 1] /lib64/libc.so.6(cfree+0x14) [0x7fbe17197d24]
> [linpc4:08079] [ 2] /usr/local/openmpi-1.6.2_64_cc/lib64/libopen-rte.so.4(orte_odls_base_default_construct_child_list+0x2091) [0x7fbe182e4d21]
> [linpc4:08079] [ 3] /usr/local/openmpi-1.6.2_64_cc/lib64/openmpi/mca_odls_default.so(+0x10dba) [0x7fbe15415dba]
> ...
> 
> Thank you very much in advance for any suggestions.
> 
> 
> Kind regards
> 
> Siegmar
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

