I saw your earlier note about this too. Just a little busy right now, but hope to look at it soon.
Your rankfile looks fine, so undoubtedly a bug has crept into this rarely-used code path.

On Oct 3, 2012, at 3:03 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
>
> I want to test process bindings with a rankfile in openmpi-1.6.2. Both
> machines are dual-processor dual-core machines running Solaris 10 x86_64.
>
> tyr fd1026 138 cat host_sunpc0_1
> sunpc0 slots=4
> sunpc1 slots=4
>
> tyr fd1026 139 cat rankfile
> rank 0=sunpc0 slot=0:0-1,1:0-1
> rank 1=sunpc1 slot=0:0-1
> rank 2=sunpc1 slot=1:0
> rank 3=sunpc1 slot=1:1
>
> tyr fd1026 140 mpiexec -rf rankfile hostname
> --------------------------------------------------------------------------
> All nodes which are allocated for this job are already filled.
> --------------------------------------------------------------------------
>
> Is something wrong with my rankfile, must I add a hostfile, or is it a
> bug? I get the following error when I add a hostfile.
>
>
> tyr fd1026 141 mpiexec -hostfile host_sunpc0_1 -rf rankfile hostname
> [tyr.informatik.hs-fulda.de:20227] [[27927,0],0] ORTE_ERROR_LOG:
> Data unpack would read past end of buffer in file
> ../../../../openmpi-1.6.2/orte/mca/odls/base/odls_base_default_fns.c
> at line 927
> ^Cmpiexec: abort is already in progress...hit ctrl-c again to forcibly
> terminate
>
>
> I get the following outputs when I use Linux instead of Solaris
> (same hardware).
>
> tyr fd1026 146 mpiexec -rf rankfile_linux hostname
> --------------------------------------------------------------------------
> All nodes which are allocated for this job are already filled.
> --------------------------------------------------------------------------
>
> tyr fd1026 147 mpiexec -hostfile host_linpc0_1 -rf rankfile_linux hostname
> [tyr.informatik.hs-fulda.de:20260] [[27952,0],0] ORTE_ERROR_LOG: Data unpack
> would read past end of buffer in
> file ../../../../openmpi-1.6.2/orte/mca/odls/base/odls_base_default_fns.c at
> line 927
> [tyr:20260] *** Process received signal ***
> [tyr:20260] Signal: Bus Error (10)
> [tyr:20260] Signal code: Invalid address alignment (1)
> [tyr:20260] Failing at address: 7463703a2f2f3129
> /export2/prog/SunOS_sparc/openmpi-1.6.2_64_cc/lib64/libopen-rte.so.4.0.0:opal_backtrace_print+0x14
> /export2/prog/SunOS_sparc/openmpi-1.6.2_64_cc/lib64/libopen-rte.so.4.0.0:0x335b48
> /lib/sparcv9/libc.so.1:0xd88a4
> /lib/sparcv9/libc.so.1:0xcc418
> /lib/sparcv9/libc.so.1:0xcc624
> /lib/sparcv9/libc.so.1:0x64394 [ Signal 2131043744 (?)]
> /lib/sparcv9/libc.so.1:free+0x30
> /export2/prog/SunOS_sparc/openmpi-1.6.2_64_cc/lib64/libopen-rte.so.4.0.0:orte_odls_base_default_construct_child_list+0x20b8
> /export2/prog/SunOS_sparc/openmpi-1.6.2_64_cc/lib64/openmpi/mca_odls_default.so:0x11c80
> ...
>
> "tyr" is a Sparc machine running Solaris 10. I get a similar error if
> I run the command on a Linux machine.
>
> tyr fd1026 148 ssh linpc4
> linpc4 fd1026 100 mpiexec -rf rankfile_linux hostname
> --------------------------------------------------------------------------
> All nodes which are allocated for this job are already filled.
> --------------------------------------------------------------------------
>
> linpc4 fd1026 101 mpiexec -hostfile host_linpc0_1 -rf rankfile_linux hostname
> [linpc4:08079] [[49559,0],0] ORTE_ERROR_LOG: Data unpack would read past end
> of buffer in file
> ../../../../openmpi-1.6.2/orte/mca/odls/base/odls_base_default_fns.c at line
> 927
> [linpc4:08079] *** Process received signal ***
> [linpc4:08079] Signal: Segmentation fault (11)
> [linpc4:08079] Signal code: Address not mapped (1)
> [linpc4:08079] Failing at address: 0x900306368
> [linpc4:08079] [ 0] /lib64/libpthread.so.0(+0xfd00) [0x7fbe174bcd00]
> [linpc4:08079] [ 1] /lib64/libc.so.6(cfree+0x14) [0x7fbe17197d24]
> [linpc4:08079] [ 2]
> /usr/local/openmpi-1.6.2_64_cc/lib64/libopen-rte.so.4(orte_odls_base_default_construct_child_list+0x2091)
> [0x7fbe182e4d21]
> [linpc4:08079] [ 3]
> /usr/local/openmpi-1.6.2_64_cc/lib64/openmpi/mca_odls_default.so(+0x10dba)
> [0x7fbe15415dba]
> ...
>
> Thank you very much for any suggestion in advance.
>
>
> Kind regards
>
> Siegmar
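
For anyone who wants to double-check a rankfile by hand before blaming the launcher, here is a minimal sketch of the consistency check behind "your rankfile looks fine". It is not Open MPI code. The helper names (`parse_rankfile`, `parse_hostfile`, `oversubscribed_hosts`) are hypothetical, and the parser only understands the `rank N=host slot=socket:cores` forms that appear in the quoted message. Treating a host with no `slots=` entry as one slot is also an assumption of this sketch.

```python
import re
from collections import Counter

# Rankfile and hostfile contents copied from the quoted message.
RANKFILE = """\
rank 0=sunpc0 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
rank 2=sunpc1 slot=1:0
rank 3=sunpc1 slot=1:1
"""

HOSTFILE = """\
sunpc0 slots=4
sunpc1 slots=4
"""

# Hypothetical pattern: covers only the "rank N=host slot=SPEC" form shown above.
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def expand_range(text):
    """Expand '0-1' to [0, 1] and '1' to [1]."""
    lo, _, hi = text.partition("-")
    return list(range(int(lo), int(hi or lo) + 1))


def parse_slot(spec):
    """Turn '0:0-1,1:0-1' into a list of (socket, core) pairs.

    Assumption: a token without ':' reuses the most recent socket
    (or None if no socket has been named yet).
    """
    pairs = []
    socket = None
    for token in spec.split(","):
        if ":" in token:
            socket_text, token = token.split(":", 1)
            socket = int(socket_text)
        pairs.extend((socket, core) for core in expand_range(token))
    return pairs


def parse_rankfile(text):
    """Map each rank number to (host, [(socket, core), ...])."""
    ranks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized rankfile line: {line!r}")
        ranks[int(match.group(1))] = (match.group(2), parse_slot(match.group(3)))
    return ranks


def parse_hostfile(text):
    """Map each host to its slot count (assumed to be 1 when not given)."""
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        count = 1
        for field in fields[1:]:
            if field.startswith("slots="):
                count = int(field[len("slots="):])
        slots[fields[0]] = count
    return slots


def oversubscribed_hosts(ranks, slots):
    """Return hosts that are assigned more ranks than they have slots."""
    used = Counter(host for host, _ in ranks.values())
    return sorted(h for h, n in used.items() if n > slots.get(h, 0))


if __name__ == "__main__":
    ranks = parse_rankfile(RANKFILE)
    slots = parse_hostfile(HOSTFILE)
    for rank, (host, cores) in sorted(ranks.items()):
        print(rank, host, cores)
    print("oversubscribed hosts:", oversubscribed_hosts(ranks, slots))
```

With the files from the message, one rank lands on sunpc0 and three on sunpc1, both of which offer four slots, so this check finds nothing to explain the "already filled" message.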