Siegmar --

This looks like the typical alignment error that we used to see when we 
tested regularly on SPARC.  :-\

The error appears to occur in mca_db_hash.so (the frame at offset 0x3ee8 in 
your backtrace).  Could you get a stack trace with the file and line number 
where it fails in mca_db_hash?  The actual bad code is likely somewhere 
under opal/mca/db/hash.


On Jul 25, 2014, at 2:08 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
> 
> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
> 10 Sparc and I get a bus error when I run a small program.
> 
> tyr hello_1 105 mpiexec -np 2 a.out 
> [tyr:29164] *** Process received signal ***
> [tyr:29164] Signal: Bus Error (10)
> [tyr:29164] Signal code: Invalid address alignment (1)
> [tyr:29164] Failing at address: ffffffff7fffd1c4
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfd0
> /lib/sparcv9/libc.so.1:0xd8b98
> /lib/sparcv9/libc.so.1:0xcc70c
> /lib/sparcv9/libc.so.1:0xcc918
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8
>  [ Signal 10 (BUS)]
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:main+0x20
> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:_start+0x7c
> [tyr:29164] *** End of error message ***
> ...
> 
> 
> I get the following output if I run the program in "dbx".
> 
> ...
> RTC: Enabling Error Checking...
> RTC: Running program...
> Write to unallocated (wua) on thread 1:
> Attempting to write 1 byte at address 0xffffffff79f04000
> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
> 0xffffffff55174da0: _readdir+0x0064:    call     _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
> (dbx) 
> 
> 
> Hopefully the above output helps to fix the error. Can I provide
> anything else? Thank you very much for any help in advance.
> 
> 
> Kind regards
> 
> Siegmar
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/07/24869.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
