Hi, I'm sorry to answer so late, but last week I didn't have Internet access. In the meantime I've installed openmpi-1.8.2rc3 and I get the same error.
> This looks like the typical type of alignment error that we used
> to see when testing regularly on SPARC. :-\
>
> It looks like the error was happening in mca_db_hash.so. Could
> you get a stack trace / file+line number where it was failing
> in mca_db_hash? (i.e., the actual bad code will likely be under
> opal/mca/db/hash somewhere)

Unfortunately I don't get a file+line number from a file in
opal/mca/db/hash.


tyr small_prog 102 ompi_info | grep MPI:
                Open MPI: 1.8.2rc3
tyr small_prog 103 which mpicc
/usr/local/openmpi-1.8.2_64_gcc/bin/mpicc
tyr small_prog 104 mpicc init_finalize.c
tyr small_prog 106 /opt/solstudio12.3/bin/sparcv9/dbx /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading mpiexec
Reading ld.so.1
Reading libopen-rte.so.7.0.4
Reading libopen-pal.so.6.2.0
Reading libsendfile.so.1
Reading libpicl.so.1
Reading libkstat.so.1
Reading liblgrp.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading libgcc_s.so.1
Reading librt.so.1
Reading libm.so.2
Reading libpthread.so.1
Reading libc.so.1
Reading libdoor.so.1
Reading libaio.so.1
Reading libmd.so.1
(dbx) check -all
access checking - ON
memuse checking - ON
(dbx) run -np 1 a.out
Running: mpiexec -np 1 a.out
(process id 27833)
Reading rtcapihook.so
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading libc_psr.so.1
Reading rtcboot.so
Reading librtc.so
Reading libmd_psr.so.1
RTC: Enabling Error Checking...
RTC: Running program...
Write to unallocated (wua) on thread 1:
Attempting to write 1 byte at address 0xffffffff79f04000
t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
0xffffffff55174da0: _readdir+0x0064: call _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
(dbx) where
current thread: t@1
=>[1] _readdir(0xffffffff79f00300, 0x2e6800, 0x4, 0x2d, 0x4, 0xffffffff79f00300), at 0xffffffff55174da0
  [2] list_files_by_dir(0x100138fd8, 0xffffffff7fffd1f0, 0xffffffff7fffd1e8, 0xffffffff7fffd210, 0x0, 0xffffffff702a0010), at 0xffffffff63174594
  [3] foreachfile_callback(0x100138fd8, 0xffffffff7fffd458, 0x0, 0x2e, 0x0, 0xffffffff702a0010), at 0xffffffff6317461c
  [4] foreach_dirinpath(0x1001d8a28, 0x0, 0xffffffff631745e0, 0xffffffff7fffd458, 0x0, 0xffffffff702a0010), at 0xffffffff63171684
  [5] lt_dlforeachfile(0x1001d8a28, 0xffffffff6319656c, 0x0, 0x53, 0x2f, 0xf), at 0xffffffff63174748
  [6] find_dyn_components(0x0, 0xffffffff6323b570, 0x0, 0x1, 0xffffffff7fffd6a0, 0xffffffff702a0010), at 0xffffffff63195e38
  [7] mca_base_component_find(0x0, 0xffffffff6323b570, 0xffffffff6335e1b0, 0x0, 0xffffffff7fffd6a0, 0x1), at 0xffffffff631954d8
  [8] mca_base_framework_components_register(0xffffffff6335e1c0, 0x0, 0x3e, 0x0, 0x3b, 0x100800), at 0xffffffff631b1638
  [9] mca_base_framework_register(0xffffffff6335e1c0, 0x0, 0x2, 0xffffffff7fffd8d0, 0x0, 0xffffffff702a0010), at 0xffffffff631b24d4
  [10] mca_base_framework_open(0xffffffff6335e1c0, 0x0, 0x2, 0xffffffff7fffd990, 0x0, 0xffffffff702a0010), at 0xffffffff631b25d0
  [11] opal_init(0xffffffff7fffdd70, 0xffffffff7fffdd78, 0x100117c60, 0xffffffff7fffde58, 0x400, 0x100117c60), at 0xffffffff63153694
  [12] orterun(0x4, 0xffffffff7fffde58, 0x2, 0xffffffff7fffdda0, 0x0, 0xffffffff702a0010), at 0x100005078
  [13] main(0x4, 0xffffffff7fffde58, 0xffffffff7fffde80, 0x100117c60, 0x100000000, 0xffffffff6a700200), at 0x100003d68
(dbx)


I get the following output with gdb.

tyr small_prog 107 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/bin/orterun...done.
(gdb) run -np 1 a.out
Starting program: /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec -np 1 a.out
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[tyr:27867] *** Process received signal ***
[tyr:27867] Signal: Bus Error (10)
[tyr:27867] Signal code: Invalid address alignment (1)
[tyr:27867] Failing at address: ffffffff7fffd224
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfa0
/lib/sparcv9/libc.so.1:0xd8b98
/lib/sparcv9/libc.so.1:0xcc70c
/lib/sparcv9/libc.so.1:0xcc918
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8 [ Signal 10 (BUS)]
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
/home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:main+0x20
/home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:_start+0x7c
[tyr:27867] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 27867 on node tyr exited on signal 10 (Bus Error).
--------------------------------------------------------------------------
[LWP 2 exited]
[New Thread 2 ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) bt
#0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
#1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
#2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
#3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
#4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
#5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
#6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
#7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
#8  0xffffffff7ec7746c in vm_close ()
   from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
#9  0xffffffff7ec74a4c in lt_dlclose ()
   from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
#10 0xffffffff7ec99b70 in ri_destructor (obj=0x1001ead30)
    at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:391
#11 0xffffffff7ec98488 in opal_obj_run_destructors (object=0x1001ead30)
    at ../../../../openmpi-1.8.2rc3/opal/class/opal_object.h:446
#12 0xffffffff7ec993ec in mca_base_component_repository_release (
    component=0xffffffff7b023cf0 <mca_oob_tcp_component>)
    at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:244
#13 0xffffffff7ec9b734 in mca_base_component_unload (
    component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1)
    at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:47
#14 0xffffffff7ec9b7c8 in mca_base_component_close (
    component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1)
    at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:60
#15 0xffffffff7ec9b89c in mca_base_components_close (output_id=-1,
    components=0xffffffff7f12b430 <orte_oob_base_framework+80>, skip=0x0)
---Type <return> to continue, or q <return> to quit---
    at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:86
#16 0xffffffff7ec9b804 in mca_base_framework_components_close (
    framework=0xffffffff7f12b3e0 <orte_oob_base_framework>, skip=0x0)
    at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:66
#17 0xffffffff7efae1e4 in orte_oob_base_close ()
    at ../../../../openmpi-1.8.2rc3/orte/mca/oob/base/oob_base_frame.c:94
#18 0xffffffff7ecb28ac in mca_base_framework_close (
    framework=0xffffffff7f12b3e0 <orte_oob_base_framework>)
    at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_framework.c:187
#19 0xffffffff7bf078c0 in rte_finalize ()
    at ../../../../../openmpi-1.8.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:858
#20 0xffffffff7ef30a44 in orte_finalize ()
    at ../../openmpi-1.8.2rc3/orte/runtime/orte_finalize.c:65
#21 0x00000001000070c4 in orterun (argc=4, argv=0xffffffff7fffe0e8)
    at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/orterun.c:1096
#22 0x0000000100003d70 in main (argc=4, argv=0xffffffff7fffe0e8)
    at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/main.c:13
(gdb)


Is the above information helpful to track down the error? Do you need
anything else? Thank you very much for any help in advance.


Kind regards

Siegmar



> On Jul 25, 2014, at 2:08 AM, Siegmar Gross
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hi,
> >
> > I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
> > 10 Sparc and I receive a bus error, if I run a small program.
> >
> > tyr hello_1 105 mpiexec -np 2 a.out
> > [tyr:29164] *** Process received signal ***
> > [tyr:29164] Signal: Bus Error (10)
> > [tyr:29164] Signal code: Invalid address alignment (1)
> > [tyr:29164] Failing at address: ffffffff7fffd1c4
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfd0
> > /lib/sparcv9/libc.so.1:0xd8b98
> > /lib/sparcv9/libc.so.1:0xcc70c
> > /lib/sparcv9/libc.so.1:0xcc918
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8
> > [ Signal 10 (BUS)]
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
> > /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:main+0x20
> > /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:_start+0x7c
> > [tyr:29164] *** End of error message ***
> > ...
> >
> >
> > I get the following output if I run the program in "dbx".
> >
> > ...
> > RTC: Enabling Error Checking...
> > RTC: Running program...
> > Write to unallocated (wua) on thread 1:
> > Attempting to write 1 byte at address 0xffffffff79f04000
> > t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
> > 0xffffffff55174da0: _readdir+0x0064: call
> > _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
> > (dbx)
> >
> >
> > Hopefully the above output helps to fix the error. Can I provide
> > anything else? Thank you very much for any help in advance.
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2014/07/24869.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>