Did you configure this with --enable-debug? If so, you should get a line number in
the backtrace.


On Aug 5, 2014, at 2:59 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
> 
> I'm sorry to answer so late, but last week I didn't have Internet
> access. In the meantime I've installed openmpi-1.8.2rc3 and I get
> the same error.
> 
>> This looks like the typical type of alignment error that we used
>> to see when testing regularly on SPARC.  :-\
>> 
>> It looks like the error was happening in mca_db_hash.so.  Could
>> you get a stack trace / file+line number where it was failing
>> in mca_db_hash?  (i.e., the actual bad code will likely be under
>> opal/mca/db/hash somewhere)
> 
> Unfortunately I don't get a file+line number from a file in
> opal/mca/db/hash.
> 
> 
> 
> tyr small_prog 102 ompi_info | grep MPI:
>                Open MPI: 1.8.2rc3
> tyr small_prog 103 which mpicc
> /usr/local/openmpi-1.8.2_64_gcc/bin/mpicc
> tyr small_prog 104 mpicc init_finalize.c 
> tyr small_prog 106 /opt/solstudio12.3/bin/sparcv9/dbx 
> /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec 
> For information about new features see `help changes'
> To remove this message, put `dbxenv suppress_startup_message 7.9' in your 
> .dbxrc
> Reading mpiexec
> Reading ld.so.1
> Reading libopen-rte.so.7.0.4
> Reading libopen-pal.so.6.2.0
> Reading libsendfile.so.1
> Reading libpicl.so.1
> Reading libkstat.so.1
> Reading liblgrp.so.1
> Reading libsocket.so.1
> Reading libnsl.so.1
> Reading libgcc_s.so.1
> Reading librt.so.1
> Reading libm.so.2
> Reading libpthread.so.1
> Reading libc.so.1
> Reading libdoor.so.1
> Reading libaio.so.1
> Reading libmd.so.1
> (dbx) check -all
> access checking - ON
> memuse checking - ON
> (dbx) run -np 1 a.out
> Running: mpiexec -np 1 a.out
> (process id 27833)
> Reading rtcapihook.so
> Reading libdl.so.1
> Reading rtcaudit.so
> Reading libmapmalloc.so.1
> Reading libgen.so.1
> Reading libc_psr.so.1
> Reading rtcboot.so
> Reading librtc.so
> Reading libmd_psr.so.1
> RTC: Enabling Error Checking...
> RTC: Running program...
> Write to unallocated (wua) on thread 1:
> Attempting to write 1 byte at address 0xffffffff79f04000
> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
> 0xffffffff55174da0: _readdir+0x0064:    call     
> _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
> (dbx) where
> current thread: t@1
> =>[1] _readdir(0xffffffff79f00300, 0x2e6800, 0x4, 0x2d, 0x4, 
> 0xffffffff79f00300), at 0xffffffff55174da0 
>  [2] list_files_by_dir(0x100138fd8, 0xffffffff7fffd1f0, 0xffffffff7fffd1e8, 
> 0xffffffff7fffd210, 0x0, 0xffffffff702a0010), at 
> 0xffffffff63174594 
>  [3] foreachfile_callback(0x100138fd8, 0xffffffff7fffd458, 0x0, 0x2e, 0x0, 
> 0xffffffff702a0010), at 0xffffffff6317461c 
>  [4] foreach_dirinpath(0x1001d8a28, 0x0, 0xffffffff631745e0, 
> 0xffffffff7fffd458, 0x0, 0xffffffff702a0010), at 0xffffffff63171684 
>  [5] lt_dlforeachfile(0x1001d8a28, 0xffffffff6319656c, 0x0, 0x53, 0x2f, 0xf), 
> at 0xffffffff63174748 
>  [6] find_dyn_components(0x0, 0xffffffff6323b570, 0x0, 0x1, 
> 0xffffffff7fffd6a0, 0xffffffff702a0010), at 0xffffffff63195e38 
>  [7] mca_base_component_find(0x0, 0xffffffff6323b570, 0xffffffff6335e1b0, 
> 0x0, 0xffffffff7fffd6a0, 0x1), at 0xffffffff631954d8 
>  [8] mca_base_framework_components_register(0xffffffff6335e1c0, 0x0, 0x3e, 
> 0x0, 0x3b, 0x100800), at 0xffffffff631b1638 
>  [9] mca_base_framework_register(0xffffffff6335e1c0, 0x0, 0x2, 
> 0xffffffff7fffd8d0, 0x0, 0xffffffff702a0010), at 0xffffffff631b24d4 
>  [10] mca_base_framework_open(0xffffffff6335e1c0, 0x0, 0x2, 
> 0xffffffff7fffd990, 0x0, 0xffffffff702a0010), at 0xffffffff631b25d0 
>  [11] opal_init(0xffffffff7fffdd70, 0xffffffff7fffdd78, 0x100117c60, 
> 0xffffffff7fffde58, 0x400, 0x100117c60), at 
> 0xffffffff63153694 
>  [12] orterun(0x4, 0xffffffff7fffde58, 0x2, 0xffffffff7fffdda0, 0x0, 
> 0xffffffff702a0010), at 0x100005078 
>  [13] main(0x4, 0xffffffff7fffde58, 0xffffffff7fffde80, 0x100117c60, 
> 0x100000000, 0xffffffff6a700200), at 0x100003d68 
> (dbx) 
> 
> 
> 
> I get the following output with gdb.
> 
> tyr small_prog 107 /usr/local/gdb-7.6.1_64_gcc/bin/gdb 
> /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec 
> GNU gdb (GDB) 7.6.1
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "sparc-sun-solaris2.10".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from 
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/bin/orterun...done.
> (gdb) run -np 1 a.out
> Starting program: /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec -np 1 a.out
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> [tyr:27867] *** Process received signal ***
> [tyr:27867] Signal: Bus Error (10)
> [tyr:27867] Signal code: Invalid address alignment (1)
> [tyr:27867] Failing at address: ffffffff7fffd224
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfa0
> /lib/sparcv9/libc.so.1:0xd8b98
> /lib/sparcv9/libc.so.1:0xcc70c
> /lib/sparcv9/libc.so.1:0xcc918
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8
>  [ Signal 10 (BUS)]
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:main+0x20
> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:_start+0x7c
> [tyr:27867] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 27867 on node tyr exited on 
> signal 10 (Bus Error).
> --------------------------------------------------------------------------
> [LWP    2         exited]
> [New Thread 2        ]
> [Switching to Thread 1 (LWP 1)]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to 
> satisfy query
> (gdb) bt
> #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
> #1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
> #2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
> #3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
> #4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
> #5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
> #6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
> #7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
> #8  0xffffffff7ec7746c in vm_close ()
>   from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> #9  0xffffffff7ec74a4c in lt_dlclose ()
>   from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> #10 0xffffffff7ec99b70 in ri_destructor (obj=0x1001ead30)
>    at 
> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:391
> #11 0xffffffff7ec98488 in opal_obj_run_destructors (object=0x1001ead30)
>    at ../../../../openmpi-1.8.2rc3/opal/class/opal_object.h:446
> #12 0xffffffff7ec993ec in mca_base_component_repository_release (
>    component=0xffffffff7b023cf0 <mca_oob_tcp_component>)
>    at 
> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:244
> #13 0xffffffff7ec9b734 in mca_base_component_unload (
>    component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1)
>    at 
> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:47
> #14 0xffffffff7ec9b7c8 in mca_base_component_close (
>    component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1)
>    at 
> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:60
> #15 0xffffffff7ec9b89c in mca_base_components_close (output_id=-1, 
>    components=0xffffffff7f12b430 <orte_oob_base_framework+80>, skip=0x0)
> ---Type <return> to continue, or q <return> to quit---
>    at 
> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:86
> #16 0xffffffff7ec9b804 in mca_base_framework_components_close (
>    framework=0xffffffff7f12b3e0 <orte_oob_base_framework>, skip=0x0)
>    at 
> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:66
> #17 0xffffffff7efae1e4 in orte_oob_base_close ()
>    at ../../../../openmpi-1.8.2rc3/orte/mca/oob/base/oob_base_frame.c:94
> #18 0xffffffff7ecb28ac in mca_base_framework_close (
>    framework=0xffffffff7f12b3e0 <orte_oob_base_framework>)
>    at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_framework.c:187
> #19 0xffffffff7bf078c0 in rte_finalize ()
>    at ../../../../../openmpi-1.8.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:858
> #20 0xffffffff7ef30a44 in orte_finalize ()
>    at ../../openmpi-1.8.2rc3/orte/runtime/orte_finalize.c:65
> #21 0x00000001000070c4 in orterun (argc=4, argv=0xffffffff7fffe0e8)
>    at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/orterun.c:1096
> #22 0x0000000100003d70 in main (argc=4, argv=0xffffffff7fffe0e8)
>    at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/main.c:13
> (gdb) 
> 
> 
> Is the above information helpful to track down the error? Do you need
> anything else? Thank you very much for any help in advance.
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
> 
> 
>> On Jul 25, 2014, at 2:08 AM, Siegmar Gross 
>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>> 
>>> Hi,
>>> 
>>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
>>> 10 SPARC and I receive a bus error when I run a small program.
>>> 
>>> tyr hello_1 105 mpiexec -np 2 a.out 
>>> [tyr:29164] *** Process received signal ***
>>> [tyr:29164] Signal: Bus Error (10)
>>> [tyr:29164] Signal code: Invalid address alignment (1)
>>> [tyr:29164] Failing at address: ffffffff7fffd1c4
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfd0
>>> /lib/sparcv9/libc.so.1:0xd8b98
>>> /lib/sparcv9/libc.so.1:0xcc70c
>>> /lib/sparcv9/libc.so.1:0xcc918
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8
>>>  [ Signal 10 (BUS)]
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:main+0x20
>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:_start+0x7c
>>> [tyr:29164] *** End of error message ***
>>> ...
>>> 
>>> 
>>> I get the following output if I run the program in "dbx".
>>> 
>>> ...
>>> RTC: Enabling Error Checking...
>>> RTC: Running program...
>>> Write to unallocated (wua) on thread 1:
>>> Attempting to write 1 byte at address 0xffffffff79f04000
>>> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
>>> 0xffffffff55174da0: _readdir+0x0064:    call     
>>> _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
>>> (dbx) 
>>> 
>>> 
>>> Hopefully the above output helps to fix the error. Can I provide
>>> anything else? Thank you very much for any help in advance.
>>> 
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> 
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/07/24869.php
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/08/24909.php
