Hi,

thank you very much to everybody who tried to solve my bus
error problem on Solaris 10 Sparc. I thought that it had been
found and fixed, so I installed openmpi-1.8.2rc4r32485 on
my machines (Solaris 10 Sparc (tyr), Solaris 10 x86_64 (sunpc1),
openSUSE Linux 12.1 x86_64 (linpc1)) with gcc-4.9.0. A small
program works on my x86_64 machines, but still fails
with a bus error on my Sparc system.

linpc1 fd1026 106 mpiexec -np 1 init_finalize
Hello!
linpc1 fd1026 106 exit
logout
tyr small_prog 113 ssh sunpc1
sunpc1 fd1026 101 mpiexec -np 1 init_finalize
Hello!
sunpc1 fd1026 102 exit
logout
tyr small_prog 114 mpiexec -np 1 init_finalize
[tyr:21109] *** Process received signal ***
[tyr:21109] Signal: Bus Error (10)
...


gdb shows the following backtrace.

tyr small_prog 122 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
GNU gdb (GDB) 7.6.1
...
(gdb) run -np 1 init_finalize
Starting program: /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec -np 1 init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[tyr:21158] *** Process received signal ***
[tyr:21158] Signal: Bus Error (10)
[tyr:21158] Signal code: Invalid address alignment (1)
[tyr:21158] Failing at address: ffffffff7fffd224
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xcd130
/lib/sparcv9/libc.so.1:0xd8b98
/lib/sparcv9/libc.so.1:0xcc70c
/lib/sparcv9/libc.so.1:0xcc918
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8
 [ Signal 10 (BUS)]
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:MPI_Init+0x2a8
/home/fd1026/SunOS/sparc/bin/init_finalize:main+0x10
/home/fd1026/SunOS/sparc/bin/init_finalize:_start+0x7c
[tyr:21158] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 21158 on node tyr exited on signal 10 (Bus Error).
--------------------------------------------------------------------------
[LWP    2         exited]
[New Thread 2        ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) bt
#0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
#1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
#2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
#3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
#4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
#5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
#6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
#7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
#8  0xffffffff7ec7748c in vm_close () from 
/usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
#9  0xffffffff7ec74a6c in lt_dlclose () from 
/usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
#10 0xffffffff7ec99b90 in ri_destructor (obj=0x1001ead30)
    at 
../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_component_repository.c:391
#11 0xffffffff7ec984a8 in opal_obj_run_destructors (object=0x1001ead30)
    at ../../../../openmpi-1.8.2rc4r32485/opal/class/opal_object.h:446
#12 0xffffffff7ec9940c in mca_base_component_repository_release (
    component=0xffffffff7b023df0 <mca_oob_tcp_component>)
    at 
../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_component_repository.c:244
#13 0xffffffff7ec9b754 in mca_base_component_unload (
    component=0xffffffff7b023df0 <mca_oob_tcp_component>, output_id=-1)
    at 
../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:47
#14 0xffffffff7ec9b7e8 in mca_base_component_close (
    component=0xffffffff7b023df0 <mca_oob_tcp_component>, output_id=-1)
    at 
../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:60
#15 0xffffffff7ec9b8bc in mca_base_components_close (output_id=-1, 
    components=0xffffffff7f12b930 <orte_oob_base_framework+80>, skip=0x0)
    at 
../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:86
#16 0xffffffff7ec9b824 in mca_base_framework_components_close (
    framework=0xffffffff7f12b8e0 <orte_oob_base_framework>, skip=0x0)
    at 
../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:66
#17 0xffffffff7efae21c in orte_oob_base_close ()
    at ../../../../openmpi-1.8.2rc4r32485/orte/mca/oob/base/oob_base_frame.c:94
#18 0xffffffff7ecb28cc in mca_base_framework_close (
    framework=0xffffffff7f12b8e0 <orte_oob_base_framework>)
    at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_framework.c:187
#19 0xffffffff7bf078c0 in rte_finalize ()
    at 
../../../../../openmpi-1.8.2rc4r32485/orte/mca/ess/hnp/ess_hnp_module.c:858
#20 0xffffffff7ef30a44 in orte_finalize ()
    at ../../openmpi-1.8.2rc4r32485/orte/runtime/orte_finalize.c:65
#21 0x00000001000070c4 in orterun (argc=4, argv=0xffffffff7fffe0d8)
    at ../../../../openmpi-1.8.2rc4r32485/orte/tools/orterun/orterun.c:1096
#22 0x0000000100003d70 in main (argc=4, argv=0xffffffff7fffe0d8)
    at ../../../../openmpi-1.8.2rc4r32485/orte/tools/orterun/main.c:13
(gdb) 


Is this a new problem? I would be grateful if somebody could
fix it. Thank you very much in advance for any help.

Kind regards

Siegmar
