Hi Takahiro,

> I forgot to follow the previous report, sorry.
> The patch I suggested is not included in Open MPI 1.8.2.
> The backtrace Siegmar reported points the problem that I fixed
> in the patch.
>
> http://www.open-mpi.org/community/lists/users/2014/08/24968.php
>
> Siegmar:
> Could you try my patch again?

Yes, your patch solves the bus error in openmpi-1.8.2 and
openmpi-1.8.3a1r32641. Thank you very much for your help once more

Siegmar

> Ralph (or someone committer):
> Open MPI 1.8 needs custom patch that I posted. See my previous mail.
> Could you review it and commit it to v1.8 branch?
>
> Regards,
> Takahiro
>
> > Hi,
> >
> > yesterday I installed openmpi-1.8.2 on my machines (Solaris 10 Sparc
> > (tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1 x86_64
> > (linpc0)) with gcc-4.9.0. A small program works on some machines,
> > but breaks with a bus error on Solaris 10 Sparc.
> >
> >
> > tyr small_prog 118 which mpicc
> > /usr/local/openmpi-1.8.2_64_gcc/bin/mpicc
> > tyr small_prog 119 ompi_info | grep MPI:
> >                 Open MPI: 1.8.2
> > tyr small_prog 120 mpiexec -np 1 --host linpc0 init_finalize
> > Hello!
> > tyr small_prog 121 mpiexec -np 1 --host sunpc0 init_finalize
> > Hello!
> > tyr small_prog 122 mpiexec -np 1 --host tyr init_finalize
> > [tyr:28081] *** Process received signal ***
> > [tyr:28081] Signal: Bus Error (10)
> > [tyr:28081] Signal code: Invalid address alignment (1)
> > [tyr:28081] Failing at address: ffffffff7fffd304
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xcd118
> > /lib/sparcv9/libc.so.1:0xd8b98
> > /lib/sparcv9/libc.so.1:0xcc70c
> > /lib/sparcv9/libc.so.1:0xcc918
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8 [ Signal 10 (BUS)]
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:MPI_Init+0x2a8
> > /home/fd1026/SunOS/sparc/bin/init_finalize:main+0x10
> > /home/fd1026/SunOS/sparc/bin/init_finalize:_start+0x7c
> > [tyr:28081] *** End of error message ***
> > --------------------------------------------------------------------------
> > mpiexec noticed that process rank 0 with PID 28081 on node tyr exited on signal 10 (Bus Error).
> > --------------------------------------------------------------------------
> > tyr small_prog 123
> >
> >
> >
> > gdb shows the following backtrace.
> >
> > tyr small_prog 123 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
> > GNU gdb (GDB) 7.6.1
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "sparc-sun-solaris2.10".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/bin/orterun...done.
> > (gdb) run -np 1 --host tyr init_finalize
> > Starting program: /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec -np 1 --host tyr init_finalize
> > [Thread debugging using libthread_db enabled]
> > [New Thread 1 (LWP 1)]
> > [New LWP 2 ]
> > [tyr:28099] *** Process received signal ***
> > [tyr:28099] Signal: Bus Error (10)
> > [tyr:28099] Signal code: Invalid address alignment (1)
> > [tyr:28099] Failing at address: ffffffff7fffd244
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xcd118
> > /lib/sparcv9/libc.so.1:0xd8b98
> > /lib/sparcv9/libc.so.1:0xcc70c
> > /lib/sparcv9/libc.so.1:0xcc918
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8 [ Signal 10 (BUS)]
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:MPI_Init+0x2a8
> > /home/fd1026/SunOS/sparc/bin/init_finalize:main+0x10
> > /home/fd1026/SunOS/sparc/bin/init_finalize:_start+0x7c
> > [tyr:28099] *** End of error message ***
> > --------------------------------------------------------------------------
> > mpiexec noticed that process rank 0 with PID 28099 on node tyr exited on signal 10 (Bus Error).
> > --------------------------------------------------------------------------
> > [LWP 2 exited]
> > [New Thread 2 ]
> > [Switching to Thread 1 (LWP 1)]
> > sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> > (gdb) bt
> > #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
> > #1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
> > #2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
> > #3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
> > #4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
> > #5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
> > #6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
> > #7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
> > #8  0xffffffff7ec77474 in vm_close () from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> > #9  0xffffffff7ec74a54 in lt_dlclose ()
> >    from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> > #10 0xffffffff7ec99b78 in ri_destructor (obj=0x1001eada0)
> >     at ../../../../openmpi-1.8.2/opal/mca/base/mca_base_component_repository.c:391
> > #11 0xffffffff7ec98490 in opal_obj_run_destructors (object=0x1001eada0)
> >     at ../../../../openmpi-1.8.2/opal/class/opal_object.h:446
> > #12 0xffffffff7ec993f4 in mca_base_component_repository_release (
> >     component=0xffffffff7b023ef0 <mca_oob_tcp_component>)
> >     at ../../../../openmpi-1.8.2/opal/mca/base/mca_base_component_repository.c:244
> > #13 0xffffffff7ec9b73c in mca_base_component_unload (
> >     component=0xffffffff7b023ef0 <mca_oob_tcp_component>, output_id=-1)
> >     at ../../../../openmpi-1.8.2/opal/mca/base/mca_base_components_close.c:47
> > #14 0xffffffff7ec9b7d0 in mca_base_component_close (
> >     component=0xffffffff7b023ef0 <mca_oob_tcp_component>, output_id=-1)
> >     at ../../../../openmpi-1.8.2/opal/mca/base/mca_base_components_close.c:60
> > #15 0xffffffff7ec9b8a4 in mca_base_components_close (output_id=-1,
> >     components=0xffffffff7f12b030 <orte_oob_base_framework+80>, skip=0x0)
> >     at ../../../../openmpi-1.8.2/opal/mca/base/mca_base_components_close.c:86
> > #16 0xffffffff7ec9b80c in mca_base_framework_components_close (
> >     framework=0xffffffff7f12afe0 <orte_oob_base_framework>, skip=0x0)
> >     at ../../../../openmpi-1.8.2/opal/mca/base/mca_base_components_close.c:66
> > #17 0xffffffff7efae0e8 in orte_oob_base_close ()
> >     at ../../../../openmpi-1.8.2/orte/mca/oob/base/oob_base_frame.c:94
> > #18 0xffffffff7ecb28b4 in mca_base_framework_close (
> >     framework=0xffffffff7f12afe0 <orte_oob_base_framework>)
> >     at ../../../../openmpi-1.8.2/opal/mca/base/mca_base_framework.c:187
> > #19 0xffffffff7bf078c0 in rte_finalize ()
> >     at ../../../../../openmpi-1.8.2/orte/mca/ess/hnp/ess_hnp_module.c:858
> > #20 0xffffffff7ef30924 in orte_finalize () at ../../openmpi-1.8.2/orte/runtime/orte_finalize.c:65
> > #21 0x00000001000070c4 in orterun (argc=6, argv=0xffffffff7fffe0e8)
> >     at ../../../../openmpi-1.8.2/orte/tools/orterun/orterun.c:1096
> > #22 0x0000000100003d70 in main (argc=6, argv=0xffffffff7fffe0e8)
> >     at ../../../../openmpi-1.8.2/orte/tools/orterun/main.c:13
> > (gdb)
> >
> >
> > I would be grateful, if somebody can fix the problem. Thank you
> > very much for any help in advance.
> >
> >
> > Kind regards
> >
> > Siegmar