Would you please try r32662? I believe I finally found and fixed this problem.
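
For anyone else trying to reproduce this: judging from its name and the commands in the report below, the test case is presumably nothing more than a bare MPI Init/Finalize in Java, along the lines of the following sketch (the actual InitFinalizeMain.java was not posted, so the class body here is only an assumption):

    import mpi.MPI;
    import mpi.MPIException;

    public class InitFinalizeMain {
        public static void main(String[] args) throws MPIException {
            // Initialize the MPI environment; the reported SIGSEGV occurs in
            // native code (strlen in libc) during startup/teardown, not in
            // the Java code itself.
            MPI.Init(args);
            System.out.println("InitFinalizeMain: MPI initialized");
            MPI.Finalize();
        }
    }

It would be compiled with "mpijavac InitFinalizeMain.java" and started with "mpiexec -np 1 java InitFinalizeMain", exactly as in the quoted output.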


On Sep 2, 2014, at 6:12 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
> 
> yesterday I installed openmpi-1.9a1r32657 on my machines (Solaris
> 10 Sparc (tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1
> x86_64 (linpc0)) with Sun C 5.12 and gcc-4.9.0.
> 
> I have the following problems with my gcc build. First, once more, my
> Java problems, and below them my C problems. In my opinion they are the
> same problems that I see with Sun C.
> 
> 
> 
> Java problem:
> =============
> 
> tyr java 106 mpijavac InitFinalizeMain.java 
> warning: [path] bad path element 
> "/usr/local/openmpi-1.9_64_gcc/lib64/shmem.jar": no such file or directory
> 1 warning
> tyr java 107 mpiexec -np 1 java InitFinalizeMain
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=774, tid=2
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode 
> solaris-sparc compressed oops)
> # Problematic frame:
> # C  [libc.so.1+0x3c7f0]  strlen+0x50
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before 
> starting Java again
> #
> # An error report file with more information is saved as:
> # /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid774.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 774 on node tyr exited on signal 
> 6 (Abort).
> --------------------------------------------------------------------------
> tyr java 108 /usr/local/gdb-7.6.1_64_gcc/bin/gdb 
> /usr/local/openmpi-1.9_64_gcc/bin/mpiexec 
> GNU gdb (GDB) 7.6.1
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "sparc-sun-solaris2.10".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from 
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/bin/orterun...done.
> (gdb) run -np 1 java InitFinalizeMain 
> Starting program: /usr/local/openmpi-1.9_64_gcc/bin/mpiexec -np 1 java 
> InitFinalizeMain
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=791, tid=2
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode 
> solaris-sparc compressed oops)
> # Problematic frame:
> # C  [libc.so.1+0x3c7f0]  strlen+0x50
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before 
> starting Java again
> #
> # An error report file with more information is saved as:
> # /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid791.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 791 on node tyr exited on signal 
> 6 (Abort).
> --------------------------------------------------------------------------
> [LWP    2         exited]
> [New Thread 2        ]
> [Switching to Thread 1 (LWP 1)]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to 
> satisfy query
> (gdb) bt
> #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
> #1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
> #2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
> #3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
> #4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
> #5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
> #6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
> #7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
> #8  0xffffffff7ec88e98 in vm_close () from 
> /usr/local/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0
> #9  0xffffffff7ec86478 in lt_dlclose () from 
> /usr/local/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0
> #10 0xffffffff7ecab5fc in ri_destructor (obj=0x1001fe750)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_component_repository.c:382
> #11 0xffffffff7eca9f38 in opal_obj_run_destructors (object=0x1001fe750)
>    at ../../../../openmpi-1.9a1r32657/opal/class/opal_object.h:446
> #12 0xffffffff7ecaae9c in mca_base_component_repository_release (
>    component=0xffffffff7b122fa8 <mca_oob_tcp_component>)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_component_repository.c:240
> #13 0xffffffff7ecad19c in mca_base_component_unload (
>    component=0xffffffff7b122fa8 <mca_oob_tcp_component>, output_id=-1)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_components_close.c:47
> #14 0xffffffff7ecad230 in mca_base_component_close (
>    component=0xffffffff7b122fa8 <mca_oob_tcp_component>, output_id=-1)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_components_close.c:60
> #15 0xffffffff7ecad304 in mca_base_components_close (output_id=-1, 
>    components=0xffffffff7f146d88 <orte_oob_base_framework+80>, skip=0x0)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_components_close.c:86
> #16 0xffffffff7ecad26c in mca_base_framework_components_close (
>    framework=0xffffffff7f146d38 <orte_oob_base_framework>, skip=0x0)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_components_close.c:66
> #17 0xffffffff7efc671c in orte_oob_base_close ()
>    at ../../../../openmpi-1.9a1r32657/orte/mca/oob/base/oob_base_frame.c:98
> #18 0xffffffff7ecc1b28 in mca_base_framework_close (
>    framework=0xffffffff7f146d38 <orte_oob_base_framework>)
>    at ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_framework.c:187
> #19 0xffffffff7be07858 in rte_finalize ()
>    at ../../../../../openmpi-1.9a1r32657/orte/mca/ess/hnp/ess_hnp_module.c:857
> #20 0xffffffff7ef337fc in orte_finalize ()
>    at ../../openmpi-1.9a1r32657/orte/runtime/orte_finalize.c:66
> #21 0x00000001000071e0 in orterun (argc=5, argv=0xffffffff7fffe108)
>    at ../../../../openmpi-1.9a1r32657/orte/tools/orterun/orterun.c:1099
> #22 0x0000000100003e60 in main (argc=5, argv=0xffffffff7fffe108)
>    at ../../../../openmpi-1.9a1r32657/orte/tools/orterun/main.c:13
> (gdb) 
> 
> 
> 
> 
> 
> 
> C problem:
> ==========
> 
> tyr small_prog 115 mpiexec -np 1 --host linpc0 init_finalize
> [tyr.informatik.hs-fulda.de:00815] mca_oob_tcp_accept: accept() failed: Error 
> 0 (11).
> Hello!
> tyr small_prog 116 mpiexec -np 1 --host sunpc0 init_finalize
> [tyr.informatik.hs-fulda.de:00819] mca_oob_tcp_accept: accept() failed: Error 
> 0 (11).
> Hello!
> tyr small_prog 117 mpiexec -np 1 --host tyr init_finalize
> select: Interrupted system call
> [tyr:00825] *** Process received signal ***
> [tyr:00825] Signal: Bus Error (10)
> [tyr:00825] Signal code: Invalid address alignment (1)
> [tyr:00825] Failing at address: ffffffff7bd1bfec
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0.0.0:0xdd1d8
> /lib/sparcv9/libc.so.1:0xd8b98
> /lib/sparcv9/libc.so.1:0xcc70c
> /lib/sparcv9/libc.so.1:0xcc918
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0.0.0:opal_proc_set_name+0x1c
>  [ Signal 10 (BUS)]
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/openmpi/mca_pmix_native.so:0x103d0
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x2fec
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x624
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libmpi.so.0.0.0:ompi_mpi_init+0x3a8
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libmpi.so.0.0.0:PMPI_Init+0x2a8
> /home/fd1026/SunOS/sparc/bin/init_finalize:main+0x10
> /home/fd1026/SunOS/sparc/bin/init_finalize:_start+0x7c
> [tyr:00825] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 825 on node tyr exited on signal 
> 10 (Bus Error).
> --------------------------------------------------------------------------
> tyr small_prog 118 /usr/local/gdb-7.6.1_64_gcc/bin/gdb 
> /usr/local/openmpi-1.9_64_gcc/bin/mpiexec 
> GNU gdb (GDB) 7.6.1
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "sparc-sun-solaris2.10".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from 
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/bin/orterun...done.
> (gdb) run -np 1 --host tyr init_finalize   
> Starting program: /usr/local/openmpi-1.9_64_gcc/bin/mpiexec -np 1 --host tyr 
> init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> select: Interrupted system call
> [tyr:00842] *** Process received signal ***
> [tyr:00842] Signal: Bus Error (10)
> [tyr:00842] Signal code: Invalid address alignment (1)
> [tyr:00842] Failing at address: ffffffff7bd1bfec
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0.0.0:0xdd1d8
> /lib/sparcv9/libc.so.1:0xd8b98
> /lib/sparcv9/libc.so.1:0xcc70c
> /lib/sparcv9/libc.so.1:0xcc918
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0.0.0:opal_proc_set_name+0x1c
>  [ Signal 10 (BUS)]
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/openmpi/mca_pmix_native.so:0x103d0
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x2fec
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x624
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libmpi.so.0.0.0:ompi_mpi_init+0x3a8
> /export2/prog/SunOS_sparc/openmpi-1.9_64_gcc/lib64/libmpi.so.0.0.0:PMPI_Init+0x2a8
> /home/fd1026/SunOS/sparc/bin/init_finalize:main+0x10
> /home/fd1026/SunOS/sparc/bin/init_finalize:_start+0x7c
> [tyr:00842] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 842 on node tyr exited on signal 
> 10 (Bus Error).
> --------------------------------------------------------------------------
> [LWP    2         exited]
> [New Thread 2        ]
> [Switching to Thread 1 (LWP 1)]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to 
> satisfy query
> (gdb) bt
> #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
> #1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
> #2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
> #3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
> #4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
> #5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
> #6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
> #7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
> #8  0xffffffff7ec88e98 in vm_close () from 
> /usr/local/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0
> #9  0xffffffff7ec86478 in lt_dlclose () from 
> /usr/local/openmpi-1.9_64_gcc/lib64/libopen-pal.so.0
> #10 0xffffffff7ecab5fc in ri_destructor (obj=0x1001fe750)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_component_repository.c:382
> #11 0xffffffff7eca9f38 in opal_obj_run_destructors (object=0x1001fe750)
>    at ../../../../openmpi-1.9a1r32657/opal/class/opal_object.h:446
> #12 0xffffffff7ecaae9c in mca_base_component_repository_release (
>    component=0xffffffff7b122fa8 <mca_oob_tcp_component>)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_component_repository.c:240
> #13 0xffffffff7ecad19c in mca_base_component_unload (
>    component=0xffffffff7b122fa8 <mca_oob_tcp_component>, output_id=-1)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_components_close.c:47
> #14 0xffffffff7ecad230 in mca_base_component_close (
>    component=0xffffffff7b122fa8 <mca_oob_tcp_component>, output_id=-1)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_components_close.c:60
> #15 0xffffffff7ecad304 in mca_base_components_close (output_id=-1, 
>    components=0xffffffff7f146d88 <orte_oob_base_framework+80>, skip=0x0)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_components_close.c:86
> #16 0xffffffff7ecad26c in mca_base_framework_components_close (
>    framework=0xffffffff7f146d38 <orte_oob_base_framework>, skip=0x0)
>    at 
> ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_components_close.c:66
> #17 0xffffffff7efc671c in orte_oob_base_close ()
>    at ../../../../openmpi-1.9a1r32657/orte/mca/oob/base/oob_base_frame.c:98
> #18 0xffffffff7ecc1b28 in mca_base_framework_close (
>    framework=0xffffffff7f146d38 <orte_oob_base_framework>)
>    at ../../../../openmpi-1.9a1r32657/opal/mca/base/mca_base_framework.c:187
> #19 0xffffffff7be07858 in rte_finalize ()
>    at ../../../../../openmpi-1.9a1r32657/orte/mca/ess/hnp/ess_hnp_module.c:857
> #20 0xffffffff7ef337fc in orte_finalize ()
>    at ../../openmpi-1.9a1r32657/orte/runtime/orte_finalize.c:66
> #21 0x00000001000071e0 in orterun (argc=6, argv=0xffffffff7fffe0f8)
>    at ../../../../openmpi-1.9a1r32657/orte/tools/orterun/orterun.c:1099
> #22 0x0000000100003e60 in main (argc=6, argv=0xffffffff7fffe0f8)
>    at ../../../../openmpi-1.9a1r32657/orte/tools/orterun/main.c:13
> (gdb) 
> 
> 
> 
> I would be grateful if somebody could fix the problem. Thank you
> very much in advance for any help.
> 
> 
> Kind regards
> 
> Siegmar
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/09/25218.php
