Hi,

yesterday I installed openmpi-1.8.2rc4r32485 on my machines (Solaris 10 Sparc (tyr), Solaris 10 x86_64 (sunpc1), and openSUSE Linux 12.1 x86_64 (linpc1)) with Sun C 5.12. A small Java program breaks with SIGSEGV on my Solaris systems.

tyr java 118 ssh linpc1
linpc1 fd1026 101 mpiexec -np 1 java InitFinalizeMain
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/openmpi-1.8.2_64_cc/lib64/libmpi_java.so.1.2.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Hello!
linpc1 fd1026 102 exit
logout

tyr java 119 ssh sunpc1
sunpc1 fd1026 104 mpiexec -np 1 java InitFinalizeMain
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xfffffd7fff1d77f0, pid=24042, tid=2
...

tyr java 121 mpiexec -np 1 java InitFinalizeMain
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=21379, tid=2
...


gdb shows the following backtrace.

tyr java 124 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec
GNU gdb (GDB) 7.6.1
...
(gdb) run -np 1 java InitFinalizeMain
Starting program: /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec -np 1 java InitFinalizeMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=21399, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# C [libc.so.1+0x3c7f0] strlen+0x50
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid21399.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 21399 on node tyr exited on signal 6 (Abort).
--------------------------------------------------------------------------
[LWP 2 exited]
[New Thread 2 ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) bt
#0 0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
#1 0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
#2 0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
#3 0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
#4 0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
#5 0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
#6 0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
#7 0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
#8 0xffffffff7e8cb348 in vm_close () from /usr/local/openmpi-1.8.2_64_cc/lib64/libopen-pal.so.6
#9 0xffffffff7e8c8634 in lt_dlclose () from /usr/local/openmpi-1.8.2_64_cc/lib64/libopen-pal.so.6
#10 0xffffffff7e91edcc in ri_destructor (obj=0xff) at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_component_repository.c:391
#11 0xffffffff7e91c5a0 in opal_obj_run_destructors (object=0xffffff7c701d00ff) at ../../../../openmpi-1.8.2rc4r32485/opal/class/opal_object.h:446
#12 0xffffffff7e91e61c in mca_base_component_repository_release (component=0x10ff) at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_component_repository.c:244
#13 0xffffffff7e924c78 in mca_base_component_unload (component=0xffffff7f73c63800, output_id=67583) at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:47
#14 0xffffffff7e924d1c in mca_base_component_close (component=0xffffff0000000100, output_id=268480767) at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:60
#15 0xffffffff7e924e2c in mca_base_components_close (output_id=1947894015, components=0xffffff7f501368ff, skip=0x2ff) at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:86
#16 0xffffffff7e924d6c in mca_base_framework_components_close (framework=0xffffff7d7455d4ff, skip=0xffffff7f200a90ff) at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:68
#17 0xffffffff7ee1d7c8 in orte_oob_base_close () at ../../../../openmpi-1.8.2rc4r32485/orte/mca/oob/base/oob_base_frame.c:94
#18 0xffffffff7e954ac0 in mca_base_framework_close (framework=0xffffff0000004b00) at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_framework.c:187
#19 0xffffffff7be139fc in rte_finalize () at ../../../../../openmpi-1.8.2rc4r32485/orte/mca/ess/hnp/ess_hnp_module.c:858
#20 0xffffffff7ec38274 in orte_finalize () at ../../openmpi-1.8.2rc4r32485/orte/runtime/orte_finalize.c:65
#21 0x000000010000ddf0 in orterun (argc=3327, argv=0x0) at ../../../../openmpi-1.8.2rc4r32485/orte/tools/orterun/orterun.c:1096
#22 0x0000000100004614 in main (argc=255, argv=0xffffff7f078ce800) at ../../../../openmpi-1.8.2rc4r32485/orte/tools/orterun/main.c:13
(gdb)


It seems that I now have the same problem with Sun C and Java that I reported for gcc and C. The C version of my small program works fine with Sun C.

tyr small_prog 129 mpiexec -np 1 init_finalize
Hello!
tyr small_prog 130


I would be grateful if somebody could fix the problem. Thank you very much in advance for any help.


Kind regards

Siegmar