Hi,

Today I installed openmpi-1.8.2rc2r32288 on my machines (Solaris 10
Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
Sun C 5.12 and gcc-4.9.0. Unfortunately, I have problems with both
compilers on "Solaris 10 Sparc". My small program works as expected
on "Solaris 10 x86_64" and Linux.

Problem with Sun C 5.12:
------------------------

tyr hello_1 128 which mpicc
/usr/local/openmpi-1.8.2_64_cc/bin/mpicc
tyr hello_1 129 ompi_info | grep MPI:
                Open MPI: 1.8.2rc2r32288
tyr hello_1 130 mpicc hello_1_mpi.c 
tyr hello_1 131 mpiexec -np 2 a.out 
Process 0 of 2 running on tyr.informatik.hs-fulda.de

Now 1 slave tasks are sending greetings.

Process 1 of 2 running on tyr.informatik.hs-fulda.de
ld.so.1: a.out: fatal: relocation error:
  file /usr/local/openmpi-1.8.2_64_cc/lib64/openmpi/:
  symbol alloca: referenced symbol not found
ld.so.1: a.out: fatal: relocation error:
  file /usr/local/openmpi-1.8.2_64_cc/lib64/openmpi/:
  symbol alloca: referenced symbol not found
----------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 28377 on node tyr exited
  on signal 9 (Killed).
----------------------------------------------------------------------
tyr hello_1 132 
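
One possible reading of the "symbol alloca: referenced symbol not found"
message, not verified against the Open MPI sources: alloca() is normally a
compiler builtin, and a source file that calls it without including
<alloca.h> can end up with an ordinary external reference to "alloca" that
the runtime linker cannot resolve. The following is only a minimal sketch of
the intended usage pattern; the helper name scratch_copy_len is hypothetical
and is not taken from Open MPI.

```c
#include <alloca.h>   /* declares alloca and maps it to the compiler builtin */
#include <string.h>

/* Hypothetical helper: copy src into stack scratch space obtained with
 * alloca() and return the length of the copy. */
static size_t scratch_copy_len(const char *src)
{
    size_t n = strlen(src) + 1;   /* include the terminating NUL */
    char *tmp = alloca(n);        /* released automatically on return */
    memcpy(tmp, src, n);
    return strlen(tmp);
}
```

With the header included, the call is expanded by the compiler and no
"alloca" symbol has to be found at run time.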



I also have a problem with the Java interface on Solaris Sparc and
x86_64, with largely the same error message.

tyr java 150 mpijavac InitFinalizeMain.java 
tyr java 151 mpiexec -np 1 java InitFinalizeMain
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=28585, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# C  [libc.so.1+0x3c7f0]  strlen+0x50
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid28585.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 28585 on node tyr exited on signal 6 (Abort).
--------------------------------------------------------------------------
tyr java 152 



It works on Linux, but displays a warning.

tyr java 153 ssh linpc1
linpc1 fd1026 101 cd /home/fd1026/work/skripte/master/parallel/prog/mpi/java
linpc1 java 102 mpijavac InitFinalizeMain.java 
linpc1 java 103 mpiexec -np 1 java InitFinalizeMain
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/openmpi-1.8.2_64_cc/lib64/libmpi_java.so.1.2.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Hello!
linpc1 java 104 



Problem with gcc-4.9.0:
-----------------------

tyr hello_1 104 which mpicc
/usr/local/openmpi-1.8.2_64_gcc/bin/mpicc
tyr hello_1 105 ompi_info | grep MPI:
                Open MPI: 1.8.2rc2r32288
tyr hello_1 106 mpicc hello_1_mpi.c 
tyr hello_1 107 mpiexec -np 2 a.out 
[tyr:28540] *** Process received signal ***
[tyr:28540] Signal: Bus Error (10)
[tyr:28540] Signal code: Invalid address alignment (1)
[tyr:28540] Failing at address: ffffffff7fffd1c4
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfd0
/lib/sparcv9/libc.so.1:0xd8b98
/lib/sparcv9/libc.so.1:0xcc70c
/lib/sparcv9/libc.so.1:0xcc918
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8
 [ Signal 10 (BUS)]
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
/home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:main+0x20
/home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:_start+0x7c
[tyr:28540] *** End of error message ***
[tyr:28542] *** Process received signal ***
[tyr:28542] Signal: Bus Error (10)
[tyr:28542] Signal code: Invalid address alignment (1)
[tyr:28542] Failing at address: ffffffff7fffd1c4
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfd0
/lib/sparcv9/libc.so.1:0xd8b98
/lib/sparcv9/libc.so.1:0xcc70c
/lib/sparcv9/libc.so.1:0xcc918
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8
 [ Signal 10 (BUS)]
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x8f8
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
/export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
/home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:main+0x20
/home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:_start+0x7c
[tyr:28542] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 28542 on node tyr exited on signal 10 (Bus Error).
--------------------------------------------------------------------------
tyr hello_1 108 
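
A note on the "Invalid address alignment" signal code: the failing address
ffffffff7fffd1c4 is a multiple of 4 but not of 8, which would be consistent
with an 8-byte load or store through a pointer that is only 4-byte aligned.
SPARC traps on such accesses, while x86_64 tolerates them, which may be why
the same program runs on the other machines. The sketch below only
illustrates the general pattern and a portable alternative; the helper names
are hypothetical and this is not the actual code in mca_db_hash.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 64-bit value from a position that may be
 * only 4-byte aligned. memcpy makes no alignment assumption about p, so
 * this is safe on strict-alignment CPUs such as SPARC. */
static uint64_t load_u64_safe(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: nonzero if p is suitably aligned for a direct
 * 8-byte access. */
static int is_aligned_8(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}

/* The risky pattern, shown only as a comment because it is undefined
 * behaviour and can raise SIGBUS on SPARC when p is misaligned:
 *
 *     uint64_t v = *(const uint64_t *)p;
 */
```

Copying through memcpy costs little on platforms that allow unaligned
access and avoids the trap on those that do not.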




I would be grateful if somebody could solve these problems. Please let
me know if I can provide any other information.


Kind regards

Siegmar
