Hi,
I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
Why doesn't "mpiexec" start a process on my local machine (it
is not a Java issue, because I see the same behaviour when
I use "hostname")?
tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
java -cp $HOME/mpi_classfiles HelloMainWithBarrier
Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
Process 2 of 3 running on sunpc1
...
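(For reference: the source of HelloMainWithBarrier is not shown here, but
based on the output above it does little more than the following sketch,
assuming the mpiJava-1.2-style method names that these Java bindings provide.)

  import mpi.*;

  public class HelloMainWithBarrier {
    public static void main (String[] args) throws MPIException {
      MPI.Init (args);
      int rank = MPI.COMM_WORLD.Rank ();        // rank of this process
      int size = MPI.COMM_WORLD.Size ();        // total number of processes
      String host = MPI.Get_processor_name ();  // name of the executing host
      MPI.COMM_WORLD.Barrier ();                // synchronize before printing
      System.out.println ("Process " + rank + " of " + size
                          + " running on " + host);
      MPI.COMM_WORLD.Barrier ();
      MPI.Finalize ();
    }
  }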
tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
sunpc1
sunpc4.informatik.hs-fulda.de
sunpc4.informatik.hs-fulda.de
The command fails as soon as I add a Linux machine.
tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
java -cp $HOME/mpi_classfiles HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
mca_base_open failed
--> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Out of resource (-2) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: orte_init failed
--> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[linpc4:27369] Local abort before MPI_INIT completed successfully;
not able to aggregate error messages, and not able to guarantee
that all other processes were killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [[21095,1],2]
Exit code: 1
--------------------------------------------------------------------------
tyr java 111 which mpijavac
/usr/local/openmpi-1.9_32_cc/bin/mpijavac
tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
#!/usr/bin/env perl
# WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
# MAKE ALL CHANGES IN mpijava.pl.in
# Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved.
use strict;
# The main purpose of this wrapper compiler is to check for
# and adjust the Java class path to include the OMPI classes
# in mpi.jar. The user may have specified a class path on
# our cmd line, or it may be in the environment, so we have
# to check for both. We also need to be careful not to
# just override the class path as it probably includes classes
# they need for their application! It also may already include
# the path to mpi.jar, and while it doesn't hurt anything, we
# don't want to include our class path more than once to avoid
# user astonishment
# Let the build system provide us with some critical values
my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";
# globals
my $showme_arg = 0;
my $verbose = 0;
my $my_arg;
...
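According to these comments, the only job of the wrapper is to append mpi.jar
to the Java class path. To check what actually ends up on the class path when
a class is started via "mpiexec ... java -cp ...", a tiny helper like the
following (hypothetical, not part of Open MPI) could be compiled with mpijavac
and launched the same way:

  // ShowClasspath.java -- hypothetical diagnostic, not part of Open MPI.
  // Prints the effective class path of the running JVM so one can verify
  // whether mpi.jar was picked up.
  public class ShowClasspath {
    public static void main (String[] args) {
      System.out.println ("java.class.path = "
                          + System.getProperty ("java.class.path"));
    }
  }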
All required libraries are available on all three machines.
tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
libthread.so.1 => /usr/lib/libthread.so.1
libjli.so => /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
libdl.so.1 => /usr/lib/libdl.so.1
libc.so.1 => /usr/lib/libc.so.1
libm.so.2 => /usr/lib/libm.so.2
/platform/SUNW,A70/lib/libc_psr.so.1
tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
libthread.so.1 => /usr/lib/libthread.so.1
libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
libdl.so.1 => /usr/lib/libdl.so.1
libc.so.1 => /usr/lib/libc.so.1
libm.so.2 => /usr/lib/libm.so.2
tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
linux-gate.so.1 => (0xffffe000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so (0xf779d000)
libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
libc.so.6 => /lib/libc.so.6 (0xf762b000)
/lib/ld-linux.so.2 (0xf77ce000)
I don't see any errors in the build log files except the nfs-related test failure.
tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
log.configure.Linux.x86_64.32_cc log.make-install.Linux.x86_64.32_cc
log.make-check.Linux.x86_64.32_cc log.make.Linux.x86_64.32_cc
tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1
...
SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
FAIL: opal_path_nfs
========================================================
1 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/
========================================================
make[3]: *** [check-TESTS] Error 1
...
It doesn't help to build the class files directly on Linux (even though class
files should be architecture-independent anyway).
tyr java 131 ssh linpc4
linpc4 fd1026 98 cd .../prog/mpi/java
linpc4 java 99 make clean
rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
/home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
linpc4 java 100 make
mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java
linpc4 java 101 mpiexec -np 3 -host linpc4 \
java -cp $HOME/mpi_classfiles HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
mca_base_open failed
--> Returned value -2 instead of OPAL_SUCCESS
...
Does anybody else have this problem as well? Do you know a solution?
Thank you very much in advance for any help.
Kind regards
Siegmar