Hi,

I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361. Why
doesn't "mpiexec" start a process on my local machine? It is not a Java
problem, because I get the same behaviour when I run "hostname" instead
of my Java program.
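For reference, HelloMainWithBarrier is essentially a test program of the
following form (only a sketch with the classic mpiJava-style method names;
the exact names in the current trunk bindings may differ slightly):

import mpi.*;

public class HelloMainWithBarrier {
  public static void main(String[] args) throws MPIException {
    MPI.Init(args);
    int rank = MPI.COMM_WORLD.Rank();       // rank of this process
    int size = MPI.COMM_WORLD.Size();       // number of processes
    String host = MPI.Get_processor_name();
    System.out.println("Process " + rank + " of " + size
        + " running on " + host);
    MPI.COMM_WORLD.Barrier();               // synchronize before finishing
    MPI.Finalize();
  }
}

With the two Sun machines and tyr in the host list the job runs, but no
process is placed on tyr: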
tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
Process 2 of 3 running on sunpc1
...

tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
sunpc1
sunpc4.informatik.hs-fulda.de
sunpc4.informatik.hs-fulda.de

The command breaks if I add a Linux machine.

tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[linpc4:27369] Local abort before MPI_INIT completed successfully; not able
to aggregate error messages, and not able to guarantee that all other
processes were killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[21095,1],2]
  Exit code:    1
--------------------------------------------------------------------------
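To see which environment the JVM on linpc4 really gets, I can run a small
helper class like the following on that machine (only a sketch; it prints
nothing but standard system properties, and "sun.arch.data.model" is only
available on Sun/Oracle JVMs):

public class ShowJavaEnv {
  public static void main(String[] args) {
    // the class path the JVM was started with (mpi.jar should be
    // reachable through it or through the CLASSPATH environment)
    System.out.println("java.class.path     = "
        + System.getProperty("java.class.path"));
    // the search path for JNI libraries
    System.out.println("java.library.path   = "
        + System.getProperty("java.library.path"));
    // "i386" vs. "amd64" and 32 vs. 64 show whether the JVM matches
    // the 32-bit Open MPI installation
    System.out.println("os.arch             = "
        + System.getProperty("os.arch"));
    System.out.println("sun.arch.data.model = "
        + System.getProperty("sun.arch.data.model"));
  }
}

For completeness, here is the mpijavac wrapper that I use: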
tyr java 111 which mpijavac
/usr/local/openmpi-1.9_32_cc/bin/mpijavac

tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
#!/usr/bin/env perl

# WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
# MAKE ALL CHANGES IN mpijava.pl.in

# Copyright (c) 2011 Cisco Systems, Inc.  All rights reserved.
# Copyright (c) 2012 Oracle and/or its affiliates.  All rights reserved.

use strict;

# The main purpose of this wrapper compiler is to check for
# and adjust the Java class path to include the OMPI classes
# in mpi.jar. The user may have specified a class path on
# our cmd line, or it may be in the environment, so we have
# to check for both. We also need to be careful not to
# just override the class path as it probably includes classes
# they need for their application! It also may already include
# the path to mpi.jar, and while it doesn't hurt anything, we
# don't want to include our class path more than once to avoid
# user astonishment

# Let the build system provide us with some critical values
my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";

# globals
my $showme_arg = 0;
my $verbose = 0;
my $my_arg;
...

All libraries are available.

tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
        libthread.so.1 =>  /usr/lib/libthread.so.1
        libjli.so =>  /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
        libdl.so.1 =>  /usr/lib/libdl.so.1
        libc.so.1 =>  /usr/lib/libc.so.1
        libm.so.2 =>  /usr/lib/libm.so.2
        /platform/SUNW,A70/lib/libc_psr.so.1

tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
        libthread.so.1 =>  /usr/lib/libthread.so.1
        libjli.so =>  /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
        libdl.so.1 =>  /usr/lib/libdl.so.1
        libc.so.1 =>  /usr/lib/libc.so.1
        libm.so.2 =>  /usr/lib/libm.so.2

tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
        linux-gate.so.1 =>  (0xffffe000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
        libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so (0xf779d000)
        libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
        libc.so.6 => /lib/libc.so.6 (0xf762b000)
        /lib/ld-linux.so.2 (0xf77ce000)

I don't have any errors in the log files except the error for nfs.

tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
log.configure.Linux.x86_64.32_cc     log.make-install.Linux.x86_64.32_cc
log.make-check.Linux.x86_64.32_cc    log.make.Linux.x86_64.32_cc

tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1
...
  SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
FAIL: opal_path_nfs
========================================================
1 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/
========================================================
make[3]: *** [check-TESTS] Error 1
...

It doesn't help to build the class files on Linux (which should be
independent of the architecture anyway).

tyr java 131 ssh linpc4
linpc4 fd1026 98 cd .../prog/mpi/java
linpc4 java 99 make clean
rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
      /home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
linpc4 java 100 make
mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java
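One more thing I can check on linpc4 is whether the JVM can load the
mpi.MPI class from mpi.jar at all (again only a sketch; if the class pulls
in its native part in a static initializer, a problem with the shared
library should already show up here as an UnsatisfiedLinkError):

public class CheckMpiClass {
  public static void main(String[] args) {
    try {
      // try to resolve the class that ships in mpi.jar
      Class<?> c = Class.forName("mpi.MPI");
      Object src = c.getProtectionDomain().getCodeSource();
      System.out.println("loaded " + c.getName() + " from " + src);
    } catch (Throwable t) {
      // ClassNotFoundException, UnsatisfiedLinkError, ...
      System.out.println("could not load mpi.MPI: " + t);
    }
  }
}

I would start it on linpc4 with the class path extended by
/usr/local/openmpi-1.9_32_cc/lib/mpi.jar. Running the MPI program on
linpc4 alone fails in the same way: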
linpc4 java 101 mpiexec -np 3 -host linpc4 \
  java -cp $HOME/mpi_classfiles HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
...

Has anybody else seen this problem? Do you know a solution?

Thank you very much for any help in advance.

Kind regards

Siegmar