Hi,

> I'm on the road the rest of this week, but can look at this when I return
> next week. It looks like something unrelated to the Java bindings failed to
> properly initialize - at a guess, I'd suspect that you are missing the
> LD_LIBRARY_PATH setting so none of the OMPI libs were found.
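If a missing library path is really the cause, one quick cross-check might be
to forward the variable explicitly: mpiexec's -x option exports an environment
variable to all started processes, e.g.

  mpiexec -x LD_LIBRARY_PATH -np 4 -host linpc4,sunpc4,rs0 \
    java -cp $HOME/mpi_classfiles HelloMainWithBarrier

If that run behaves differently, the environment on the remote nodes is the
likely culprit.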
Perhaps the output of my environment program is helpful in that case.
I attached my environment.

mpiexec -np 4 -host linpc4,sunpc4,rs0 environ_mpi \
  >& env_linpc_sunpc_sparc.txt

Thank you very much in advance for your help.

Kind regards

Siegmar


> On Wed, Sep 26, 2012 at 5:42 AM, Siegmar Gross <
> siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hi,
> >
> > yesterday I installed openmpi-1.9a1r27362 on Solaris and Linux and
> > I have a problem with mpiJava on Linux (openSUSE-Linux 12.1, x86_64).
> >
> > linpc4 mpi_classfiles 104 javac HelloMainWithoutMPI.java
> > linpc4 mpi_classfiles 105 mpijavac HelloMainWithBarrier.java
> > linpc4 mpi_classfiles 106 mpijavac -showme
> > /usr/local/jdk1.7.0_07-64/bin/javac \
> >   -cp ...:.:/usr/local/openmpi-1.9_64_cc/lib64/mpi.jar
> >
> > It works with Java without MPI.
> >
> > linpc4 mpi_classfiles 107 mpiexec java -cp $HOME/mpi_classfiles \
> >   HelloMainWithoutMPI
> > Hello from linpc4.informatik.hs-fulda.de/193.174.26.225
> >
> > It breaks with Java and MPI.
> >
> > linpc4 mpi_classfiles 108 mpiexec java -cp $HOME/mpi_classfiles \
> >   HelloMainWithBarrier
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   mca_base_open failed
> >   --> Returned value -2 instead of OPAL_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   opal_init failed
> >   --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   ompi_mpi_init: orte_init failed
> >   --> Returned "Out of resource" (-2) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > *** and potentially your MPI job)
> > [linpc4:15332] Local abort before MPI_INIT completed successfully; not
> > able to aggregate error messages, and not able to guarantee that all
> > other processes were killed!
> > -------------------------------------------------------
> > Primary job terminated normally, but 1 process returned
> > a non-zero exit code. Per user-direction, the job has been aborted.
> > -------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status,
> > thus causing the job to be terminated. The first process to do so was:
> >
> >   Process name: [[58875,1],0]
> >   Exit code:    1
> > --------------------------------------------------------------------------
> >
> > I configured with the following command.
> >
> > ../openmpi-1.9a1r27362/configure --prefix=/usr/local/openmpi-1.9_64_cc \
> >   --libdir=/usr/local/openmpi-1.9_64_cc/lib64 \
> >   --with-jdk-bindir=/usr/local/jdk1.7.0_07-64/bin \
> >   --with-jdk-headers=/usr/local/jdk1.7.0_07-64/include \
> >   JAVA_HOME=/usr/local/jdk1.7.0_07-64 \
> >   LDFLAGS="-m64" \
> >   CC="cc" CXX="CC" FC="f95" \
> >   CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
> >   CPP="cpp" CXXCPP="cpp" \
> >   CPPFLAGS="" CXXCPPFLAGS="" \
> >   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
> >   OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
> >   --enable-cxx-exceptions \
> >   --enable-mpi-java \
> >   --enable-heterogeneous \
> >   --enable-opal-multi-threads \
> >   --enable-mpi-thread-multiple \
> >   --with-threads=posix \
> >   --with-hwloc=internal \
> >   --without-verbs \
> >   --without-udapl \
> >   --with-wrapper-cflags=-m64 \
> >   --enable-debug \
> >   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >
> > It works fine on Solaris machines as long as the hosts are of the
> > same kind (Sparc or x86_64).
> >
> > tyr mpi_classfiles 194 mpiexec -host sunpc0,sunpc1,sunpc4 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > Process 1 of 3 running on sunpc1
> > Process 2 of 3 running on sunpc4.informatik.hs-fulda.de
> > Process 0 of 3 running on sunpc0
> >
> > sunpc4 fd1026 107 mpiexec -host tyr,rs0,rs1 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > Process 1 of 3 running on rs0.informatik.hs-fulda.de
> > Process 2 of 3 running on rs1.informatik.hs-fulda.de
> > Process 0 of 3 running on tyr.informatik.hs-fulda.de
> >
> > It breaks if the hosts belong to both kinds of machines.
> >
> > sunpc4 fd1026 106 mpiexec -host tyr,rs0,sunpc1 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > [rs0.informatik.hs-fulda.de:7718] *** An error occurred in MPI_Comm_dup
> > [rs0.informatik.hs-fulda.de:7718] *** reported by process [565116929,1]
> > [rs0.informatik.hs-fulda.de:7718] *** on communicator MPI_COMM_WORLD
> > [rs0.informatik.hs-fulda.de:7718] *** MPI_ERR_INTERN: internal error
> > [rs0.informatik.hs-fulda.de:7718] *** MPI_ERRORS_ARE_FATAL (processes
> > in this communicator will now abort,
> > [rs0.informatik.hs-fulda.de:7718] *** and potentially your MPI job)
> > [sunpc4.informatik.hs-fulda.de:07900] 1 more process has sent help
> > message help-mpi-errors.txt / mpi_errors_are_fatal
> > [sunpc4.informatik.hs-fulda.de:07900] Set MCA parameter
> > "orte_base_help_aggregate" to 0 to see all help / error messages
> >
> > Please let me know if I can provide anything else to track these errors.
> > Thank you very much in advance for any help.
> >
> > Kind regards
> >
> > Siegmar
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
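For context, HelloMainWithBarrier is only a minimal barrier test. Its source
is not shown in this thread, but with the mpiJava-style bindings in mpi.jar it
would look roughly like the sketch below (the class name is taken from the
commands above; the body is an assumption, not Siegmar's actual file):

  import mpi.*;

  public class HelloMainWithBarrier {
      public static void main(String[] args) throws Exception {
          MPI.Init(args);                    // first MPI call; enters JNI
          int rank = MPI.COMM_WORLD.Rank();  // rank within MPI_COMM_WORLD
          int size = MPI.COMM_WORLD.Size();  // total number of processes
          MPI.COMM_WORLD.Barrier();          // synchronize all processes
          System.out.println("Process " + rank + " of " + size
              + " running on "
              + java.net.InetAddress.getLocalHost().getHostName());
          MPI.Finalize();
      }
  }

Note that the opal_init failure quoted above occurs inside the very first
MPI.Init() call, before the barrier is ever reached.

The attached env_linpc_sunpc_sparc.txt follows: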
[sunpc4.informatik.hs-fulda.de][[4083,1],2][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
[rs0.informatik.hs-fulda.de][[4083,1],3][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
[rs0.informatik.hs-fulda.de][[4083,1],3][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
[rs0.informatik.hs-fulda.de][[4083,1],3][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13

Now 3 slave tasks are sending their environment.

Environment from task 1:
  message type: 3
  msg length:   3911 characters
  message:
    hostname:         linpc4
    operating system: Linux
    release:          3.1.9-1.4-desktop
    processor:        x86_64
    PATH
      /usr/local/eclipse-3.6.1
      /usr/local/NetBeans-4.0/bin
      /usr/local/jdk1.7.0_07-64/bin
      /usr/local/apache-ant-1.6.2/bin
      /usr/local/icc-9.1/idb/bin
      /usr/local/icc-9.1/cc/bin
      /usr/local/icc-9.1/fc/bin
      /usr/local/gcc-4.7.1/bin
      /opt/solstudio12.3/bin
      /usr/local/bin
      /usr/local/ssl/bin
      /usr/local/pgsql/bin
      /bin
      /usr/bin
      /usr/X11R6/bin
      /usr/local/teTeX-1.0.7/bin/i586-pc-linux-gnu
      /usr/local/bluej-2.1.2
      /usr/local/openmpi-1.9_64_cc/bin
      /home/fd1026/Linux/x86_64/bin
      .
      /usr/sbin
    LD_LIBRARY_PATH_32
      /usr/lib
      /usr/local/jdk1.7.0_07-64/jre/lib/i386
      /usr/local/gcc-4.7.1/lib
      /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1/32
      /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/32
      /usr/local/lib
      /usr/local/ssl/lib
      /lib
      /usr/lib
      /usr/X11R6/lib
      /usr/local/openmpi-1.9_64_cc/lib
      /home/fd1026/Linux/x86_64/lib
    LD_LIBRARY_PATH_64
      /usr/lib64
      /usr/local/jdk1.7.0_07-64/jre/lib/amd64
      /usr/local/gcc-4.7.1/lib64
      /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1
      /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1
      /usr/local/lib64
      /usr/local/ssl/lib64
      /usr/lib64
      /usr/X11R6/lib64
      /usr/local/openmpi-1.9_64_cc/lib64
      /home/fd1026/Linux/x86_64/lib64
    LD_LIBRARY_PATH
      /usr/lib
      /usr/local/jdk1.7.0_07-64/jre/lib/i386
      /usr/local/gcc-4.7.1/lib
      /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1/32
      /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/32
      /usr/local/lib
      /usr/local/ssl/lib
      /lib
      /usr/lib
      /usr/X11R6/lib
      /usr/local/openmpi-1.9_64_cc/lib
      /usr/lib64
      /usr/local/jdk1.7.0_07-64/jre/lib/amd64
      /usr/local/gcc-4.7.1/lib64
      /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1
      /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1
      /usr/local/lib64
      /usr/local/ssl/lib64
      /usr/lib64
      /usr/X11R6/lib64
      /usr/local/openmpi-1.9_64_cc/lib64
      /home/fd1026/Linux/x86_64/lib64
    CLASSPATH
      /usr/local/junit4.10
      /usr/local/junit4.10/junit-4.10.jar
      //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dcore.jar
      //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dutils.jar
      //usr/local/jdk1.7.0_07-64/j3d/lib/ext/vecmath.jar
      /usr/local/javacc-5.0/javacc.jar
      .
Environment from task 2:
  message type: 3
  msg length:   4196 characters
  message:
    hostname:         sunpc4.informatik.hs-fulda.de
    operating system: SunOS
    release:          5.10
    processor:        i86pc
    PATH
      /usr/local/eclipse-3.6.1
      /usr/local/NetBeans-4.0/bin
      /usr/local/jdk1.7.0_07/bin/amd64
      /usr/local/apache-ant-1.6.2/bin
      /usr/local/gcc-4.7.1/bin
      /opt/solstudio12.3/bin
      /usr/local/bin
      /usr/local/ssl/bin
      /usr/local/pgsql/bin
      /usr/bin
      /usr/openwin/bin
      /usr/dt/bin
      /usr/ccs/bin
      /usr/sfw/bin
      /opt/sfw/bin
      /usr/ucb
      /usr/lib/lp/postscript
      /usr/local/teTeX-1.0.7/bin/i386-pc-solaris2.10
      /usr/local/bluej-2.1.2
      /usr/local/openmpi-1.9_64_cc/bin
      /home/fd1026/SunOS/x86_64/bin
      .
      /usr/sbin
    LD_LIBRARY_PATH_32
      /usr/lib
      /usr/local/jdk1.7.0_07/jre/lib/i386
      /usr/local/gcc-4.7.1/lib
      /usr/local/gcc-4.7.1/lib/gcc/i386-pc-solaris2.10/4.7.1
      /usr/local/lib
      /usr/local/ssl/lib
      /usr/local/oracle
      /usr/local/pgsql/lib
      /usr/lib
      /usr/openwin/lib
      /usr/openwin/server/lib
      /usr/dt/lib
      /usr/X11R6/lib
      /usr/ccs/lib
      /usr/sfw/lib
      /opt/sfw/lib
      /usr/ucblib
      /usr/local/openmpi-1.9_64_cc/lib
      /home/fd1026/SunOS/x86_64/lib
    LD_LIBRARY_PATH_64
      /usr/lib/amd64
      /usr/local/jdk1.7.0_07/jre/lib/amd64
      /usr/local/gcc-4.7.1/lib/amd64
      /usr/local/gcc-4.7.1/lib/gcc/i386-pc-solaris2.10/4.7.1/amd64
      /usr/local/lib/amd64
      /usr/local/ssl/lib/amd64
      /usr/local/lib64
      /usr/lib/amd64
      /usr/openwin/lib/amd64
      /usr/openwin/server/lib/amd64
      /usr/dt/lib/amd64
      /usr/X11R6/lib/amd64
      /usr/ccs/lib/amd64
      /usr/sfw/lib/amd64
      /opt/sfw/lib/amd64
      /usr/ucblib/amd64
      /usr/local/openmpi-1.9_64_cc/lib64
      /home/fd1026/SunOS/x86_64/lib64
    LD_LIBRARY_PATH
      /usr/lib/amd64
      /usr/local/jdk1.7.0_07/jre/lib/amd64
      /usr/local/gcc-4.7.1/lib/amd64
      /usr/local/gcc-4.7.1/lib/gcc/i386-pc-solaris2.10/4.7.1/amd64
      /usr/local/lib/amd64
      /usr/local/ssl/lib/amd64
      /usr/local/lib64
      /usr/lib/amd64
      /usr/openwin/lib/amd64
      /usr/openwin/server/lib/amd64
      /usr/dt/lib/amd64
      /usr/X11R6/lib/amd64
      /usr/ccs/lib/amd64
      /usr/sfw/lib/amd64
      /opt/sfw/lib/amd64
      /usr/ucblib/amd64
      /usr/local/openmpi-1.9_64_cc/lib64
      /home/fd1026/SunOS/x86_64/lib64
    CLASSPATH
      /usr/local/junit4.10
      /usr/local/junit4.10/junit-4.10.jar
      //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
      //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
      //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
      /usr/local/javacc-5.0/javacc.jar
      .

Environment from task 3:
  message type: 3
  msg length:   4394 characters
  message:
    hostname:         rs0.informatik.hs-fulda.de
    operating system: SunOS
    release:          5.10
    processor:        sun4u
    PATH
      /usr/local/eclipse-3.6.1
      /usr/local/NetBeans-4.0/bin
      /usr/local/jdk1.7.0_07/bin/sparcv9
      /usr/local/apache-ant-1.6.2/bin
      /usr/local/gcc-4.7.1/bin
      /opt/solstudio12.3/bin
      /usr/local/bin
      /usr/local/ssl/bin
      /usr/local/pgsql/bin
      /usr/bin
      /usr/openwin/bin
      /usr/dt/bin
      /usr/ccs/bin
      /usr/sfw/bin
      /opt/sfw/bin
      /usr/ucb
      /usr/xpg4/bin
      /usr/local/teTeX-1.0.7/bin/sparc-sun-solaris2.10
      /usr/local/bluej-2.1.2
      /usr/local/openmpi-1.9_64_cc/bin
      /home/fd1026/SunOS/sparc/bin
      .
      /usr/sbin
    LD_LIBRARY_PATH_32
      /usr/lib
      /usr/local/jdk1.7.0_07/jre/lib/sparc
      /usr/local/gcc-4.7.1/lib
      /usr/local/gcc-4.7.1/lib/gcc/sparc-sun-solaris2.10/4.7.1
      /usr/local/lib
      /usr/local/ssl/lib
      /usr/local/oracle
      /usr/local/pgsql/lib
      /lib
      /usr/lib
      /usr/openwin/lib
      /usr/dt/lib
      /usr/X11R6/lib
      /usr/ccs/lib
      /usr/sfw/lib
      /opt/sfw/lib
      /usr/ucblib
      /usr/local/openmpi-1.9_64_cc/lib
      /home/fd1026/SunOS/sparc/lib
    LD_LIBRARY_PATH_64
      /usr/lib/sparcv9
      /usr/local/jdk1.7.0_07/jre/lib/sparcv9
      /usr/local/gcc-4.7.1/lib/sparcv9
      /usr/local/gcc-4.7.1/lib/gcc/sparc-sun-solaris2.10/4.7.1/sparcv9
      /usr/local/lib/sparcv9
      /usr/local/ssl/lib/sparcv9
      /usr/local/lib64
      /usr/local/oracle/sparcv9
      /usr/local/pgsql/lib/sparcv9
      /lib/sparcv9
      /usr/lib/sparcv9
      /usr/openwin/lib/sparcv9
      /usr/dt/lib/sparcv9
      /usr/X11R6/lib/sparcv9
      /usr/ccs/lib/sparcv9
      /usr/sfw/lib/sparcv9
      /opt/sfw/lib/sparcv9
      /usr/ucblib/sparcv9
      /usr/local/openmpi-1.9_64_cc/lib64
      /home/fd1026/SunOS/sparc/lib64
    LD_LIBRARY_PATH
      /usr/lib/sparcv9
      /usr/local/jdk1.7.0_07/jre/lib/sparcv9
      /usr/local/gcc-4.7.1/lib/sparcv9
      /usr/local/gcc-4.7.1/lib/gcc/sparc-sun-solaris2.10/4.7.1/sparcv9
      /usr/local/lib/sparcv9
      /usr/local/ssl/lib/sparcv9
      /usr/local/lib64
      /usr/local/oracle/sparcv9
      /usr/local/pgsql/lib/sparcv9
      /lib/sparcv9
      /usr/lib/sparcv9
      /usr/openwin/lib/sparcv9
      /usr/dt/lib/sparcv9
      /usr/X11R6/lib/sparcv9
      /usr/ccs/lib/sparcv9
      /usr/sfw/lib/sparcv9
      /opt/sfw/lib/sparcv9
      /usr/ucblib/sparcv9
      /usr/local/openmpi-1.9_64_cc/lib64
      /home/fd1026/SunOS/sparc/lib
    CLASSPATH
      /usr/local/junit4.10
      /usr/local/junit4.10/junit-4.10.jar
      //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
      //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
      //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
      /usr/local/javacc-5.0/javacc.jar
      .
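For anyone reproducing this: the attachment above comes from a small
master/slave exchange in which every non-root task ships its environment to
task 0, which prints it. The real environ_mpi is Siegmar's own tool (most
likely written in C), so the Java sketch below is purely illustrative; the
tag value mirrors the "message type: 3" shown in the output, and the buffer
size is arbitrary:

  import mpi.*;

  public class EnvironMain {
      static final int ENV_TAG = 3;      // cf. "message type: 3" above
      static final int MAX_LEN = 8192;   // arbitrary receive buffer size

      public static void main(String[] args) throws Exception {
          MPI.Init(args);
          int rank = MPI.COMM_WORLD.Rank();
          int size = MPI.COMM_WORLD.Size();
          if (rank == 0) {
              System.out.println("Now " + (size - 1)
                  + " slave tasks are sending their environment.");
              char[] buf = new char[MAX_LEN];
              for (int task = 1; task < size; task++) {
                  Status st = MPI.COMM_WORLD.Recv(buf, 0, MAX_LEN,
                      MPI.CHAR, task, ENV_TAG);
                  int len = st.Get_count(MPI.CHAR);  // received length
                  System.out.println("Environment from task " + task + ":");
                  System.out.println("  msg length: " + len + " characters");
                  System.out.println(new String(buf, 0, len));
              }
          } else {
              StringBuilder sb = new StringBuilder();
              sb.append("hostname: ")
                .append(java.net.InetAddress.getLocalHost().getHostName())
                .append('\n');
              // the real tool also reports OS, release, and processor
              for (String var : new String[] {"PATH", "LD_LIBRARY_PATH",
                                              "CLASSPATH"}) {
                  sb.append(var).append('\n');
                  String val = System.getenv(var);
                  if (val != null)
                      for (String dir : val.split(":"))  // one dir per line
                          sb.append("  ").append(dir).append('\n');
              }
              char[] msg = sb.toString().toCharArray();
              MPI.COMM_WORLD.Send(msg, 0, msg.length, MPI.CHAR, 0, ENV_TAG);
          }
          MPI.Finalize();
      }
  }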