Hi,

> Does the behavior only occur with Java applications, as your subject
> implies? I thought this was a more general behavior based on prior notes?
It is a general problem, as you can see in the older email below. I didn't
change the subject line because I first detected this behaviour when I
tried out mpiJava.

> As I said back then, I have no earthly idea why your local machine is being
> ignored, and I cannot replicate that behavior on any system available to me.
>
> What you might try is adding --display-allocation --display-devel-map to
> your cmd line and see what the system thinks it is doing. The first option
> will display what nodes and slots it thinks are available to it, and the
> second will report where it thinks it placed everything.

This is the output for openmpi-1.9.

tyr topo 244 mpiexec -np 3 -host tyr,sunpc4,linpc4 --display-allocation \
  --display-devel-map hostname

======================   ALLOCATED NODES   ======================

 Data for node: tyr      Launch id: -1   State: 2
        Daemon: [[3909,0],0]    Daemon launched: True
        Num slots: 1    Slots in use: 0    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0

 Data for node: sunpc4   Launch id: -1   State: 2
        Daemon: [[3909,0],1]    Daemon launched: False
        Num slots: 1    Slots in use: 0    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0

 Data for node: linpc4   Launch id: -1   State: 2
        Daemon: [[3909,0],2]    Daemon launched: False
        Num slots: 1    Slots in use: 0    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0

=================================================================

 Mapper requested: NULL    Last mapper: round_robin
 Mapping policy: BYSLOT    Ranking policy: SLOT
 Binding policy: NONE[NODE]    Cpu set: NULL    PPR: NULL
 Num new daemons: 0    New daemon starting vpid INVALID
 Num nodes: 2

 Data for node: sunpc4   Launch id: -1   State: 2
        Daemon: [[3909,0],1]    Daemon launched: False
        Num slots: 1    Slots in use: 1    Oversubscribed: TRUE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 2    Next node_rank: 2

        Data for proc: [[3909,1],0]
                Pid: 0  Local rank: 0   Node rank: 0   App rank: 0
                State: INITIALIZED   Restarts: 0   App_context: 0
                Locale: 0-1   Binding: NULL[0]

        Data for proc: [[3909,1],1]
                Pid: 0  Local rank: 1   Node rank: 1   App rank: 1
                State: INITIALIZED   Restarts: 0   App_context: 0
                Locale: 0-1   Binding: NULL[0]

 Data for node: linpc4   Launch id: -1   State: 2
        Daemon: [[3909,0],2]    Daemon launched: False
        Num slots: 1    Slots in use: 1    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 1    Next node_rank: 1

        Data for proc: [[3909,1],2]
                Pid: 0  Local rank: 0   Node rank: 0   App rank: 2
                State: INITIALIZED   Restarts: 0   App_context: 0
                Locale: 0-1   Binding: NULL[0]

linpc4
sunpc4.informatik.hs-fulda.de
sunpc4.informatik.hs-fulda.de

Note the "Num nodes: 2" in the map: tyr is missing, so the third process
lands on sunpc4 a second time ("Oversubscribed: TRUE"). I get the
following output for the same command with openmpi-1.6.2.
tyr topo 109 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
  --display-allocation --display-devel-map hostname

======================   ALLOCATED NODES   ======================

 Data for node: tyr.informatik.hs-fulda.de   Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0   Num cores/socket: 0
        Daemon: [[4018,0],0]    Daemon launched: True
        Num slots: 1    Slots in use: 0    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 0    Next node_rank: 0

 Data for node: sunpc4   Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0   Num cores/socket: 0
        Daemon: Not defined     Daemon launched: False
        Num slots: 1    Slots in use: 0    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 0    Next node_rank: 0

 Data for node: linpc4   Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0   Num cores/socket: 0
        Daemon: Not defined     Daemon launched: False
        Num slots: 1    Slots in use: 0    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 0    Next node_rank: 0

=================================================================

 Map generated by mapping policy: 0400
        Npernode: 0   Oversubscribe allowed: TRUE   CPU Lists: FALSE
        Num new daemons: 2   New daemon starting vpid 1
        Num nodes: 3

 Data for node: tyr.informatik.hs-fulda.de   Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0   Num cores/socket: 0
        Daemon: [[4018,0],0]    Daemon launched: True
        Num slots: 1    Slots in use: 1    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 1    Next node_rank: 1

        Data for proc: [[4018,1],0]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0   Restarts: 0   App_context: 0   Slot list: NULL

 Data for node: sunpc4   Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0   Num cores/socket: 0
        Daemon: [[4018,0],1]    Daemon launched: False
        Num slots: 1    Slots in use: 1    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 1    Next node_rank: 1

        Data for proc: [[4018,1],1]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0   Restarts: 0   App_context: 0   Slot list: NULL

 Data for node: linpc4   Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0   Num cores/socket: 0
        Daemon: [[4018,0],2]    Daemon launched: False
        Num slots: 1    Slots in use: 1    Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 1    Next node_rank: 1

        Data for proc: [[4018,1],2]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0   Restarts: 0   App_context: 0   Slot list: NULL

linpc4
sunpc4.informatik.hs-fulda.de
tyr.informatik.hs-fulda.de

Here all three nodes are mapped ("Num nodes: 3") and each host runs one
process. Is the above output helpful? Thank you very much for any help
in advance.

Kind regards

Siegmar


> On Wed, Sep 26, 2012 at 4:53 AM, Siegmar Gross <
> siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hi,
> >
> > yesterday I installed openmpi-1.9a1r27362 and I still have a problem
> > with "-host": my local machine is not used when I try to start
> > processes on three hosts.
> >
> > tyr:    Solaris 10, Sparc
> > sunpc4: Solaris 10, x86_64
> > linpc4: openSUSE Linux 12.1, x86_64
> >
> > tyr mpi_classfiles 175 javac HelloMainWithoutMPI.java
> > tyr mpi_classfiles 176 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithoutMPI
> > Hello from linpc4.informatik.hs-fulda.de/193.174.26.225
> > Hello from sunpc4.informatik.hs-fulda.de/193.174.26.224
> > Hello from sunpc4.informatik.hs-fulda.de/193.174.26.224
> > tyr mpi_classfiles 177 which mpiexec
> > /usr/local/openmpi-1.9_64_cc/bin/mpiexec
> >
> > Everything works fine with openmpi-1.6.2rc5r27346.
> >
> > tyr mpi_classfiles 108 javac HelloMainWithoutMPI.java
> > tyr mpi_classfiles 109 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithoutMPI
> > Hello from linpc4.informatik.hs-fulda.de/193.174.26.225
> > Hello from sunpc4.informatik.hs-fulda.de/193.174.26.224
> > Hello from tyr.informatik.hs-fulda.de/193.174.24.39
> > tyr mpi_classfiles 110 which mpiexec
> > /usr/local/openmpi-1.6.2_64_cc/bin/mpiexec
> >
> > In my opinion it is a problem with openmpi-1.9. I used the following
> > configure command for Sparc. The commands for the other platforms are
> > similar.
> >
> > ../openmpi-1.9a1r27362/configure --prefix=/usr/local/openmpi-1.9_64_cc \
> >   --libdir=/usr/local/openmpi-1.9_64_cc/lib64 \
> >   --with-jdk-bindir=/usr/local/jdk1.7.0_07/bin/sparcv9 \
> >   --with-jdk-headers=/usr/local/jdk1.7.0_07/include \
> >   JAVA_HOME=/usr/local/jdk1.7.0_07 \
> >   LDFLAGS="-m64" \
> >   CC="cc" CXX="CC" FC="f95" \
> >   CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
> >   CPP="cpp" CXXCPP="cpp" \
> >   CPPFLAGS="" CXXCPPFLAGS="" \
> >   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
> >   OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
> >   --enable-cxx-exceptions \
> >   --enable-mpi-java \
> >   --enable-heterogeneous \
> >   --enable-opal-multi-threads \
> >   --enable-mpi-thread-multiple \
> >   --with-threads=posix \
> >   --with-hwloc=internal \
> >   --without-verbs \
> >   --without-udapl \
> >   --with-wrapper-cflags=-m64 \
> >   --enable-debug \
> >   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >
> > Can I provide anything to track the problem? Thank you very much for
> > any help in advance.
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> > > >>> I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
> > > >>> Why doesn't "mpiexec" start a process on my local machine (it
> > > >>> is not a matter of Java, because I have the same behaviour when
> > > >>> I use "hostname")?
> > > >>>
> > > >>> tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
> > > >>>   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > >>> Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
> > > >>> Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
> > > >>> Process 2 of 3 running on sunpc1
> > > >>> ...
> > > >>>
> > > >>> tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
> > > >>> sunpc1
> > > >>> sunpc4.informatik.hs-fulda.de
> > > >>> sunpc4.informatik.hs-fulda.de
> > > >>>
> > > >>
> > > >> No idea - it works fine for me. Do you have an environmental
> > > >> variable, or something in your default MCA param file, that
> > > >> indicates "no_use_local"?
> > > >
> > > > I have only built and installed Open MPI and I have no param file.
> > > > I don't have an MCA environment variable.
> > > >
> > > > tyr hello_1 136 grep local \
> > > >   /usr/local/openmpi-1.9_64_cc/etc/openmpi-mca-params.conf
> > > > # $sysconf is a directory on a local disk, it is likely that changes
> > > > # component_path = /usr/local/lib/openmpi:~/my_openmpi_components
> > > >
> > > > tyr hello_1 143 env | grep -i mca
> > > > tyr hello_1 144
> > >
> > > No idea - I can't make it behave that way :-(
> > >
> > >
> > > >>> The command breaks if I add a Linux machine.
> > > >>
> > > >> Check to ensure that the path and ld_library_path on your linux box
> > > >> is being correctly set to point to the corresponding Linux OMPI libs.
> > > >> It looks like that isn't the case. Remember, the Java bindings are
> > > >> just that - they are bindings that wrap on top of the regular C
> > > >> code. Thus, the underlying OMPI system remains system-dependent,
> > > >> and you must have the appropriate native libraries installed on
> > > >> each machine.
> > > >
> > > > I implemented a small program which shows these values, and they
> > > > are wrong for MPI, but I have no idea why. The two entries at the
> > > > beginning of PATH and LD_LIBRARY_PATH are not from our normal
> > > > environment, because I add these values at the end of the
> > > > environment variables PATH, LD_LIBRARY_PATH_32, and
> > > > LD_LIBRARY_PATH_64. Afterwards I set LD_LIBRARY_PATH to
> > > > LD_LIBRARY_PATH_64 on a 64-bit Solaris machine, to
> > > > LD_LIBRARY_PATH_32 followed by LD_LIBRARY_PATH_64 on a 64-bit
> > > > Linux machine, and to LD_LIBRARY_PATH_32 on every 32-bit machine.
> >
> > I see the problem - our heterogeneous support could use some
> > improvement, but it'll be a while before I can get to it.
> >
> > What's happening is that we are picking up and propagating the prefix
> > you specified, prepending it to your path and ld_library_path. Did you
> > by chance configure with --enable-orterun-prefix-by-default? Or specify
> > --prefix on your cmd line? Otherwise, it shouldn't be doing this. For
> > this purpose, you cannot use either of those options.
> >
> > Also, you'll need to add --enable-heterogeneous to your configure so
> > the MPI layer builds the right support, and add --hetero-nodes to your
> > cmd line.
> >
> > > >
> > > > Now 1 slave tasks are sending their environment.
> > > >
> > > > Environment from task 1:
> > > >   message type:    3
> > > >   msg length:      4622 characters
> > > >   message:
> > > >     hostname:          tyr.informatik.hs-fulda.de
> > > >     operating system:  SunOS
> > > >     release:           5.10
> > > >     processor:         sun4u
> > > >   PATH
> > > >     /usr/local/openmpi-1.9_64_cc/bin     (!!!)
> > > >     /usr/local/openmpi-1.9_64_cc/bin     (!!!)
> > > >     /usr/local/eclipse-3.6.1
> > > >     ...
> > > >     /usr/local/openmpi-1.9_64_cc/bin     (<- from our environment)
> > > >   LD_LIBRARY_PATH_32
> > > >     /usr/lib
> > > >     /usr/local/jdk1.7.0_07/jre/lib/sparc
> > > >     ...
> > > >     /usr/local/openmpi-1.9_64_cc/lib     (<- from our environment)
> > > >   LD_LIBRARY_PATH_64
> > > >     /usr/lib/sparcv9
> > > >     /usr/local/jdk1.7.0_07/jre/lib/sparcv9
> > > >     ...
> > > >     /usr/local/openmpi-1.9_64_cc/lib64   (<- from our environment)
> > > >   LD_LIBRARY_PATH
> > > >     /usr/local/openmpi-1.9_64_cc/lib     (!!!)
> > > >     /usr/local/openmpi-1.9_64_cc/lib64   (!!!)
> > > >     /usr/lib/sparcv9
> > > >     /usr/local/jdk1.7.0_07/jre/lib/sparcv9
> > > >     ...
> > > >     /usr/local/openmpi-1.9_64_cc/lib64   (<- from our environment)
> > > >   CLASSPATH
> > > >     /usr/local/junit4.10
> > > >     /usr/local/junit4.10/junit-4.10.jar
> > > >     //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
> > > >     //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
> > > >     //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
> > > >     /usr/local/javacc-5.0/javacc.jar
> > > >     .
> > > >
> > > > Without MPI the program uses our environment.
> > > >
> > > > tyr hello_1 147 diff env_with*
> > > > 1,7c1
> > > > <
> > > > <
> > > > < Now 1 slave tasks are sending their environment.
> > > > <
> > > > < Environment from task 1:
> > > > <   message type:    3
> > > > <   msg length:      4622 characters
> > > > ---
> > > > > Environment:
> > > > 14,15d7
> > > > <   /usr/local/openmpi-1.9_64_cc/bin
> > > > <   /usr/local/openmpi-1.9_64_cc/bin
> > > > 81,82d72
> > > > <   /usr/local/openmpi-1.9_64_cc/lib
> > > > <   /usr/local/openmpi-1.9_64_cc/lib64
> > > > tyr hello_1 148
> > > >
> > > > I have attached the programs so that you can check yourself and
> > > > hopefully get the same results. Do you modify PATH and
> > > > LD_LIBRARY_PATH?
> > > >
> > > > Kind regards
> > > >
> > > > Siegmar
> > > >
> > > >>> tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> > > >>>   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > >>> --------------------------------------------------------------------------
> > > >>> It looks like opal_init failed for some reason; your parallel
> > > >>> process is likely to abort. There are many reasons that a parallel
> > > >>> process can fail during opal_init; some of which are due to
> > > >>> configuration or environment problems. This failure appears to be
> > > >>> an internal failure; here's some additional information (which may
> > > >>> only be relevant to an Open MPI developer):
> > > >>>
> > > >>>   mca_base_open failed
> > > >>>   --> Returned value -2 instead of OPAL_SUCCESS
> > > >>> --------------------------------------------------------------------------
> > > >>> --------------------------------------------------------------------------
> > > >>> It looks like orte_init failed for some reason; your parallel
> > > >>> process is likely to abort. There are many reasons that a parallel
> > > >>> process can fail during orte_init; some of which are due to
> > > >>> configuration or environment problems. This failure appears to be
> > > >>> an internal failure; here's some additional information (which may
> > > >>> only be relevant to an Open MPI developer):
> > > >>>
> > > >>>   opal_init failed
> > > >>>   --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> > > >>> --------------------------------------------------------------------------
> > > >>> --------------------------------------------------------------------------
> > > >>> It looks like MPI_INIT failed for some reason; your parallel
> > > >>> process is likely to abort. There are many reasons that a parallel
> > > >>> process can fail during MPI_INIT; some of which are due to
> > > >>> configuration or environment problems.
> > > >>> This failure appears to be an internal failure; here's some
> > > >>> additional information (which may only be relevant to an Open MPI
> > > >>> developer):
> > > >>>
> > > >>>   ompi_mpi_init: orte_init failed
> > > >>>   --> Returned "Out of resource" (-2) instead of "Success" (0)
> > > >>> --------------------------------------------------------------------------
> > > >>> *** An error occurred in MPI_Init
> > > >>> *** on a NULL communicator
> > > >>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > > >>> ***   and potentially your MPI job)
> > > >>> [linpc4:27369] Local abort before MPI_INIT completed successfully;
> > > >>> not able to aggregate error messages, and not able to guarantee
> > > >>> that all other processes were killed!
> > > >>> -------------------------------------------------------
> > > >>> Primary job terminated normally, but 1 process returned
> > > >>> a non-zero exit code. Per user-direction, the job has been aborted.
> > > >>> -------------------------------------------------------
> > > >>> --------------------------------------------------------------------------
> > > >>> mpiexec detected that one or more processes exited with non-zero
> > > >>> status, thus causing the job to be terminated. The first process
> > > >>> to do so was:
> > > >>>
> > > >>>   Process name: [[21095,1],2]
> > > >>>   Exit code:    1
> > > >>> --------------------------------------------------------------------------
> > > >>>
> > > >>>
> > > >>> tyr java 111 which mpijavac
> > > >>> /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > > >>> tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > > >>> #!/usr/bin/env perl
> > > >>>
> > > >>> # WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
> > > >>> # MAKE ALL CHANGES IN mpijava.pl.in
> > > >>>
> > > >>> # Copyright (c) 2011 Cisco Systems, Inc.  All rights reserved.
> > > >>> # Copyright (c) 2012 Oracle and/or its affiliates.  All rights reserved.
> > > >>>
> > > >>> use strict;
> > > >>>
> > > >>> # The main purpose of this wrapper compiler is to check for
> > > >>> # and adjust the Java class path to include the OMPI classes
> > > >>> # in mpi.jar. The user may have specified a class path on
> > > >>> # our cmd line, or it may be in the environment, so we have
> > > >>> # to check for both. We also need to be careful not to
> > > >>> # just override the class path as it probably includes classes
> > > >>> # they need for their application! It also may already include
> > > >>> # the path to mpi.jar, and while it doesn't hurt anything, we
> > > >>> # don't want to include our class path more than once to avoid
> > > >>> # user astonishment.
> > > >>>
> > > >>> # Let the build system provide us with some critical values
> > > >>> my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
> > > >>> my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";
> > > >>>
> > > >>> # globals
> > > >>> my $showme_arg = 0;
> > > >>> my $verbose = 0;
> > > >>> my $my_arg;
> > > >>> ...
> > > >>>
> > > >>>
> > > >>> All libraries are available.
> > > >>>
> > > >>> tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
> > > >>>         libthread.so.1 => /usr/lib/libthread.so.1
> > > >>>         libjli.so =>
> > > >>>           /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
> > > >>>         libdl.so.1 => /usr/lib/libdl.so.1
> > > >>>         libc.so.1 => /usr/lib/libc.so.1
> > > >>>         libm.so.2 => /usr/lib/libm.so.2
> > > >>>         /platform/SUNW,A70/lib/libc_psr.so.1
> > > >>> tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> > > >>>         libthread.so.1 => /usr/lib/libthread.so.1
> > > >>>         libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
> > > >>>         libdl.so.1 => /usr/lib/libdl.so.1
> > > >>>         libc.so.1 => /usr/lib/libc.so.1
> > > >>>         libm.so.2 => /usr/lib/libm.so.2
> > > >>> tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> > > >>>         linux-gate.so.1 => (0xffffe000)
> > > >>>         libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
> > > >>>         libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
> > > >>>           (0xf779d000)
> > > >>>         libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
> > > >>>         libc.so.6 => /lib/libc.so.6 (0xf762b000)
> > > >>>         /lib/ld-linux.so.2 (0xf77ce000)
> > > >>>
> > > >>>
> > > >>> I don't have any errors in the log files except the error for NFS.
> > > >>>
> > > >>> tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
> > > >>> log.configure.Linux.x86_64.32_cc    log.make-install.Linux.x86_64.32_cc
> > > >>> log.make-check.Linux.x86_64.32_cc   log.make.Linux.x86_64.32_cc
> > > >>>
> > > >>> tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
> > > >>> log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
> > > >>> log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
> > > >>> log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1
> > > >>>
> > > >>> ...
> > > >>>   SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
> > > >>> FAIL: opal_path_nfs
> > > >>> ========================================================
> > > >>> 1 of 2 tests failed
> > > >>> Please report to http://www.open-mpi.org/community/help/
> > > >>> ========================================================
> > > >>> make[3]: *** [check-TESTS] Error 1
> > > >>> ...
> > > >>>
> > > >>>
> > > >>> It doesn't help to build the class files on Linux (they should be
> > > >>> independent of the architecture anyway).
> > > >>>
> > > >>> tyr java 131 ssh linpc4
> > > >>> linpc4 fd1026 98 cd .../prog/mpi/java
> > > >>> linpc4 java 99 make clean
> > > >>> rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
> > > >>>   /home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
> > > >>> linpc4 java 100 make
> > > >>> mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
> > > >>> mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java
> > > >>>
> > > >>> linpc4 java 101 mpiexec -np 3 -host linpc4 \
> > > >>>   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > >>> --------------------------------------------------------------------------
> > > >>> It looks like opal_init failed for some reason; your parallel
> > > >>> process is likely to abort. There are many reasons that a parallel
> > > >>> process can fail during opal_init; some of which are due to
> > > >>> configuration or environment problems.
> > > >>> This failure appears to be an internal failure; here's some
> > > >>> additional information (which may only be relevant to an Open MPI
> > > >>> developer):
> > > >>>
> > > >>>   mca_base_open failed
> > > >>>   --> Returned value -2 instead of OPAL_SUCCESS
> > > >>> ...
> > > >>>
> > > >>> Does anybody else have this problem as well? Do you know a
> > > >>> solution? Thank you very much for any help in advance.
> > > >>>
> > > >>>
> > > >>> Kind regards
> > > >>>
> > > >>> Siegmar
> > > >>>
> > > >>> _______________________________________________
> > > >>> users mailing list
> > > >>> us...@open-mpi.org
> > > >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > >>
> > > >>
> > > > /* A small MPI program, which delivers some information about its
> > > >  * machine, operating system, and some environment variables.
> > > >  *
> > > >  *
> > > >  * Compiling:
> > > >  *   Store executable(s) into local directory.
> > > >  *     mpicc -o <program name> <source code file name>
> > > >  *
> > > >  *   Store executable(s) into predefined directories.
> > > >  *     make
> > > >  *
> > > >  *   Make program(s) automatically on all specified hosts. You must
> > > >  *   edit the file "make_compile" and specify your host names before
> > > >  *   you execute it.
> > > >  *     make_compile
> > > >  *
> > > >  * Running:
> > > >  *   LAM-MPI:
> > > >  *     mpiexec -boot -np <number of processes> <program name>
> > > >  *     or
> > > >  *     mpiexec -boot \
> > > >  *       -host <hostname> -np <number of processes> <program name> : \
> > > >  *       -host <hostname> -np <number of processes> <program name>
> > > >  *     or
> > > >  *     mpiexec -boot [-v] -configfile <application file>
> > > >  *     or
> > > >  *     lamboot [-v] [<host file>]
> > > >  *     mpiexec -np <number of processes> <program name>
> > > >  *     or
> > > >  *     mpiexec [-v] -configfile <application file>
> > > >  *     lamhalt
> > > >  *
> > > >  *   OpenMPI:
> > > >  *     "host1", "host2", and so on can all have the same name,
> > > >  *     if you want to start a virtual computer with some virtual
> > > >  *     cpu's on the local host. The name "localhost" is allowed
> > > >  *     as well.
> > > >  *
> > > >  *     mpiexec -np <number of processes> <program name>
> > > >  *     or
> > > >  *     mpiexec --host <host1,host2,...> \
> > > >  *       -np <number of processes> <program name>
> > > >  *     or
> > > >  *     mpiexec -hostfile <hostfile name> \
> > > >  *       -np <number of processes> <program name>
> > > >  *     or
> > > >  *     mpiexec -app <application file>
> > > >  *
> > > >  * Cleaning:
> > > >  *   local computer:
> > > >  *     rm <program name>
> > > >  *     or
> > > >  *     make clean_all
> > > >  *   on all specified computers (you must edit the file
> > > >  *   "make_clean_all" and specify your host names before you
> > > >  *   execute it).
> > > >  *     make_clean_all
> > > >  *
> > > >  *
> > > >  * File: environ_mpi.c                    Author: S. Gross
> > > >  * Date: 25.09.2012
> > > >  *
> > > >  */
> > > >
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > > #include <string.h>
> > > > #include <unistd.h>
> > > > #include <sys/utsname.h>
> > > > #include "mpi.h"
> > > >
> > > > #define BUF_SIZE   8192         /* message buffer size           */
> > > > #define MAX_TASKS    12         /* max. number of tasks          */
> > > > #define SENDTAG       1         /* send message command          */
> > > > #define EXITTAG       2         /* termination command           */
> > > > #define MSGTAG        3         /* normal message token          */
> > > >
> > > > #define ENTASKS      -1         /* error: too many tasks         */
> > > >
> > > > static void master (void);
> > > > static void slave (void);
> > > >
> > > > int main (int argc, char *argv[])
> > > > {
> > > >   int mytid,                    /* my task id                    */
> > > >       ntasks;                   /* number of parallel tasks      */
> > > >
> > > >   MPI_Init (&argc, &argv);
> > > >   MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
> > > >   MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
> > > >   if (mytid == 0)
> > > >   {
> > > >     master ();
> > > >   }
> > > >   else
> > > >   {
> > > >     slave ();
> > > >   }
> > > >   MPI_Finalize ();
> > > >   return EXIT_SUCCESS;
> > > > }
> > > >
> > > >
> > > > /* Function for the "master task". The master sends a request to all
> > > >  * slaves asking for a message. After receiving and printing the
> > > >  * messages it sends all slaves a termination command.
> > > >  *
> > > >  * input parameters:  not necessary
> > > >  * output parameters: not available
> > > >  * return value:      nothing
> > > >  * side effects:      no side effects
> > > >  *
> > > >  */
> > > > void master (void)
> > > > {
> > > >   int        ntasks,            /* number of parallel tasks      */
> > > >              mytid,             /* my task id                    */
> > > >              num,               /* number of entries             */
> > > >              i;                 /* loop variable                 */
> > > >   char       buf[BUF_SIZE + 1]; /* message buffer (+1 for '\0')  */
> > > >   MPI_Status stat;              /* message details               */
> > > >
> > > >   MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
> > > >   MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
> > > >   if (ntasks > MAX_TASKS)
> > > >   {
> > > >     fprintf (stderr, "Error: Too many tasks. Try again with at most "
> > > >              "%d tasks.\n", MAX_TASKS);
> > > >     /* terminate all slave tasks */
> > > >     for (i = 1; i < ntasks; ++i)
> > > >     {
> > > >       MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
> > > >     }
> > > >     MPI_Finalize ();
> > > >     exit (ENTASKS);
> > > >   }
> > > >   printf ("\n\nNow %d slave tasks are sending their environment.\n\n",
> > > >           ntasks - 1);
> > > >   /* request messages from slave tasks */
> > > >   for (i = 1; i < ntasks; ++i)
> > > >   {
> > > >     MPI_Send ((char *) NULL, 0, MPI_CHAR, i, SENDTAG, MPI_COMM_WORLD);
> > > >   }
> > > >   /* wait for messages and print greetings */
> > > >   for (i = 1; i < ntasks; ++i)
> > > >   {
> > > >     MPI_Recv (buf, BUF_SIZE, MPI_CHAR, MPI_ANY_SOURCE,
> > > >               MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
> > > >     MPI_Get_count (&stat, MPI_CHAR, &num);
> > > >     buf[num] = '\0';            /* add missing end-of-string     */
> > > >     printf ("Environment from task %d:\n"
> > > >             "  message type:    %d\n"
> > > >             "  msg length:      %d characters\n"
> > > >             "  message:         %s\n\n",
> > > >             stat.MPI_SOURCE, stat.MPI_TAG, num, buf);
> > > >   }
> > > >   /* terminate all slave tasks */
> > > >   for (i = 1; i < ntasks; ++i)
> > > >   {
> > > >     MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
> > > >   }
> > > > }
> > > >
> > > >
> > > > /* Function for "slave tasks". The slave task sends its hostname,
> > > >  * operating system name and release, and processor architecture
> > > >  * as a message to the master.
> > > >  *
> > > >  * input parameters:  not necessary
> > > >  * output parameters: not available
> > > >  * return value:      nothing
> > > >  * side effects:      no side effects
> > > >  *
> > > >  */
> > > > void slave (void)
> > > > {
> > > >   struct utsname sys_info;      /* system information            */
> > > >   int        mytid,             /* my task id                    */
> > > >              num_env_vars,      /* # of environment variables    */
> > > >              i,                 /* loop variable                 */
> > > >              more_to_do;
> > > >   char       buf[BUF_SIZE],     /* message buffer                */
> > > >              *env_vars[] = {"PATH",
> > > >                             "LD_LIBRARY_PATH_32",
> > > >                             "LD_LIBRARY_PATH_64",
> > > >                             "LD_LIBRARY_PATH",
> > > >                             "CLASSPATH"};
> > > >   MPI_Status stat;              /* message details               */
> > > >
> > > >   MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
> > > >   num_env_vars = sizeof (env_vars) / sizeof (env_vars[0]);
> > > >   more_to_do = 1;
> > > >   while (more_to_do == 1)
> > > >   {
> > > >     /* wait for a message from the master task */
> > > >     MPI_Recv (buf, BUF_SIZE, MPI_CHAR, 0, MPI_ANY_TAG,
> > > >               MPI_COMM_WORLD, &stat);
> > > >     if (stat.MPI_TAG != EXITTAG)
> > > >     {
> > > >       uname (&sys_info);
> > > >       strcpy (buf, "\n    hostname:          ");
> > > >       strncpy (buf + strlen (buf), sys_info.nodename,
> > > >                BUF_SIZE - strlen (buf));
> > > >       strncpy (buf + strlen (buf), "\n    operating system:  ",
> > > >                BUF_SIZE - strlen (buf));
> > > >       strncpy (buf + strlen (buf), sys_info.sysname,
> > > >                BUF_SIZE - strlen (buf));
> > > >       strncpy (buf + strlen (buf), "\n    release:           ",
> > > >                BUF_SIZE - strlen (buf));
> > > >       strncpy (buf + strlen (buf), sys_info.release,
> > > >                BUF_SIZE - strlen (buf));
> > > >       strncpy (buf + strlen (buf), "\n    processor:         ",
> > > >                BUF_SIZE - strlen (buf));
> > > >       strncpy (buf + strlen (buf), sys_info.machine,
> > > >                BUF_SIZE - strlen (buf));
> > > >       for (i = 0; i < num_env_vars; ++i)
> > > >       {
> > > >         char *env_val,          /* pointer to environment value  */
> > > >              *delimiter = ":",  /* field delimiter for "strtok"  */
> > > >              *next_tok;         /* next token                    */
> > > >
> > > >         env_val = getenv (env_vars[i]);
> > > >         if (env_val != NULL)
> > > >         {
> > > >           if ((strlen (buf) + strlen (env_vars[i]) + 6) < BUF_SIZE)
> > > >           {
> > > >             strncpy (buf + strlen (buf), "\n  ",
> > > >                      BUF_SIZE - strlen (buf));
> > > >             strncpy (buf + strlen (buf), env_vars[i],
> > > >                      BUF_SIZE - strlen (buf));
> > > >           }
> > > >           else
> > > >           {
> > > >             fprintf (stderr, "Buffer too small. Couldn't add \"%s\"."
> > > >                      "\n\n", env_vars[i]);
> > > >           }
> > > >           /* Get first token in "env_val". "strtok" skips all
> > > >            * characters that are contained in the current delimiter
> > > >            * string. If it finds a character which is not contained
> > > >            * in the delimiter string, it is the start of the first
> > > >            * token. Now it searches for the next character which is
> > > >            * part of the delimiter string. If it finds one, it will
> > > >            * overwrite it with a '\0' to terminate the first token.
> > > >            * Otherwise the token extends to the end of the string.
> > > >            * Subsequent calls of "strtok" use a NULL pointer as first
> > > >            * argument and start searching from the saved position
> > > >            * after the last token. "strtok" returns NULL if it
> > > >            * couldn't find a token.
> > > >            */
> > > >           next_tok = strtok (env_val, delimiter);
> > > >           while (next_tok != NULL)
> > > >           {
> > > >             if ((strlen (buf) + strlen (next_tok) + 25) < BUF_SIZE)
> > > >             {
> > > >               strncpy (buf + strlen (buf), "\n    ",
> > > >                        BUF_SIZE - strlen (buf));
> > > >               strncpy (buf + strlen (buf), next_tok,
> > > >                        BUF_SIZE - strlen (buf));
> > > >             }
> > > >             else
> > > >             {
> > > >               fprintf (stderr, "Buffer too small. Couldn't add \"%s\" "
> > > >                        "to %s.\n\n", next_tok, env_vars[i]);
> > > >             }
> > > >             /* get next token */
> > > >             next_tok = strtok (NULL, delimiter);
> > > >           }
> > > >         }
> > > >       }
> > > >       MPI_Send (buf, strlen (buf), MPI_CHAR, stat.MPI_SOURCE,
> > > >                 MSGTAG, MPI_COMM_WORLD);
> > > >     }
> > > >     else
> > > >     {
> > > >       more_to_do = 0;           /* terminate                     */
> > > >     }
> > > >   }
> > > > }
> > > > /* A small program, which delivers some information about its
> > > >  * machine, operating system, and some environment variables.
> > > >  *
> > > >  *
> > > >  * Compiling:
> > > >  *   Store executable(s) into local directory.
> > > >  *     (g)cc -o environ_without_mpi environ_without_mpi.c
> > > >  *
> > > >  * Running:
> > > >  *   environ_without_mpi
> > > >  *
> > > >  *
> > > >  * File: environ_without_mpi.c            Author: S. Gross
> > > >  * Date: 25.09.2012
> > > >  *
> > > >  */
> > > >
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > > #include <string.h>
> > > > #include <unistd.h>
> > > > #include <sys/utsname.h>
> > > >
> > > > #define BUF_SIZE   8192         /* message buffer size           */
> > > >
> > > > int main (int argc, char *argv[])
> > > > {
> > > >   struct utsname sys_info;      /* system information            */
> > > >   int  num_env_vars,            /* # of environment variables    */
> > > >        i;                       /* loop variable                 */
> > > >   char buf[BUF_SIZE],           /* message buffer                */
> > > >        *env_vars[] = {"PATH",
> > > >                       "LD_LIBRARY_PATH_32",
> > > >                       "LD_LIBRARY_PATH_64",
> > > >                       "LD_LIBRARY_PATH",
> > > >                       "CLASSPATH"};
> > > >
> > > >   num_env_vars = sizeof (env_vars) / sizeof (env_vars[0]);
> > > >   uname (&sys_info);
> > > >   strcpy (buf, "\n    hostname:          ");
> > > >   strncpy (buf + strlen (buf), sys_info.nodename,
> > > >            BUF_SIZE - strlen (buf));
> > > >   strncpy (buf + strlen (buf), "\n    operating system:  ",
> > > >            BUF_SIZE - strlen (buf));
> > > >   strncpy (buf + strlen (buf), sys_info.sysname,
> > > >            BUF_SIZE - strlen (buf));
> > > >   strncpy (buf + strlen (buf), "\n    release:           ",
> > > >            BUF_SIZE - strlen (buf));
> > > >   strncpy (buf + strlen (buf), sys_info.release,
> > > >            BUF_SIZE - strlen (buf));
> > > >   strncpy (buf + strlen (buf), "\n    processor:         ",
> > > >            BUF_SIZE - strlen (buf));
> > > >   strncpy (buf + strlen (buf), sys_info.machine,
> > > >            BUF_SIZE - strlen (buf));
> > > >   for (i = 0; i < num_env_vars; ++i)
> > > >   {
> > > >     char *env_val,              /* pointer to environment value  */
> > > >          *delimiter = ":",      /* field delimiter for "strtok"  */
> > > >          *next_tok;             /* next token                    */
> > > >
> > > >     env_val = getenv (env_vars[i]);
> > > >     if (env_val != NULL)
> > > >     {
> > > >       if ((strlen (buf) + strlen (env_vars[i]) + 6) < BUF_SIZE)
> > > >       {
> > > >         strncpy (buf + strlen (buf), "\n  ",
> > > >                  BUF_SIZE - strlen (buf));
> > > >         strncpy (buf + strlen (buf), env_vars[i],
> > > >                  BUF_SIZE - strlen (buf));
> > > >       }
> > > >       else
> > > >       {
> > > >         fprintf (stderr, "Buffer too small. Couldn't add \"%s\"."
> > > >                  "\n\n", env_vars[i]);
> > > >       }
> > > >       /* Get first token in "env_val". "strtok" skips all
> > > >        * characters that are contained in the current delimiter
> > > >        * string.
> > > >        * If it finds a character which is not contained
> > > >        * in the delimiter string, it is the start of the first
> > > >        * token. Now it searches for the next character which is
> > > >        * part of the delimiter string. If it finds one, it will
> > > >        * overwrite it with a '\0' to terminate the first token.
> > > >        * Otherwise the token extends to the end of the string.
> > > >        * Subsequent calls of "strtok" use a NULL pointer as first
> > > >        * argument and start searching from the saved position
> > > >        * after the last token. "strtok" returns NULL if it
> > > >        * couldn't find a token.
> > > >        */
> > > >       next_tok = strtok (env_val, delimiter);
> > > >       while (next_tok != NULL)
> > > >       {
> > > >         if ((strlen (buf) + strlen (next_tok) + 25) < BUF_SIZE)
> > > >         {
> > > >           strncpy (buf + strlen (buf), "\n    ",
> > > >                    BUF_SIZE - strlen (buf));
> > > >           strncpy (buf + strlen (buf), next_tok,
> > > >                    BUF_SIZE - strlen (buf));
> > > >         }
> > > >         else
> > > >         {
> > > >           fprintf (stderr, "Buffer too small. Couldn't add \"%s\" "
> > > >                    "to %s.\n\n", next_tok, env_vars[i]);
> > > >         }
> > > >         /* get next token */
> > > >         next_tok = strtok (NULL, delimiter);
> > > >       }
> > > >     }
> > > >   }
> > > >   printf ("Environment:\n"
> > > >           "  message:  %s\n\n", buf);
> > > >   return EXIT_SUCCESS;
> > > > }
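
For reference, the two Java test programs used in the runs above are not
attached to this thread. The following minimal sketches show what they
could look like; the class names and the "hostname/IP" output format
match the runs above, but the actual files may differ.

HelloMainWithoutMPI needs no MPI calls at all, which is why the mapping
bug also shows up with a plain "hostname":

    import java.net.InetAddress;

    public class HelloMainWithoutMPI {
      public static void main (String args[]) throws Exception {
        /* InetAddress.toString() returns "hostname/IP address", which
         * matches the "Hello from ..." lines in the runs above.
         */
        System.out.println ("Hello from " + InetAddress.getLocalHost ());
      }
    }

HelloMainWithBarrier is assumed to use the mpiJava-style API (mpi.jar)
that the openmpi-1.9 Java bindings were derived from; compile it with
mpijavac and run it with mpiexec as shown above:

    import mpi.MPI;

    public class HelloMainWithBarrier {
      public static void main (String args[]) throws Exception {
        /* Assumption: mpiJava-style bindings, i.e. MPI.Init/Rank/Size,
         * not a newer getRank()/getSize() interface.
         */
        MPI.Init (args);
        int mytid  = MPI.COMM_WORLD.Rank ();  /* my task id               */
        int ntasks = MPI.COMM_WORLD.Size ();  /* number of parallel tasks */
        System.out.println ("Process " + mytid + " of " + ntasks +
                            " running on " + MPI.Get_processor_name ());
        MPI.COMM_WORLD.Barrier ();
        MPI.Finalize ();
      }
    }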