Hi,

> Does the behavior only occur with Java applications, as your subject
> implies? I thought this was a more general behavior based on prior notes?

It is a general problem, as you can see in the older email below. I
didn't change the subject line because I first noticed this behaviour
when I tried out mpiJava.


> As I said back then, I have no earthly idea why your local machine is being
> ignored, and I cannot replicate that behavior on any system available to me.
> 
> What you might try is adding --display-allocation --display-devel-map to
> your cmd line and see what the system thinks it is doing. The first option
> will display what nodes and slots it thinks are available to it, and the
> second will report where it thinks it placed everything.

tyr topo 244 mpiexec -np 3 -host tyr,sunpc4,linpc4 --display-allocation \
  --display-devel-map hostname

======================   ALLOCATED NODES   ======================

 Data for node: tyr             Launch id: -1   State: 2
        Daemon: [[3909,0],0]    Daemon launched: True
        Num slots: 1    Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0
 Data for node: sunpc4          Launch id: -1   State: 2
        Daemon: [[3909,0],1]    Daemon launched: False
        Num slots: 1    Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0
 Data for node: linpc4          Launch id: -1   State: 2
        Daemon: [[3909,0],2]    Daemon launched: False
        Num slots: 1    Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0

=================================================================

 Mapper requested: NULL  Last mapper: round_robin  Mapping policy: BYSLOT
   Ranking policy: SLOT  Binding policy: NONE[NODE]  Cpu set: NULL  PPR: NULL
        Num new daemons: 0      New daemon starting vpid INVALID
        Num nodes: 2

 Data for node: sunpc4          Launch id: -1   State: 2
        Daemon: [[3909,0],1]    Daemon launched: False
        Num slots: 1    Slots in use: 1 Oversubscribed: TRUE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 2    Next node_rank: 2
        Data for proc: [[3909,1],0]
                Pid: 0  Local rank: 0   Node rank: 0    App rank: 0
                State: INITIALIZED      Restarts: 0     App_context: 0
                  Locale: 0-1     Binding: NULL[0]
        Data for proc: [[3909,1],1]
                Pid: 0  Local rank: 1   Node rank: 1    App rank: 1
                State: INITIALIZED      Restarts: 0     App_context: 0
                  Locale: 0-1     Binding: NULL[0]

 Data for node: linpc4          Launch id: -1   State: 2
        Daemon: [[3909,0],2]    Daemon launched: False
        Num slots: 1    Slots in use: 1 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 1    Next node_rank: 1
        Data for proc: [[3909,1],2]
                Pid: 0  Local rank: 0   Node rank: 0    App rank: 2
                State: INITIALIZED      Restarts: 0     App_context: 0
                  Locale: 0-1     Binding: NULL[0]
linpc4
sunpc4.informatik.hs-fulda.de
sunpc4.informatik.hs-fulda.de


As you can see above, the openmpi-1.9 map contains only two nodes (sunpc4
and linpc4); my local machine tyr shows up in the allocation but not in
the map. With openmpi-1.6.2 I get the following output for the same command.

tyr topo 109 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
  --display-allocation --display-devel-map hostname

======================   ALLOCATED NODES   ======================

 Data for node: tyr.informatik.hs-fulda.de              Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0    Num cores/socket: 0
        Daemon: [[4018,0],0]    Daemon launched: True
        Num slots: 1    Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 0    Next node_rank: 0
 Data for node: sunpc4          Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0    Num cores/socket: 0
        Daemon: Not defined     Daemon launched: False
        Num slots: 1    Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 0    Next node_rank: 0
 Data for node: linpc4          Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0    Num cores/socket: 0
        Daemon: Not defined     Daemon launched: False
        Num slots: 1    Slots in use: 0 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 0    Next node_rank: 0

=================================================================

 Map generated by mapping policy: 0400
        Npernode: 0     Oversubscribe allowed: TRUE     CPU Lists: FALSE
        Num new daemons: 2      New daemon starting vpid 1
        Num nodes: 3

 Data for node: tyr.informatik.hs-fulda.de              Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0    Num cores/socket: 0
        Daemon: [[4018,0],0]    Daemon launched: True
        Num slots: 1    Slots in use: 1 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 1    Next node_rank: 1
        Data for proc: [[4018,1],0]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        Restarts: 0     App_context: 0  Slot list: NULL

 Data for node: sunpc4          Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0    Num cores/socket: 0
        Daemon: [[4018,0],1]    Daemon launched: False
        Num slots: 1    Slots in use: 1 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 1    Next node_rank: 1
        Data for proc: [[4018,1],1]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        Restarts: 0     App_context: 0  Slot list: NULL

 Data for node: linpc4          Launch id: -1   State: 2
        Num boards: 1   Num sockets/board: 0    Num cores/socket: 0
        Daemon: [[4018,0],2]    Daemon launched: False
        Num slots: 1    Slots in use: 1 Oversubscribed: FALSE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Detected Resources:
        Num procs: 1    Next node_rank: 1
        Data for proc: [[4018,1],2]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        Restarts: 0     App_context: 0  Slot list: NULL
linpc4
sunpc4.informatik.hs-fulda.de
tyr.informatik.hs-fulda.de


Is the above output helpful? Thank you very much for any help in advance.


Kind regards

Siegmar



> On Wed, Sep 26, 2012 at 4:53 AM, Siegmar Gross <
> siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> > Hi,
> >
> > Yesterday I installed openmpi-1.9a1r27362, and I still have a
> > problem with "-host": my local machine is not used when I try
> > to start processes on three hosts.
> >
> > tyr:    Solaris 10, Sparc
> > sunpc4: Solaris 10, x86_64
> > linpc4: openSUSE-Linux 12.1, x86_64
> >
> >
> > tyr mpi_classfiles 175 javac HelloMainWithoutMPI.java
> > tyr mpi_classfiles 176 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithoutMPI
> > Hello from linpc4.informatik.hs-fulda.de/193.174.26.225
> > Hello from sunpc4.informatik.hs-fulda.de/193.174.26.224
> > Hello from sunpc4.informatik.hs-fulda.de/193.174.26.224
> > tyr mpi_classfiles 177 which mpiexec
> > /usr/local/openmpi-1.9_64_cc/bin/mpiexec
> >
> >
> > Everything works fine with openmpi-1.6.2rc5r27346.
> >
> > tyr mpi_classfiles 108 javac HelloMainWithoutMPI.java
> > tyr mpi_classfiles 109 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithoutMPI
> > Hello from linpc4.informatik.hs-fulda.de/193.174.26.225
> > Hello from sunpc4.informatik.hs-fulda.de/193.174.26.224
> > Hello from tyr.informatik.hs-fulda.de/193.174.24.39
> > tyr mpi_classfiles 110 which mpiexec
> > /usr/local/openmpi-1.6.2_64_cc/bin/mpiexec
> >
> >
> > In my opinion it is a problem with openmpi-1.9. I used the following
> > configure command for Sparc. The commands for the other platforms are
> > similar.
> >
> > ../openmpi-1.9a1r27362/configure --prefix=/usr/local/openmpi-1.9_64_cc \
> >   --libdir=/usr/local/openmpi-1.9_64_cc/lib64 \
> >   --with-jdk-bindir=/usr/local/jdk1.7.0_07/bin/sparcv9 \
> >   --with-jdk-headers=/usr/local/jdk1.7.0_07/include \
> >   JAVA_HOME=/usr/local/jdk1.7.0_07 \
> >   LDFLAGS="-m64" \
> >   CC="cc" CXX="CC" FC="f95" \
> >   CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
> >   CPP="cpp" CXXCPP="cpp" \
> >   CPPFLAGS="" CXXCPPFLAGS="" \
> >   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
> >   OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
> >   --enable-cxx-exceptions \
> >   --enable-mpi-java \
> >   --enable-heterogeneous \
> >   --enable-opal-multi-threads \
> >   --enable-mpi-thread-multiple \
> >   --with-threads=posix \
> >   --with-hwloc=internal \
> >   --without-verbs \
> >   --without-udapl \
> >   --with-wrapper-cflags=-m64 \
> >   --enable-debug \
> >   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >
> > Can I provide anything else to help track down the problem? Thank you
> > very much for any help in advance.
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >
> > > >>> I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
> > > >>> Why doesn't "mpiexec" start a process on my local machine (it
> > > >>> is not a matter of Java, because I have the same behaviour when
> > > >>> I use "hostname")?
> > > >>>
> > > >>> tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
> > > >>> java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > >>> Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
> > > >>> Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
> > > >>> Process 2 of 3 running on sunpc1
> > > >>> ...
> > > >>>
> > > >>> tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
> > > >>> sunpc1
> > > >>> sunpc4.informatik.hs-fulda.de
> > > >>> sunpc4.informatik.hs-fulda.de
> > > >>>
> > > >>
> > > >> No idea - it works fine for me. Do you have an environmental
> > > >> variable, or something in your default MCA param file, that
> > > >> indicates "no_use_local"?
> > > >
> > > > I have only built and installed Open MPI and I have no param file.
> > > > I don't have a mca environment variable.
> > > >
> > > > tyr hello_1 136 grep local \
> > > >  /usr/local/openmpi-1.9_64_cc/etc/openmpi-mca-params.conf
> > > > # $sysconf is a directory on a local disk, it is likely that changes
> > > > #   component_path = /usr/local/lib/openmpi:~/my_openmpi_components
> > > >
> > > > tyr hello_1 143 env | grep -i mca
> > > > tyr hello_1 144
> > >
> > > No ideas - I can't make it behave that way  :-(
> > >
> > > >
> > > >
> > > >>> The command breaks if I add a Linux machine.
> > > >>
> > > >> Check to ensure that the path and ld_library_path on your linux box
> > > >> are being correctly set to point to the corresponding Linux OMPI libs.
> > > >> It looks like that isn't the case. Remember, the Java bindings are
> > > >> just that - they are bindings that wrap on top of the regular C
> > > >> code. Thus, the underlying OMPI system remains system-dependent,
> > > >> and you must have the appropriate native libraries installed on
> > > >> each machine.
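> > > >>
> > > >> For example, a quick check along these lines (an untested sketch;
> > > >> the hostnames are the ones from this thread, and note that a
> > > >> non-interactive ssh shell may not source the same startup files as
> > > >> a login shell) should show what the remote side actually sees:
> > > >>
> > > >>   ssh linpc4 'which mpiexec; echo $PATH; echo $LD_LIBRARY_PATH'
> > > >>   ssh sunpc4 'which mpiexec; echo $PATH; echo $LD_LIBRARY_PATH'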
> > > >
> > > > I implemented a small program that prints these values; under MPI
> > > > they are wrong, but I have no idea why. The two entries at the
> > > > beginning of PATH and LD_LIBRARY_PATH are not from our normal
> > > > environment, because I add these values only at the end of PATH,
> > > > LD_LIBRARY_PATH_32, and LD_LIBRARY_PATH_64. Afterwards I set
> > > > LD_LIBRARY_PATH to LD_LIBRARY_PATH_64 on a 64-bit Solaris machine,
> > > > to LD_LIBRARY_PATH_32 followed by LD_LIBRARY_PATH_64 on a 64-bit
> > > > Linux machine, and to LD_LIBRARY_PATH_32 on every 32-bit machine.
> > > >
> > >
> > > I see the problem - our heterogeneous support could use some
> > > improvement, but it'll be awhile before I can get to it.
> > >
> > > What's happening is that we are picking up and propagating the prefix
> > > you specified, prepending it to your path and ld_library_path. Did you
> > > by chance configure with --enable-orterun-prefix-by-default? Or specify
> > > --prefix on your cmd line? Otherwise, it shouldn't be doing this. For
> > > this purpose, you cannot use either of those options.
> > >
> > > Also, you'll need to add --enable-heterogeneous to your configure so the
> > > MPI layer builds the right support, and add --hetero-nodes to your cmd
> > > line.
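> > >
> > > For illustration only, the resulting command line might then look
> > > something like this (a sketch, assuming the --hetero-nodes option is
> > > available in your 1.9 build):
> > >
> > >   mpiexec --hetero-nodes -np 3 -host tyr,sunpc4,linpc4 \
> > >     java -cp $HOME/mpi_classfiles HelloMainWithoutMPI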
> > >
> > >
> > > >
> > > > Now 1 slave tasks are sending their environment.
> > > >
> > > > Environment from task 1:
> > > >  message type:        3
> > > >  msg length:          4622 characters
> > > >  message:
> > > >    hostname:          tyr.informatik.hs-fulda.de
> > > >    operating system:  SunOS
> > > >    release:           5.10
> > > >    processor:         sun4u
> > > >    PATH
> > > >                       /usr/local/openmpi-1.9_64_cc/bin  (!!!)
> > > >                       /usr/local/openmpi-1.9_64_cc/bin  (!!!)
> > > >                       /usr/local/eclipse-3.6.1
> > > >                       ...
> > > >                       /usr/local/openmpi-1.9_64_cc/bin  (<- from our environment)
> > > >    LD_LIBRARY_PATH_32
> > > >                       /usr/lib
> > > >                       /usr/local/jdk1.7.0_07/jre/lib/sparc
> > > >                       ...
> > > >                       /usr/local/openmpi-1.9_64_cc/lib  (<- from our environment)
> > > >    LD_LIBRARY_PATH_64
> > > >                       /usr/lib/sparcv9
> > > >                       /usr/local/jdk1.7.0_07/jre/lib/sparcv9
> > > >                       ...
> > > >                       /usr/local/openmpi-1.9_64_cc/lib64  (<- from our environment)
> > > >    LD_LIBRARY_PATH
> > > >                       /usr/local/openmpi-1.9_64_cc/lib     (!!!)
> > > >                       /usr/local/openmpi-1.9_64_cc/lib64   (!!!)
> > > >                       /usr/lib/sparcv9
> > > >                       /usr/local/jdk1.7.0_07/jre/lib/sparcv9
> > > >                       ...
> > > >                       /usr/local/openmpi-1.9_64_cc/lib64  (<- from our environment)
> > > >    CLASSPATH
> > > >                       /usr/local/junit4.10
> > > >                       /usr/local/junit4.10/junit-4.10.jar
> > > >                       //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
> > > >                       //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
> > > >                       //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
> > > >                       /usr/local/javacc-5.0/javacc.jar
> > > >                       .
> > > >
> > > >
> > > > Without MPI the program uses our environment.
> > > >
> > > > tyr hello_1 147 diff env_with*
> > > > 1,7c1
> > > > <
> > > > <
> > > > < Now 1 slave tasks are sending their environment.
> > > > <
> > > > < Environment from task 1:
> > > > <   message type:        3
> > > > <   msg length:          4622 characters
> > > > ---
> > > >> Environment:
> > > > 14,15d7
> > > > <                        /usr/local/openmpi-1.9_64_cc/bin
> > > > <                        /usr/local/openmpi-1.9_64_cc/bin
> > > > 81,82d72
> > > > <                        /usr/local/openmpi-1.9_64_cc/lib
> > > > <                        /usr/local/openmpi-1.9_64_cc/lib64
> > > > tyr hello_1 148
> > > >
> > > >
> > > > I have attached the programs so that you can check yourself and
> > > > hopefully get the same results. Do you modify PATH and LD_LIBRARY_PATH?
> > > >
> > > >
> > > > Kind regards
> > > >
> > > > Siegmar
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >>> tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> > > >>> java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > >>> --------------------------------------------------------------------------
> > > >>> It looks like opal_init failed for some reason; your parallel process is
> > > >>> likely to abort.  There are many reasons that a parallel process can
> > > >>> fail during opal_init; some of which are due to configuration or
> > > >>> environment problems.  This failure appears to be an internal failure;
> > > >>> here's some additional information (which may only be relevant to an
> > > >>> Open MPI developer):
> > > >>>
> > > >>> mca_base_open failed
> > > >>> --> Returned value -2 instead of OPAL_SUCCESS
> > > >>> --------------------------------------------------------------------------
> > > >>> --------------------------------------------------------------------------
> > > >>> It looks like orte_init failed for some reason; your parallel process is
> > > >>> likely to abort.  There are many reasons that a parallel process can
> > > >>> fail during orte_init; some of which are due to configuration or
> > > >>> environment problems.  This failure appears to be an internal failure;
> > > >>> here's some additional information (which may only be relevant to an
> > > >>> Open MPI developer):
> > > >>>
> > > >>> opal_init failed
> > > >>> --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> > > >>> --------------------------------------------------------------------------
> > > >>> --------------------------------------------------------------------------
> > > >>> It looks like MPI_INIT failed for some reason; your parallel process is
> > > >>> likely to abort.  There are many reasons that a parallel process can
> > > >>> fail during MPI_INIT; some of which are due to configuration or environment
> > > >>> problems.  This failure appears to be an internal failure; here's some
> > > >>> additional information (which may only be relevant to an Open MPI
> > > >>> developer):
> > > >>>
> > > >>> ompi_mpi_init: orte_init failed
> > > >>> --> Returned "Out of resource" (-2) instead of "Success" (0)
> > > >>> --------------------------------------------------------------------------
> > > >>> *** An error occurred in MPI_Init
> > > >>> *** on a NULL communicator
> > > >>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > > >>> ***    and potentially your MPI job)
> > > >>> [linpc4:27369] Local abort before MPI_INIT completed successfully;
> > > >>> not able to aggregate error messages, and not able to guarantee
> > > >>> that all other processes were killed!
> > > >>> -------------------------------------------------------
> > > >>> Primary job  terminated normally, but 1 process returned
> > > >>> a non-zero exit code.. Per user-direction, the job has been aborted.
> > > >>> -------------------------------------------------------
> > > >>> --------------------------------------------------------------------------
> > > >>> mpiexec detected that one or more processes exited with non-zero status,
> > > >>> thus causing the job to be terminated. The first process to do so was:
> > > >>>
> > > >>> Process name: [[21095,1],2]
> > > >>> Exit code:    1
> > > >>> --------------------------------------------------------------------------
> > > >>>
> > > >>>
> > > >>> tyr java 111 which mpijavac
> > > >>> /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > > >>> tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > > >>> #!/usr/bin/env perl
> > > >>>
> > > >>> # WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
> > > >>> #          MAKE ALL CHANGES IN mpijava.pl.in
> > > >>>
> > > >>> # Copyright (c) 2011      Cisco Systems, Inc.  All rights reserved.
> > > >>> # Copyright (c) 2012      Oracle and/or its affiliates.  All rights reserved.
> > > >>>
> > > >>> use strict;
> > > >>>
> > > >>> # The main purpose of this wrapper compiler is to check for
> > > >>> # and adjust the Java class path to include the OMPI classes
> > > >>> # in mpi.jar. The user may have specified a class path on
> > > >>> # our cmd line, or it may be in the environment, so we have
> > > >>> # to check for both. We also need to be careful not to
> > > >>> # just override the class path as it probably includes classes
> > > >>> # they need for their application! It also may already include
> > > >>> # the path to mpi.jar, and while it doesn't hurt anything, we
> > > >>> # don't want to include our class path more than once to avoid
> > > >>> # user astonishment
> > > >>>
> > > >>> # Let the build system provide us with some critical values
> > > >>> my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
> > > >>> my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";
> > > >>>
> > > >>> # globals
> > > >>> my $showme_arg = 0;
> > > >>> my $verbose = 0;
> > > >>> my $my_arg;
> > > >>> ...
> > > >>>
> > > >>>
> > > >>> All libraries are available.
> > > >>>
> > > >>> tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
> > > >>>       libthread.so.1 =>        /usr/lib/libthread.so.1
> > > >>>       libjli.so =>     /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
> > > >>>       libdl.so.1 =>    /usr/lib/libdl.so.1
> > > >>>       libc.so.1 =>     /usr/lib/libc.so.1
> > > >>>       libm.so.2 =>     /usr/lib/libm.so.2
> > > >>>       /platform/SUNW,A70/lib/libc_psr.so.1
> > > >>> tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> > > >>>       libthread.so.1 =>        /usr/lib/libthread.so.1
> > > >>>       libjli.so =>
> > > >>> /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
> > > >>>       libdl.so.1 =>    /usr/lib/libdl.so.1
> > > >>>       libc.so.1 =>     /usr/lib/libc.so.1
> > > >>>       libm.so.2 =>     /usr/lib/libm.so.2
> > > >>> tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> > > >>>       linux-gate.so.1 =>  (0xffffe000)
> > > >>>       libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
> > > >>>       libjli.so => /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so (0xf779d000)
> > > >>>       libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
> > > >>>       libc.so.6 => /lib/libc.so.6 (0xf762b000)
> > > >>>       /lib/ld-linux.so.2 (0xf77ce000)
> > > >>>
> > > >>>
> > > >>> I don't have any errors in the log files except the error for nfs.
> > > >>>
> > > >>> tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
> > > >>> log.configure.Linux.x86_64.32_cc   log.make-install.Linux.x86_64.32_cc
> > > >>> log.make-check.Linux.x86_64.32_cc  log.make.Linux.x86_64.32_cc
> > > >>>
> > > >>> tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
> > > >>> log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
> > > >>> log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
> > > >>> log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1
> > > >>>
> > > >>> ...
> > > >>> SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
> > > >>> FAIL: opal_path_nfs
> > > >>> ========================================================
> > > >>> 1 of 2 tests failed
> > > >>> Please report to http://www.open-mpi.org/community/help/
> > > >>> ========================================================
> > > >>> make[3]: *** [check-TESTS] Error 1
> > > >>> ...
> > > >>>
> > > >>>
> > > >>> It doesn't help to build the class files on Linux (which should be
> > > >>> independent of the architecture anyway).
> > > >>>
> > > >>> tyr java 131 ssh linpc4
> > > >>> linpc4 fd1026 98 cd .../prog/mpi/java
> > > >>> linpc4 java 99 make clean
> > > >>> rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
> > > >>> /home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
> > > >>> linpc4 java 100 make
> > > >>> mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
> > > >>> mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java
> > > >>>
> > > >>> linpc4 java 101  mpiexec -np 3 -host linpc4 \
> > > >>> java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > >>> --------------------------------------------------------------------------
> > > >>> It looks like opal_init failed for some reason; your parallel process is
> > > >>> likely to abort.  There are many reasons that a parallel process can
> > > >>> fail during opal_init; some of which are due to configuration or
> > > >>> environment problems.  This failure appears to be an internal failure;
> > > >>> here's some additional information (which may only be relevant to an
> > > >>> Open MPI developer):
> > > >>>
> > > >>> mca_base_open failed
> > > >>> --> Returned value -2 instead of OPAL_SUCCESS
> > > >>> ...
> > > >>>
> > > >>> Does anybody else have this problem as well? Do you know of a solution?
> > > >>> Thank you very much for any help in advance.
> > > >>>
> > > >>>
> > > >>> Kind regards
> > > >>>
> > > >>> Siegmar
> > > >>>
> > > >>> _______________________________________________
> > > >>> users mailing list
> > > >>> us...@open-mpi.org
> > > >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > >>
> > > >>
> > > > /* A small MPI program, which delivers some information about its
> > > > * machine, operating system, and some environment variables.
> > > > *
> > > > *
> > > > * Compiling:
> > > > *   Store executable(s) into local directory.
> > > > *     mpicc -o <program name> <source code file name>
> > > > *
> > > > *   Store executable(s) into predefined directories.
> > > > *     make
> > > > *
> > > > *   Make program(s) automatically on all specified hosts. You must
> > > > *   edit the file "make_compile" and specify your host names before
> > > > *   you execute it.
> > > > *     make_compile
> > > > *
> > > > * Running:
> > > > *   LAM-MPI:
> > > > *     mpiexec -boot -np <number of processes> <program name>
> > > > *     or
> > > > *     mpiexec -boot \
> > > > *    -host <hostname> -np <number of processes> <program name> : \
> > > > *    -host <hostname> -np <number of processes> <program name>
> > > > *     or
> > > > *     mpiexec -boot [-v] -configfile <application file>
> > > > *     or
> > > > *     lamboot [-v] [<host file>]
> > > > *       mpiexec -np <number of processes> <program name>
> > > > *    or
> > > > *    mpiexec [-v] -configfile <application file>
> > > > *     lamhalt
> > > > *
> > > > *   OpenMPI:
> > > > *     "host1", "host2", and so on can all have the same name,
> > > > *     if you want to start a virtual computer with some virtual
> > > > *     cpu's on the local host. The name "localhost" is allowed
> > > > *     as well.
> > > > *
> > > > *     mpiexec -np <number of processes> <program name>
> > > > *     or
> > > > *     mpiexec --host <host1,host2,...> \
> > > > *    -np <number of processes> <program name>
> > > > *     or
> > > > *     mpiexec -hostfile <hostfile name> \
> > > > *    -np <number of processes> <program name>
> > > > *     or
> > > > *     mpiexec -app <application file>
> > > > *
> > > > * Cleaning:
> > > > *   local computer:
> > > > *     rm <program name>
> > > > *     or
> > > > *     make clean_all
> > > > *   on all specified computers (you must edit the file "make_clean_all"
> > > > *   and specify your host names before you execute it).
> > > > *     make_clean_all
> > > > *
> > > > *
> > > > * File: environ_mpi.c                       Author: S. Gross
> > > > * Date: 25.09.2012
> > > > *
> > > > */
> > > >
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > > #include <string.h>
> > > > #include <unistd.h>
> > > > #include <sys/utsname.h>
> > > > #include "mpi.h"
> > > >
> > > > #define     BUF_SIZE        8192            /* message buffer size   */
> > > > #define     MAX_TASKS       12              /* max. number of tasks  */
> > > > #define     SENDTAG         1               /* send message command  */
> > > > #define     EXITTAG         2               /* termination command   */
> > > > #define     MSGTAG          3               /* normal message token  */
> > > >
> > > > #define ENTASKS             -1              /* error: too many tasks */
> > > >
> > > > static void master (void);
> > > > static void slave (void);
> > > >
> > > > int main (int argc, char *argv[])
> > > > {
> > > >  int  mytid,                                /* my task id               */
> > > >       ntasks;                               /* number of parallel tasks */
> > > >
> > > >  MPI_Init (&argc, &argv);
> > > >  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
> > > >  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
> > > >
> > > >  if (mytid == 0)
> > > >  {
> > > >    master ();
> > > >  }
> > > >  else
> > > >  {
> > > >    slave ();
> > > >  }
> > > >  MPI_Finalize ();
> > > >  return EXIT_SUCCESS;
> > > > }
> > > >
> > > >
> > > > /* Function for the "master task". The master sends a request to all
> > > > * slaves asking for a message. After receiving and printing the
> > > > * messages he sends all slaves a termination command.
> > > > *
> > > > * input parameters: not necessary
> > > > * output parameters:        not available
> > > > * return value:     nothing
> > > > * side effects:     no side effects
> > > > *
> > > > */
> > > > void master (void)
> > > > {
> > > >  int                ntasks,                 /* number of parallel tasks */
> > > >             mytid,                  /* my task id                   */
> > > >             num,                    /* number of entries            */
> > > >             i;                      /* loop variable                */
> > > >  char               buf[BUF_SIZE + 1];      /* message buffer (+1 for '\0') */
> > > >  MPI_Status stat;                   /* message details              */
> > > >
> > > >  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
> > > >  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
> > > >  if (ntasks > MAX_TASKS)
> > > >  {
> > > >    fprintf (stderr, "Error: Too many tasks. Try again with at most "
> > > >          "%d tasks.\n", MAX_TASKS);
> > > >    /* terminate all slave tasks                                     */
> > > >    for (i = 1; i < ntasks; ++i)
> > > >    {
> > > >      MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
> > > >    }
> > > >    MPI_Finalize ();
> > > >    exit (ENTASKS);
> > > >  }
> > > >  printf ("\n\nNow %d slave tasks are sending their environment.\n\n",
> > > >       ntasks - 1);
> > > >  /* request messages from slave tasks                             */
> > > >  for (i = 1; i < ntasks; ++i)
> > > >  {
> > > >    MPI_Send ((char *) NULL, 0, MPI_CHAR, i, SENDTAG, MPI_COMM_WORLD);
> > > >  }
> > > >  /* wait for messages and print greetings                         */
> > > >  for (i = 1; i < ntasks; ++i)
> > > >  {
> > > >    MPI_Recv (buf, BUF_SIZE, MPI_CHAR, MPI_ANY_SOURCE,
> > > >           MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
> > > >    MPI_Get_count (&stat, MPI_CHAR, &num);
> > > >    buf[num] = '\0';                 /* add missing end-of-string    */
> > > >    printf ("Environment from task %d:\n"
> > > >         "  message type:        %d\n"
> > > >         "  msg length:          %d characters\n"
> > > >         "  message:             %s\n\n",
> > > >         stat.MPI_SOURCE, stat.MPI_TAG, num, buf);
> > > >  }
> > > >  /* terminate all slave tasks                                     */
> > > >  for (i = 1; i < ntasks; ++i)
> > > >  {
> > > >    MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
> > > >  }
> > > > }
> > > >
> > > >
> > > > /* Function for "slave tasks". The slave task sends its hostname,
> > > > * operating system name and release, and processor architecture
> > > > * as a message to the master.
> > > > *
> > > > * input parameters: not necessary
> > > > * output parameters:        not available
> > > > * return value:     nothing
> > > > * side effects:     no side effects
> > > > *
> > > > */
> > > > void slave (void)
> > > > {
> > > >  struct utsname sys_info;           /* system information           */
> > > >  int                 mytid,                 /* my task id                   */
> > > >              num_env_vars,          /* # of environment variables   */
> > > >              i,                     /* loop variable                */
> > > >              more_to_do;
> > > >  char                buf[BUF_SIZE],         /* message buffer               */
> > > >              *env_vars[] = {"PATH",
> > > >                             "LD_LIBRARY_PATH_32",
> > > >                             "LD_LIBRARY_PATH_64",
> > > >                             "LD_LIBRARY_PATH",
> > > >                             "CLASSPATH"};
> > > >  MPI_Status  stat;                  /* message details              */
> > > >
> > > >  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
> > > >  num_env_vars = sizeof (env_vars) / sizeof (env_vars[0]);
> > > >  more_to_do = 1;
> > > >  while (more_to_do == 1)
> > > >  {
> > > >    /* wait for a message from the master task                     */
> > > >    MPI_Recv (buf, BUF_SIZE, MPI_CHAR, 0, MPI_ANY_TAG,
> > > >           MPI_COMM_WORLD, &stat);
> > > >    if (stat.MPI_TAG != EXITTAG)
> > > >    {
> > > >      uname (&sys_info);
> > > >      strcpy (buf, "\n    hostname:          ");
> > > >      strncpy (buf + strlen (buf), sys_info.nodename,
> > > >            BUF_SIZE - strlen (buf));
> > > >      strncpy (buf + strlen (buf), "\n    operating system:  ",
> > > >            BUF_SIZE - strlen (buf));
> > > >      strncpy (buf + strlen (buf), sys_info.sysname,
> > > >            BUF_SIZE - strlen (buf));
> > > >      strncpy (buf + strlen (buf), "\n    release:           ",
> > > >            BUF_SIZE - strlen (buf));
> > > >      strncpy (buf + strlen (buf), sys_info.release,
> > > >            BUF_SIZE - strlen (buf));
> > > >      strncpy (buf + strlen (buf), "\n    processor:         ",
> > > >            BUF_SIZE - strlen (buf));
> > > >      strncpy (buf + strlen (buf), sys_info.machine,
> > > >            BUF_SIZE - strlen (buf));
> > > >      for (i = 0; i < num_env_vars; ++i)
> > > >      {
> > > >     char *env_val,                  /* pointer to environment value */
> > > >          *delimiter = ":"   ,       /* field delimiter for "strtok" */
> > > >          *next_tok;                 /* next token                   */
> > > >
> > > >     env_val = getenv (env_vars[i]);
> > > >     if (env_val != NULL)
> > > >     {
> > > >       if ((strlen (buf) + strlen (env_vars[i]) + 6) < BUF_SIZE)
> > > >       {
> > > >         strncpy (buf + strlen (buf), "\n    ",
> > > >                  BUF_SIZE - strlen (buf));
> > > >         strncpy (buf + strlen (buf), env_vars[i],
> > > >                  BUF_SIZE - strlen (buf));
> > > >       }
> > > >       else
> > > >       {
> > > >         fprintf (stderr, "Buffer too small. Couldn't add \"%s\"."
> > > >                  "\n\n", env_vars[i]);
> > > >       }
> > > >       /* Get first token in "env_val". "strtok" skips all
> > > >        * characters that are contained in the current delimiter
> > > >        * string. If it finds a character which is not contained
> > > >        * in the delimiter string, it is the start of the first
> > > >        * token. Now it searches for the next character which is
> > > >        * part of the delimiter string. If it finds one it will
> > > >        * overwrite it by a '\0' to terminate the first token.
> > > >        * Otherwise the token extends to the end of the string.
> > > >        * Subsequent calls of "strtok" use a NULL pointer as first
> > > >        * argument and start searching from the saved position
> > > >        * after the last token. "strtok" returns NULL if it
> > > >        * couldn't find a token.
> > > >        */
> > > >       next_tok = strtok (env_val, delimiter);
> > > >       while (next_tok != NULL)
> > > >       {
> > > >         if ((strlen (buf) + strlen (next_tok) + 25) < BUF_SIZE)
> > > >         {
> > > >           strncpy (buf + strlen (buf), "\n                       ",
> > > >                    BUF_SIZE - strlen (buf));
> > > >           strncpy (buf + strlen (buf), next_tok,
> > > >                    BUF_SIZE - strlen (buf));
> > > >         }
> > > >         else
> > > >         {
> > > >           fprintf (stderr, "Buffer too small. Couldn't add \"%s\" "
> > > >                    "to %s.\n\n", next_tok, env_vars[i]);
> > > >         }
> > > >         /* get next token                                           */
> > > >         next_tok = strtok (NULL, delimiter);
> > > >       }
> > > >     }
> > > >      }
> > > >      MPI_Send (buf, strlen (buf), MPI_CHAR, stat.MPI_SOURCE,
> > > >                     MSGTAG, MPI_COMM_WORLD);
> > > >    }
> > > >    else
> > > >    {
> > > >      more_to_do = 0;                        /* terminate                    */
> > > >    }
> > > >  }
> > > > }
> > > > /* A small program, which delivers some information about its
> > > > * machine, operating system, and some environment variables.
> > > > *
> > > > *
> > > > * Compiling:
> > > > *   Store executable(s) into local directory.
> > > > *     (g)cc -o environ_without_mpi environ_without_mpi.c
> > > > *
> > > > * Running:
> > > > *   environ_without_mpi
> > > > *
> > > > *
> > > > * File: environ_without_mpi.c               Author: S. Gross
> > > > * Date: 25.09.2012
> > > > *
> > > > */
> > > >
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > > #include <string.h>
> > > > #include <unistd.h>
> > > > #include <sys/utsname.h>
> > > >
> > > > #define     BUF_SIZE        8192            /* message buffer size */
> > > >
> > > > int main (int argc, char *argv[])
> > > > {
> > > >  struct utsname sys_info;           /* system information           */
> > > >  int                 num_env_vars,          /* # of environment variables   */
> > > >              i;                     /* loop variable                */
> > > >  char                buf[BUF_SIZE],         /* message buffer               */
> > > >              *env_vars[] = {"PATH",
> > > >                             "LD_LIBRARY_PATH_32",
> > > >                             "LD_LIBRARY_PATH_64",
> > > >                             "LD_LIBRARY_PATH",
> > > >                             "CLASSPATH"};
> > > >
> > > >  num_env_vars = sizeof (env_vars) / sizeof (env_vars[0]);
> > > >  uname (&sys_info);
> > > >  strcpy (buf, "\n    hostname:          ");
> > > >  strncpy (buf + strlen (buf), sys_info.nodename,
> > > >        BUF_SIZE - strlen (buf));
> > > >  strncpy (buf + strlen (buf), "\n    operating system:  ",
> > > >        BUF_SIZE - strlen (buf));
> > > >  strncpy (buf + strlen (buf), sys_info.sysname,
> > > >        BUF_SIZE - strlen (buf));
> > > >  strncpy (buf + strlen (buf), "\n    release:           ",
> > > >        BUF_SIZE - strlen (buf));
> > > >  strncpy (buf + strlen (buf), sys_info.release,
> > > >        BUF_SIZE - strlen (buf));
> > > >  strncpy (buf + strlen (buf), "\n    processor:         ",
> > > >        BUF_SIZE - strlen (buf));
> > > >  strncpy (buf + strlen (buf), sys_info.machine,
> > > >        BUF_SIZE - strlen (buf));
> > > >  for (i = 0; i < num_env_vars; ++i)
> > > >  {
> > > >    char *env_val,                   /* pointer to environment value */
> > > >      *delimiter = ":"       ,       /* field delimiter for "strtok" */
> > > >      *next_tok;                     /* next token                   */
> > > >
> > > >    env_val = getenv (env_vars[i]);
> > > >    if (env_val != NULL)
> > > >    {
> > > >      if ((strlen (buf) + strlen (env_vars[i]) + 6) < BUF_SIZE)
> > > >      {
> > > >     strncpy (buf + strlen (buf), "\n    ",
> > > >              BUF_SIZE - strlen (buf));
> > > >     strncpy (buf + strlen (buf), env_vars[i],
> > > >              BUF_SIZE - strlen (buf));
> > > >      }
> > > >      else
> > > >      {
> > > >     fprintf (stderr, "Buffer too small. Couldn't add \"%s\"."
> > > >              "\n\n", env_vars[i]);
> > > >      }
> > > >      /* Get first token in "env_val". "strtok" skips all
> > > >       * characters that are contained in the current delimiter
> > > >       * string. If it finds a character which is not contained
> > > >       * in the delimiter string, it is the start of the first
> > > >       * token. Now it searches for the next character which is
> > > >       * part of the delimiter string. If it finds one it will
> > > >       * overwrite it by a '\0' to terminate the first token.
> > > >       * Otherwise the token extends to the end of the string.
> > > >       * Subsequent calls of "strtok" use a NULL pointer as first
> > > >       * argument and start searching from the saved position
> > > >       * after the last token. "strtok" returns NULL if it
> > > >       * couldn't find a token.
> > > >       */
> > > >      next_tok = strtok (env_val, delimiter);
> > > >      while (next_tok != NULL)
> > > >      {
> > > >     if ((strlen (buf) + strlen (next_tok) + 25) < BUF_SIZE)
> > > >     {
> > > >       strncpy (buf + strlen (buf), "\n                       ",
> > > >                BUF_SIZE - strlen (buf));
> > > >       strncpy (buf + strlen (buf), next_tok,
> > > >                BUF_SIZE - strlen (buf));
> > > >     }
> > > >     else
> > > >     {
> > > >       fprintf (stderr, "Buffer too small. Couldn't add \"%s\" "
> > > >                "to %s.\n\n", next_tok, env_vars[i]);
> > > >     }
> > > >     /* get next token                                               */
> > > >     next_tok = strtok (NULL, delimiter);
> > > >      }
> > > >    }
> > > >  }
> > > >  printf ("Environment:\n"
> > > >       "  message:             %s\n\n",  buf);
> > > >  return EXIT_SUCCESS;
> > > > }
> > >
> > >
> >
> >
