Thanks, Jeff, for the details!

On Sat, Sep 24, 2011 at 07:26:49AM -0400, Jeff Squyres wrote:
> On Sep 22, 2011, at 11:06 PM, Martin Siegert wrote:
> 
> > I am trying to figure out how openmpi (1.4.3) sets its PATH
> > for executables. From the man page:
> > 
> > Locating Files
> >    If no relative or absolute path is specified for a file, Open MPI  will
> >    first  look  for  files  by  searching the directories specified by the
> >    --path option.  If there is no --path option set or if the file is  not
> >    found at the --path location, then Open MPI will search the user’s PATH
> >    environment variable as defined on the source node(s).
> 
> Oops -- it's not the source node, it's the running node.  That being said, 
> sometimes they're the same thing, and sometimes the PATH is copied (by the 
> underlying run-time environment) to the target node.
> 
> > This does not appear to be entirely correct - as far as I can tell
> > openmpi always prepends its own bin directory to the PATH before
> > searching for the executable. Can that be switched off?
> 
> It should not be doing that unless you are specifying the full path name to 
> mpirun, or using the --prefix option.

By now I realize that my tests were flawed in several respects:
1) the path settings depend on whether you specify the full path to
mpiexec (as you mention), i.e., "/usr/local/openmpi/bin/mpiexec" does things
differently than "mpiexec" even though the executable is the same.
2) it makes a difference whether mpiexec runs from a torque batch
job or interactively (as you say below as well).

Nevertheless, I cannot avoid mpiexec prepending its own directory to the
PATH. This is what I tried:

dev:~> echo $PATH
/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin
# this is the default PATH on every node
dev:~> cat /home/siegert/scratch/test/path-0.0.1/bin/path.sh
#!/bin/bash
echo "`hostname`, $0:"
echo $PATH
dev:~> cat path.pbs
#!/bin/bash
#PBS -N path
#PBS -l walltime=1:00
#PBS -l nodes=2:ppn=1

export PATH=/home/siegert/scratch/test/path-0.0.1/bin:$PATH
echo $PATH
mpiexec path.sh
dev:~> qsub path.pbs
14.dev
dev:~> cat path.o14
/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
b414, /home/siegert/scratch/test/path-0.0.1/bin/path.sh:
/usr/local/openmpi/bin:/usr/local/openmpi/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
b413, /home/siegert/scratch/test/path-0.0.1/bin/path.sh:
/usr/local/openmpi/bin:/usr/local/openmpi/bin:/usr/local/openmpi/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
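Copies are easy to miscount in PATH lines this long. A small bash helper along these lines could be added to path.sh to count them. This is only a sketch: the name count_path_entry is made up for illustration, and it counts exact matches only, so /usr/local/openmpi-1.4.3/bin would not be counted as /usr/local/openmpi/bin. Applied to the two lines above, it reports 3 for b414 and 4 for b413, i.e. the one copy already present in the default PATH plus the prepended ones.

```shell
#!/bin/bash
# Hypothetical helper: count how often a directory occurs as an exact
# entry in a colon-separated PATH string.
count_path_entry() {
  local dir=$1 entry n=0
  local -a entries
  # Split the PATH string on ':' (read does no glob expansion).
  IFS=: read -r -a entries <<< "$2"
  for entry in "${entries[@]}"; do
    if [[ $entry == "$dir" ]]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# PATH values copied from the batch job output above.
b414_path=/usr/local/openmpi/bin:/usr/local/openmpi/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
b413_path=/usr/local/openmpi/bin:/usr/local/openmpi/bin:/usr/local/openmpi/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin

echo "b414: $(count_path_entry /usr/local/openmpi/bin "$b414_path")"   # b414: 3
echo "b413: $(count_path_entry /usr/local/openmpi/bin "$b413_path")"   # b413: 4
```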

Thus, on the local node (where mpiexec is run) /usr/local/openmpi/bin is
prepended twice; on the remote node it is prepended three times.
But this is the first point where I tricked myself: our "mpiexec" is a
wrapper script (in /usr/local/bin) that calls /usr/local/openmpi/bin/mpiexec:
dev:~> which mpiexec
/usr/local/bin/mpiexec
dev:~> which orterun
/usr/local/openmpi/bin/orterun

But when I replace "mpiexec" in path.pbs with "orterun", the following
happens:

dev:~> cat path.pbs
#!/bin/bash
#PBS -N path
#PBS -l walltime=1:00
#PBS -l nodes=2:ppn=1

export PATH=/home/siegert/scratch/test/path-0.0.1/bin:$PATH
echo $PATH
orterun path.sh
dev:~> qsub path.pbs
15.dev
dev:~> cat path.o15
/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
b414, /home/siegert/scratch/test/path-0.0.1/bin/path.sh:
/usr/local/openmpi-1.4.3/bin:/usr/local/openmpi-1.4.3/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
b413, /home/siegert/scratch/test/path-0.0.1/bin/path.sh:
/usr/local/openmpi-1.4.3/bin:/usr/local/openmpi-1.4.3/bin:/usr/local/openmpi-1.4.3/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin

It appears that "orterun" now does something like "readlink -f $0":
/usr/local/openmpi is actually a symlink to /usr/local/openmpi-1.4.3.
Either way, the directory where the orterun executable is located again
gets prepended twice on the local node and three times on the remote node.
Only adding the --noprefix option to orterun avoids prepending the
directory (when calling "/usr/local/openmpi/bin/mpiexec --noprefix",
the --noprefix flag has no effect).
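Assuming orterun does resolve its own location through symlinks (a guess based on the output above, not something checked in the Open MPI source), the different directory name can be reproduced with plain shell tools. The sketch below builds a throwaway imitation of the openmpi -> openmpi-1.4.3 layout in a temporary directory and compares the launcher's directory as invoked with the directory after "readlink -f"; the helper name resolved_bin_dir is hypothetical.

```shell
#!/bin/bash
# Illustration only: directory of a launcher path as given, versus the
# directory of the same path after readlink -f resolves all symlinks.
resolved_bin_dir() {
  dirname "$(readlink -f "$1")"
}

# Scratch copy of the install layout: openmpi -> openmpi-1.4.3
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT   # remove the scratch tree when the script exits
mkdir -p "$root/openmpi-1.4.3/bin"
touch "$root/openmpi-1.4.3/bin/orterun"
chmod +x "$root/openmpi-1.4.3/bin/orterun"
ln -s openmpi-1.4.3 "$root/openmpi"

launcher="$root/openmpi/bin/orterun"
echo "as invoked: $(dirname "$launcher")"            # ends in /openmpi/bin
echo "resolved:   $(resolved_bin_dir "$launcher")"   # ends in /openmpi-1.4.3/bin
```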

I guess I could achieve what I want by calling "orterun --noprefix" from the
wrapper script.

> > Furthermore, openmpi appears to use
> > a) the current value of PATH on the node where mpiexec is running;
> > b) whatever PATH is used by ssh on the remote nodes.
> 
> mpirun uses the $PATH local to where it is.  We don't ship the PATH to the 
> remote node unless you tell mpirun to via the -x PATH option (as you noted 
> below).  We've found that shipping the PATH to remote nodes by default can 
> cause unexpected problems.
> 
> That being said, some run-time systems (e.g., SLURM, Torque) automatically 
> ship the front-end PATH to the back-end machine(s) for you.  Open MPI just 
> "inherits" this PATH on the remote node, so to speak.  ssh doesn't do this by 
> default.

Yup. That was the other way I tricked myself: I was trying to debug
behaviour under torque by running mpiexec interactively from the
head node. When "path.sh" is run interactively, it fails because it is not
found on the remote node.

> Here's an example with 1.4.3 running SLURM on my test cluster at Cisco.  This 
> is in a SLURM allocation; I am running on the head node.  Note that I'm a 
> tcsh user, so I use "echo $path", not "echo $PATH":
> 
> -----
> [4:23] svbu-mpi:~ % hostname
> svbu-mpi.cisco.com
> # Note my original path
> [4:23] svbu-mpi:~ % echo $path
> /users/jsquyres/local/rhel5/bin /home/jsquyres/bogus/bin 
> /users/jsquyres/local/bin /usr/local/bin /users/jsquyres/local/rhel5/bin 
> /home/jsquyres/bogus/bin /users/jsquyres/local/bin /usr/local/bin 
> /usr/kerberos/bin /usr/local/bin /bin /usr/bin /usr/X11R6/bin 
> /opt/slurm/2.1.0/bin /data/home/ted/bin /data/home/ted/bin
> # Since I'm in a SLURM allocation, mpirun sends jobs to a remote node
> [4:23] svbu-mpi:~ % mpirun -np 1 hostname
> svbu-mpi020
> # Here's my test script
> [4:23] svbu-mpi:~ % cat foo.csh
> #!/bin/tcsh -f
> echo $path
> # When I run this script through mpirun, the $path is the same 
> # as was displayed above
> [4:23] svbu-mpi:~ % mpirun -np 1 foo.csh
> /users/jsquyres/local/rhel5/bin /home/jsquyres/bogus/bin 
> /users/jsquyres/local/bin /usr/local/bin /users/jsquyres/local/rhel5/bin 
> /home/jsquyres/bogus/bin /users/jsquyres/local/bin /usr/local/bin 
> /usr/kerberos/bin /usr/local/bin /bin /usr/bin /usr/X11R6/bin 
> /opt/slurm/2.1.0/bin /data/home/ted/bin /data/home/ted/bin
> # Now if I use the full path name to mpirun, I get an extra bonus
> # directory in the front of my $path -- the location of where
> # mpirun is located.
> [4:23] svbu-mpi:~ % /home/jsquyres/bogus/bin/mpirun -np 1 foo.csh
> /home/jsquyres/bogus/bin /home/jsquyres/bogus/bin 
> /users/jsquyres/local/rhel5/bin /home/jsquyres/bogus/bin 
> /users/jsquyres/local/bin /usr/local/bin /users/jsquyres/local/rhel5/bin 
> /home/jsquyres/bogus/bin /users/jsquyres/local/bin /usr/local/bin 
> /usr/kerberos/bin /usr/local/bin /bin /usr/bin /usr/X11R6/bin 
> /opt/slurm/2.1.0/bin /data/home/ted/bin /data/home/ted/bin
> [4:23] svbu-mpi:~ % 
> -----
> 
> > Thus,
> > 
> > export PATH=/path/to/special/bin:$PATH
> > mpiexec -n 2 -H n1,n2 special
> > 
> > (n1 being the local node)
> > will usually fail even if the directory structure is identical on
> > the two nodes. For this to work
> 
> The PATH you set will be available on n1, but it depends on the underlying 
> run-time launcher if it is available on n2.  ssh will not copy your PATH to 
> n2 by default, but others will (e.g., SLURM).
> 
> > mpiexec -n 2 -H n1,n2 -x PATH special
> 
> That will work for ssh in this case, yes.
> 
> > What I would like to see is a configure option that allows me to configure
> > openmpi such that the current PATH on the node where mpiexec is running
> > is used as the PATH on all nodes (by default). Or is there a reason why
> > that is a really bad idea?
> 
> If your nodes are not exactly the same, this can lead to all kinds of 
> badness.  That's why we didn't do it by default.

I totally understand that you do not want to do this by default.
However, it would be nice to have a configure option like
--disable-prepend-ompi-path
that would at least prevent the prepending of the openmpi bin directory.
For those of us who do have identical nodes, it would be even nicer to
have a configure option
--enable-path-propagation
that would always do -x PATH and not prepend the openmpi bin directory.

Cheers,
Martin

-- 
Martin Siegert
Simon Fraser University
Burnaby, British Columbia
