Thanks, Jeff, for the details!

On Sat, Sep 24, 2011 at 07:26:49AM -0400, Jeff Squyres wrote:
> On Sep 22, 2011, at 11:06 PM, Martin Siegert wrote:
>
> > I am trying to figure out how openmpi (1.4.3) sets its PATH
> > for executables. From the man page:
> >
> > Locating Files
> > If no relative or absolute path is specified for a file, Open MPI will
> > first look for files by searching the directories specified by the
> > --path option. If there is no --path option set or if the file is not
> > found at the --path location, then Open MPI will search the user’s PATH
> > environment variable as defined on the source node(s).
>
> Oops -- it's not the source node, it's the running node. That being said,
> sometimes they're the same thing, and sometimes the PATH is copied (by the
> underlying run-time environment) to the target node.
>
> > This does not appear to be entirely correct - as far as I can tell
> > openmpi always prepends its own bin directory to the PATH before
> > searching for the executable. Can that be switched off?
>
> It should not be doing that unless you are specifying the full path name to
> mpirun, or using the --prefix option.
By now I recognize that my tests were flawed in several aspects:
1) the path settings depend on whether you specify the full path to
   mpiexec (as you mention), i.e., "/usr/local/openmpi/bin/mpiexec"
   does things differently than "mpiexec" even though the executable
   is the same.
2) it makes a difference whether mpiexec runs from a torque batch job
   or interactively (as you say below as well).

Nevertheless, I cannot avoid mpiexec prepending its own directory to
the PATH. This is what I tried:

dev:~> echo $PATH
/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin
# this is the default PATH on every node
dev:~> cat /home/siegert/scratch/test/path-0.0.1/bin/path.sh
#!/bin/bash
echo "`hostname`, $0:"
echo $PATH
dev:~> cat path.pbs
#!/bin/bash
#PBS -N path
#PBS -l walltime=1:00
#PBS -l nodes=2:ppn=1
export PATH=/home/siegert/scratch/test/path-0.0.1/bin:$PATH
echo $PATH
mpiexec path.sh
dev:~> qsub path.pbs
14.dev
dev:~> cat path.o14
/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
b414, /home/siegert/scratch/test/path-0.0.1/bin/path.sh:
/usr/local/openmpi/bin:/usr/local/openmpi/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
b413, /home/siegert/scratch/test/path-0.0.1/bin/path.sh:
/usr/local/openmpi/bin:/usr/local/openmpi/bin:/usr/local/openmpi/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin

Thus, on the local node (where mpiexec is run) /usr/local/openmpi/bin
is prepended twice, on the remote node /usr/local/openmpi/bin is
prepended three times.
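
Counting the repeated entries by eye in those long PATH lines is error-prone, so a small helper could be dropped into path.sh to do the counting on each node. This is only a sketch: the function name count_path_dir is made up for illustration and is not part of Open MPI or Torque.

```shell
# count_path_dir PATH_STRING DIR
# Prints how many entries of the colon-separated PATH_STRING are exactly DIR.
# The body runs in a subshell "( ... )" so the IFS and globbing changes
# do not leak into the calling script.
count_path_dir() (
    path_string=$1
    dir=$2
    count=0
    IFS=:        # split on colons only
    set -f       # disable globbing while the unquoted variable is split
    for entry in $path_string; do
        if [ "$entry" = "$dir" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
)

# Possible use inside path.sh:
#   echo "`hostname`: `count_path_dir "$PATH" /usr/local/openmpi/bin` occurrence(s)"
```

Note that the count includes the entry that was already in the default PATH, so for the b414 output above it would report 3 (two prepended copies plus the original one).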
But this is the first point where I tricked myself: our "mpiexec" is
a wrapper script (in /usr/local/bin) that calls
/usr/local/openmpi/bin/mpiexec:

dev:~> which mpiexec
/usr/local/bin/mpiexec
dev:~> which orterun
/usr/local/openmpi/bin/orterun

But when I replace "mpiexec" in path.pbs with "orterun" the following
happens:

dev:~> cat path.pbs
#!/bin/bash
#PBS -N path
#PBS -l walltime=1:00
#PBS -l nodes=2:ppn=1
export PATH=/home/siegert/scratch/test/path-0.0.1/bin:$PATH
echo $PATH
orterun path.sh
dev:~> qsub path.pbs
15.dev
dev:~> cat path.o15
/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
b414, /home/siegert/scratch/test/path-0.0.1/bin/path.sh:
/usr/local/openmpi-1.4.3/bin:/usr/local/openmpi-1.4.3/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin
b413, /home/siegert/scratch/test/path-0.0.1/bin/path.sh:
/usr/local/openmpi-1.4.3/bin:/usr/local/openmpi-1.4.3/bin:/usr/local/openmpi-1.4.3/bin:/home/siegert/scratch/test/path-0.0.1/bin:/usr/local/bin:/usr/local/openmpi/bin:/usr/local/moab/bin:/usr/local/torque/bin:/bin:/usr/bin:/home/siegert/bin:/home/siegert/bin

It appears that now "orterun" does something like "readlink -f $0":
/usr/local/openmpi is actually a softlink to /usr/local/openmpi-1.4.3.
Anyway, again the directory where the orterun executable is located
gets prepended twice on the local and three times on the remote node.
Only adding the --noprefix option to orterun avoids the prepending of
the directory (when calling "/usr/local/openmpi/bin/mpiexec --noprefix"
the --noprefix flag has no effect).
I guess I could achieve what I want by using "orterun --noprefix" from
the wrapper script.
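
A rough sketch of what the core of such a wrapper might look like, untested against a real installation. The function name mpiexec_noprefix and the ORTERUN override variable are invented here for illustration; neither is an Open MPI setting. The default is the bare name "orterun" resolved through PATH, because in the tests above --noprefix had no effect when the launcher was called by its full path.

```shell
# Hypothetical core of the /usr/local/bin/mpiexec wrapper.
# ORTERUN is an assumed override (handy for testing); by default the
# launcher is looked up through PATH rather than by absolute path.
mpiexec_noprefix() {
    # Pass all user arguments through unchanged, after --noprefix.
    "${ORTERUN:-orterun}" --noprefix "$@"
}

# In the wrapper script itself the last line would then be:
#   mpiexec_noprefix "$@"
```

In a standalone wrapper the function could be replaced by a single "exec orterun --noprefix "$@"" line; the function form just makes it easy to substitute a stub launcher and check which arguments are passed on.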
> > Furthermore, openmpi appears to use
> > a) the current value of PATH on the node where mpiexec is running;
> > b) whatever PATH is used by ssh on the remote nodes.
>
> mpirun uses the $PATH local to where it is. We don't ship the PATH to the
> remote node unless you tell mpirun to via the -x PATH option (as you noted
> below). We've found that default shipping the PATH to remote nodes can cause
> unexpected problems.
>
> That being said, some run-time systems (e.g., SLURM, Torque) automatically
> ship the front-end PATH to the back-end machine(s) for you. Open MPI just
> "inherits" this PATH on the remote node, so to speak. ssh doesn't do this by
> default.

Yup. That was the other way I tricked myself: trying to debug a
behaviour when running under torque by running mpiexec interactively
from the head node. When "path.sh" is run interactively it fails
because it is not found on the remote node.

> Here's an example with 1.4.3 running SLURM on my test cluster at Cisco. This
> is in an SLURM allocation; I am running on the head node. Note that I'm a
> tcsh user, so I use "echo $path", not "echo $PATH":
>
> -----
> [4:23] svbu-mpi:~ % hostname
> svbu-mpi.cisco.com
> # Note my original path
> [4:23] svbu-mpi:~ % echo $path
> /users/jsquyres/local/rhel5/bin /home/jsquyres/bogus/bin
> /users/jsquyres/local/bin /usr/local/bin /users/jsquyres/local/rhel5/bin
> /home/jsquyres/bogus/bin /users/jsquyres/local/bin /usr/local/bin
> /usr/kerberos/bin /usr/local/bin /bin /usr/bin /usr/X11R6/bin
> /opt/slurm/2.1.0/bin /data/home/ted/bin /data/home/ted/bin
> # Since I'm in a SLURM allocation, mpirun sends jobs to a remote node
> [4:23] svbu-mpi:~ % mpirun -np 1 hostname
> svbu-mpi020
> # Here's my test script
> [4:23] svbu-mpi:~ % cat foo.csh
> #!/bin/tcsh -f
> echo $path
> # When I run this script through mpirun, the $path is the same
> # as was displayed above
> [4:23] svbu-mpi:~ % mpirun -np 1 foo.csh
> /users/jsquyres/local/rhel5/bin /home/jsquyres/bogus/bin
> /users/jsquyres/local/bin /usr/local/bin /users/jsquyres/local/rhel5/bin
> /home/jsquyres/bogus/bin /users/jsquyres/local/bin /usr/local/bin
> /usr/kerberos/bin /usr/local/bin /bin /usr/bin /usr/X11R6/bin
> /opt/slurm/2.1.0/bin /data/home/ted/bin /data/home/ted/bin
> # Now if I use the full path name to mpirun, I get an extra bonus
> # directory in the front of my $path -- the location of where
> # mpirun is located.
> [4:23] svbu-mpi:~ % /home/jsquyres/bogus/bin/mpirun -np 1 foo.csh
> /home/jsquyres/bogus/bin /home/jsquyres/bogus/bin
> /users/jsquyres/local/rhel5/bin /home/jsquyres/bogus/bin
> /users/jsquyres/local/bin /usr/local/bin /users/jsquyres/local/rhel5/bin
> /home/jsquyres/bogus/bin /users/jsquyres/local/bin /usr/local/bin
> /usr/kerberos/bin /usr/local/bin /bin /usr/bin /usr/X11R6/bin
> /opt/slurm/2.1.0/bin /data/home/ted/bin /data/home/ted/bin
> [4:23] svbu-mpi:~ %
> -----
>
> > Thus,
> >
> > export PATH=/path/to/special/bin:$PATH
> > mpiexec -n 2 -H n1,n2 special
> >
> > (n1 being the local node)
> > will usually fail even if the directory structure is identical on
> > the two nodes. For this to work
>
> The PATH you set will be available on n1, but it depends on the underlying
> run-time launcher if it is available on n2. ssh will not copy your PATH to
> n2 by default, but others will (e.g., SLURM).
>
> > mpiexec -n 2 -H n1,n2 -x PATH special
>
> That will work for ssh in this case, yes.
>
> > What I would like to see is a configure option that allows me to configure
> > openmpi such that the current PATH on the node where mpiexec is running
> > is used as the PATH on all nodes (by default). Or is there a reason why
> > that is a really bad idea?
>
> If your nodes are not exactly the same, this can lead to all kinds of
> badness. That's why we didn't do it by default.

I totally understand that you do not want to do this by default.
However, it would be nice to have a configure option like
--disable-prepend-ompi-path
that would at least prevent the prepending of the openmpi bin directory.
For those of us who do have identical nodes it would be even nicer to
have a configure option
--enable-path-propagation
that would always do -x PATH and not prepend the openmpi bin directory.

Cheers,
Martin

--
Martin Siegert
Simon Fraser University
Burnaby, British Columbia