I think I may have solved this, in case anyone is curious or wants to
yell about how terrible it is :). In the ssh wrapper script, when
ssh-ing, before launching orted:

    export HOME=${your_working_directory} \;

(If $HOME means something for your jobs, then maybe this isn't a good
solution.)
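For concreteness, here is a rough sketch of the kind of wrapper I
mean. This is not the actual HTCondor helper script; the argument
handling and the remote_workdir lookup below are just stand-ins for
however your setup discovers the per-node sandbox directory:

    #!/bin/sh
    # Hypothetical ssh agent wrapper (sketch only, not the real
    # HTCondor helper).  mpirun invokes the agent roughly as:
    #     <agent> <hostname> <orted args...>
    host="$1"
    shift

    # Stand-in: however you look up the job's sandbox on $host.
    remote_workdir="/path/to/sandbox/on/$host"

    # cd into the sandbox and override HOME before starting orted, so
    # the "default directory determined by the starter" is the sandbox
    # rather than the real home directory on the remote node.
    exec ssh "$host" "cd $remote_workdir && export HOME=$remote_workdir && $*"

mpirun is then pointed at the wrapper via the rsh agent MCA parameter,
e.g. "--mca plm_rsh_agent /path/to/wrapper" (or orte_rsh_agent,
whichever name your Open MPI version accepts).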
I got this from connecting some dots in the mpirun man page. Under
"Current Working Directory" (emphasis added):

    "If the -wdir option is not specified, Open MPI will send the
    directory name where mpirun was invoked to each of the remote
    nodes. The remote nodes will try to change to that directory. If
    they are unable (e.g., if the directory does not exist on that
    node), then **Open MPI will use the default directory determined
    by the starter**."

In this case the starter is ssh. Under "Locating Files":

    "For example, when using the rsh or ssh starters, **the initial
    directory is $HOME by default**."

Hope this helps someone!

Jason Patton

On Wed, Nov 23, 2016 at 1:43 PM, Jason Patton <jpat...@cs.wisc.edu> wrote:
> I would like to mpirun across nodes that do not share a filesystem
> and might have the executable in different directories. For example,
> node0 has the executable at /tmp/job42/mpitest and node1 has it at
> /tmp/job100/mpitest.
>
> If you can grant me that I have an ssh wrapper script (that gets set
> as the orte/plm_rsh_agent**) that cds to where the executable lies on
> each worker node before launching orted, is there a way to tell the
> worker-node orted processes to run the executable from the current
> working directory rather than from the absolute path that (I presume)
> the head node process advertises? I've tried adding/changing
> orte_remote_tmpdir_base for each worker orted process, but then I get
> an error about having both global_tmpdir and remote_tmpdir set. Then
> if I set local_tmpdir to match the head node, I'm back at square one.
>
> I know this sounds fairly convoluted, but I'm updating helper scripts
> for HTCondor so that its parallel universe can work with newer MPI
> versions (dealing with similar headaches trying to get hydra to
> cooperate). The default behavior is for condor to place each "job"
> (i.e. sshd+orted process) in a sandbox, and we cannot know the names
> of the sandbox directories ahead of time or assume that they will
> have the same name across nodes. The easiest way to deal with this is
> if we can assume the executable lies on a shared fs, but the fewer
> assumptions from our POV the better. (Even better would be if someone
> /really/ wants to build in condor support like has been done for
> other launchers; that's beyond me right now.)
>
> **Also, what is the correct parameter to set the rsh agent? ompi_info
> (and mpirun) says orte_rsh_agent is deprecated, but online docs seem
> to suggest that plm_rsh_agent is deprecated. I'm using version 1.8.1.
>
> Thanks for any insight you can provide.
>
> Jason Patton