I'm not sure I understand your solution -- it sounds like you are overriding 
$HOME for each process...?  If so, that's playing with fire.

Is there a reason you can't set PATH / LD_LIBRARY_PATH in your ssh wrapper 
script to point to the Open MPI installation that you want to use on each node?
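
For example, a minimal sketch of such a wrapper, assuming the node-local
install prefix is /opt/openmpi (just a placeholder) and that the first
argument the rsh agent receives is the hostname:

    #!/bin/sh
    # ssh_wrapper.sh -- point the remote side at the node-local Open MPI
    # install before running whatever command (usually orted) it was given
    host=$1; shift
    exec ssh "$host" "export PATH=/opt/openmpi/bin:\$PATH LD_LIBRARY_PATH=/opt/openmpi/lib; $*"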

To answer your question: yes, the "rsh agent" MCA param has changed over time.  
It's been plm_rsh_agent for a while, though.  I don't remember exactly when it 
changed, but it's been that way since at least v1.8.0.
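
Concretely, using the example hosts and executable from your mail below,
that would look something like:

    mpirun --mca plm_rsh_agent /path/to/ssh_wrapper.sh \
        -np 2 --host node0,node1 mpitest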


> On Nov 23, 2016, at 5:04 PM, Jason Patton <jpat...@cs.wisc.edu> wrote:
> 
> I think I may have solved this, in case anyone is curious or wants to
> yell about how terrible it is :). In the ssh wrapper script, when
> ssh-ing, before launching orted:
> 
> export HOME=${your_working_directory} \;
> 
> (If $HOME means something for your jobs, then maybe this isn't a good 
> solution.)
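> 
> For context, a rough sketch of how that fits into the wrapper (the
> hostname/argument handling and the working-directory variable are
> placeholders; the real script works those out per node):
> 
>     #!/bin/sh
>     # ssh wrapper used as the rsh agent: inject the HOME override (and
>     # a cd) into the remote command before orted is launched
>     host=$1; shift
>     exec ssh "$host" "export HOME=${your_working_directory}; cd ${your_working_directory}; $*"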
> 
> Got this from connecting some dots from the man page:
> 
> Under Current Working Directory (emphasis added):
> 
> "If the -wdir option is not specified, Open MPI will send the
> directory name where mpirun was invoked to each of the remote nodes.
> The remote nodes will try to change to that directory. If they are
> unable (e.g., if the directory does not exist on that node), then
> **Open MPI will use the default directory determined by the
> starter**."
> 
> In this case the starter is ssh; under Locating Files:
> 
> "For example when using the rsh or ssh starters, **the initial
> directory is $HOME by default**."
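> 
> For illustration: if the directory did exist on every node, the -wdir
> option would pin it explicitly, e.g. (hosts are placeholders):
> 
>     mpirun -wdir /tmp/job42 -np 2 --host node0,node1 mpitest
> 
> Here the sandbox directories differ per node, though, so -wdir alone
> doesn't help -- hence leaning on the $HOME fallback instead.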
> 
> Hope this helps someone!
> 
> Jason Patton
> 
> On Wed, Nov 23, 2016 at 1:43 PM, Jason Patton <jpat...@cs.wisc.edu> wrote:
>> I would like to mpirun across nodes that do not share a filesystem and
>> might have the executable in different directories. For example, node0
>> has the executable at /tmp/job42/mpitest and node1 has it at
>> /tmp/job100/mpitest.
>> 
>> If you can grant me that I have an ssh wrapper script (that gets set as
>> the orte/plm_rsh_agent**) that cds to where the executable lies on
>> each worker node before launching orted, is there a way to tell the
>> worker node orted processes to run the executable from the current
>> working directory rather than from the absolute path that (I presume)
>> the head node process advertises? I've tried adding/changing
>> orte_remote_tmpdir_base for each worker orted process, but then I get
>> an error about having both global_tmpdir and remote_tmpdir set. Then
>> if I set local_tmpdir to match the head node, I'm back at square one.
>> 
>> I know this sounds fairly convoluted, but I'm updating helper scripts
>> for HTCondor so that its parallel universe can work with newer MPI
>> versions (dealing with similar headaches trying to get hydra to
>> cooperate). The default behavior is for condor to place each "job"
>> (i.e. sshd+orted process) in a sandbox, and we cannot know the name of
>> the sandbox directories ahead of time or assume that they will have
>> the same name across nodes. The easiest way to deal with this is if we
>> can assume the executable lies on a shared fs, but the fewer
>> assumptions from our POV the better. (Even better would be if someone
>> /really/ wants to build in condor support, as has been done for other
>> launchers; that's beyond me right now.)
>> 
>> **Also, what is the correct MCA parameter for setting the rsh agent? ompi_info
>> (and mpirun) says orte_rsh_agent is deprecated, but online docs seem
>> to suggest that plm_rsh_agent is deprecated. I'm using version 1.8.1.
>> 
>> Thanks for any insight you can provide
>> 
>> Jason Patton


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
