I'm not sure I understand your solution -- it sounds like you are overriding $HOME for each process...? If so, that's playing with fire.

Is there a reason you can't set PATH / LD_LIBRARY_PATH in your ssh wrapper script to point to the Open MPI installation that you want to use on each node?  (A rough, untested sketch of what I mean is at the bottom of this mail.)

To answer your question: yes, the "rsh agent" MCA param has changed over time.  It's been plm_rsh_agent for a while, though.  I don't remember exactly when it changed, but it's been that way since at least v1.8.0.


> On Nov 23, 2016, at 5:04 PM, Jason Patton <jpat...@cs.wisc.edu> wrote:
> 
> I think I may have solved this, in case anyone is curious or wants to
> yell about how terrible it is :). In the ssh wrapper script, when
> ssh-ing, before launching orted:
> 
> export HOME=${your_working_directory} \;
> 
> (If $HOME means something for your jobs, then maybe this isn't a good
> solution.)
> 
> Got this from connecting some dots from the man page:
> 
> Under Current Working Directory (emphasis added):
> 
> "If the -wdir option is not specified, Open MPI will send the
> directory name where mpirun was invoked to each of the remote nodes.
> The remote nodes will try to change to that directory. If they are
> unable (e.g., if the directory does not exist on that node), then
> **Open MPI will use the default directory determined by the
> starter**."
> 
> In this case the starter is ssh; under Locating Files:
> 
> "For example when using the rsh or ssh starters, **the initial
> directory is $HOME by default**."
> 
> Hope this helps someone!
> 
> Jason Patton
> 
> On Wed, Nov 23, 2016 at 1:43 PM, Jason Patton <jpat...@cs.wisc.edu> wrote:
>> I would like to mpirun across nodes that do not share a filesystem and
>> might have the executable in different directories. For example, node0
>> has the executable at /tmp/job42/mpitest and node1 has it at
>> /tmp/job100/mpitest.
>> 
>> If you can grant me that I have an ssh wrapper script (that gets set as
>> the orte/plm_rsh_agent**) that cds to where the executable lies on
>> each worker node before launching orted, is there a way to tell the
>> worker-node orted processes to run the executable from the current
>> working directory rather than from the absolute path that (I presume)
>> the head node process advertises? I've tried adding/changing
>> orte_remote_tmpdir_base for each worker orted process, but then I get
>> an error about having both global_tmpdir and remote_tmpdir set. Then
>> if I set local_tmpdir to match the head node, I'm back at square one.
>> 
>> I know this sounds fairly convoluted, but I'm updating helper scripts
>> for HTCondor so that its parallel universe can work with newer MPI
>> versions (dealing with similar headaches trying to get hydra to
>> cooperate). The default behavior is for condor to place each "job"
>> (i.e. sshd+orted process) in a sandbox, and we cannot know the names of
>> the sandbox directories ahead of time or assume that they will have
>> the same name across nodes. The easiest way to deal with this is if we
>> can assume the executable lies on a shared fs, but the fewer
>> assumptions from our POV the better. (Even better would be if someone
>> /really/ wants to build in condor support like has been done for other
>> launchers; that's beyond me right now.)
>> 
>> **Also, what is the correct parameter for setting the rsh agent? ompi_info
>> (and mpirun) says orte_rsh_agent is deprecated, but online docs seem
>> to suggest that plm_rsh_agent is deprecated. I'm using version 1.8.1.
>> 
>> Thanks for any insight you can provide
>> 
>> Jason Patton

-- 
Jeff Squyres
jsquy...@cisco.com
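
P.S. Here is the kind of wrapper I had in mind -- an untested sketch, not
something I've run.  The wrapper path and the /opt/openmpi prefix are
placeholders; point them at whatever each node actually has.  It assumes
mpirun invokes the agent as "<agent> <hostname> <orted command ...>";
check that against how your current wrapper sees its arguments.

    #!/bin/sh
    # Hypothetical plm_rsh_agent wrapper.  mpirun is expected to call it as:
    #   wrapper.sh <hostname> <orted command ...>
    prefix=/opt/openmpi   # placeholder: the install you want used on that node
    host=$1
    shift
    # Put that install first in the remote environment, then run the orted
    # command line that mpirun handed us.
    exec ssh "$host" "export PATH=$prefix/bin:\$PATH; \
      export LD_LIBRARY_PATH=$prefix/lib:\$LD_LIBRARY_PATH; $*"

and on the head node, something like:

    mpirun --mca plm_rsh_agent /path/to/wrapper.sh -np 2 --host node0,node1 ./mpitest

This leaves $HOME alone and only changes where the remote shell finds
orted and its libraries.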