That worked!

But still a mystery.

I tried printing the environment immediately before mpirun.  Inside the Python 
wrapper, I call os.system('env') immediately before the 
subprocess.Popen(['mpirun', ...], shell=False) command.  This prints 
SHELL=/bin/csh, and I can confirm that getpwuid, if it works, would also have 
returned /bin/csh, as that is my default shell.
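
A minimal sketch of that check, assuming a Python 3 wrapper on a POSIX host. 
The helper name report_shells is hypothetical, and the real mpirun argument 
list is elided here just as it is above.

```python
import os
import pwd

def report_shells():
    """Return the shell from the SHELL envar and from the passwd database."""
    env_shell = os.environ.get("SHELL")  # the value os.system('env') would show
    try:
        # Login shell recorded for the current user (what getpwuid reports)
        login_shell = pwd.getpwuid(os.getuid()).pw_shell
    except KeyError:
        login_shell = None  # no passwd entry for this uid
    return env_shell, login_shell

if __name__ == "__main__":
    env_shell, login_shell = report_shells()
    print("SHELL envar:   ", env_shell)
    print("getpwuid shell:", login_shell)
    # The wrapper would then launch mpirun without an intermediate shell, e.g.
    # subprocess.Popen(["mpirun", ...], shell=False)  # arguments elided
```

If both values agree, as they do here with /bin/csh, the local side gives no 
hint of a bash-style remote command.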

It is also interesting that it does not matter if the job-submission script is 
#!/bin/bash or #!/bin/tcsh (properly re-written, of course) -- I get the same 
errors either way. 

So why did the launcher use bash syntax on the remote host?  It does not seem 
to be behaving exactly as you described.

But telling it to check the remote shell did the trick.

Thanks


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Monday, April 07, 2014 4:12 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs 
with OpenMPI 1.6.5 rsh

I doubt that the rsh launcher is getting confused by the cmd you show below. 
However, if that command is embedded in a script that changes the shell away 
from your default shell, then yes - it might get confused. When the rsh 
launcher spawns your remote orted, it attempts to set some envars to ensure 
things are set up correctly (e.g., that the path is right). Thus, it needs to 
know what the remote shell is going to be.
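
To illustrate why the remote shell matters, here is a rough Python sketch of 
how an environment-setting prefix differs between shell families. This is not 
Open MPI's actual code: the function name env_prefix and the exact command 
layout are assumptions, chosen only to mirror the error output quoted further 
down in this thread.

```python
import os

def env_prefix(shell, prefix):
    """Build commands that set OPAL_PREFIX and PATH for the given shell."""
    name = os.path.basename(shell)
    bindir = prefix + "/bin"
    if name in ("csh", "tcsh"):
        # csh family: setenv, with no "export" and no VAR=value assignments
        return "setenv OPAL_PREFIX {0} ; setenv PATH {1}:$PATH".format(
            prefix, bindir)
    # sh family (sh, bash, ksh, ...): assign first, then export
    return ("OPAL_PREFIX={0} ; export OPAL_PREFIX ; "
            "PATH={1}:$PATH ; export PATH".format(prefix, bindir))

print(env_prefix("/bin/csh", "/apps/local/test/openmpi"))
print(env_prefix("/bin/bash", "/apps/local/test/openmpi"))
```

Handing the sh-style form to a csh login shell would be expected to produce 
"Command not found" complaints for both the OPAL_PREFIX=... assignment and 
export, which matches the pattern in the reported output.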

If given no other direction, it assumes that both the remote shell and your 
current shell are your default shell as reported by getpwuid (if available - 
otherwise, it falls back to the SHELL envar). If the remote shell can be 
something different, then you need to set the "plm_rsh_assume_same_shell=0" MCA 
param so it will check the remote shell.
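
As a sketch of how that parameter could be passed from a Python wrapper like 
the one described in this thread: the helper name build_mpirun_cmd is 
hypothetical, and the other arguments are copied from the mpirun command 
quoted below. Nothing is launched here; the command is only printed.

```python
import shlex

def build_mpirun_cmd(assume_same_shell=False):
    """Return the mpirun argument list, adding the MCA parameter when asked."""
    cmd = ["mpirun",
           "--machinefile", "mpihosts.914",
           "-np", "48",
           "-x", "LD_LIBRARY_PATH",
           "--mca", "orte_rsh_agent", "/usr/bin/rsh"]
    if not assume_same_shell:
        # Ask the rsh launcher to probe the remote shell instead of assuming it
        cmd += ["--mca", "plm_rsh_assume_same_shell", "0"]
    cmd += ["solver_openmpi", "-i", "flow.inp"]
    return cmd

# Print the command line for inspection; the list itself can be handed to
# subprocess.Popen(cmd, shell=False).
print(" ".join(shlex.quote(arg) for arg in build_mpirun_cmd()))
```

MCA parameters can also be supplied through the environment using the 
OMPI_MCA_ prefix, e.g. OMPI_MCA_plm_rsh_assume_same_shell=0, if changing the 
command line is inconvenient.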


On Apr 7, 2014, at 1:53 PM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:

> Thanks Noam, that makes sense.
> 
> Yes, I did mean to do ". hello" (with space in between).  That was an attempt 
> to replicate whatever OpenMPI is doing.  
> 
> In the first post I mentioned that my mpirun command actually gets executed 
> from within a Python script using the subprocess module.  I don't know the 
> details of the rsh launcher, but there are 3 remote hosts in the hosts file, 
> and 3 sets of the error messages below.  Maybe the rsh launcher is getting 
> confused, doing something that is only valid under bash even though my 
> default login environment is /bin/csh.  
> 
> mpirun --machinefile mpihosts.914 -np 48 -x LD_LIBRARY_PATH --mca 
> orte_rsh_agent /usr/bin/rsh  solver_openmpi  -i flow.inp >& output
> 
> % cat output
> 
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
>  Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
>  Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
>  Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Noam Bernstein
> Sent: Monday, April 07, 2014 3:41 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching 
> jobs with OpenMPI 1.6.5 rsh
> 
> 
> On Apr 7, 2014, at 4:36 PM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:
> 
>> I guess this is not OpenMPI related anymore.  I can repeat the essential 
>> problem interactively:
>> 
>> % echo $SHELL
>> /bin/csh
>> 
>> % echo $SHLVL
>> 1
>> 
>> % cat hello
>> echo Hello
>> 
>> % /bin/bash hello
>> Hello
>> 
>> % /bin/csh hello
>> Hello
>> 
>> %  . hello
>> /bin/.: Permission denied
> 
> . is a bash internal which evaluates the contents of the file in the current 
> shell.  Since you're running csh, it's just looking for an executable named 
> ., which does not exist (the csh analog of bash's . is source). /bin/. _is_ 
> in your path, but it's a directory (namely /bin itself), which cannot be 
> executed, hence the error. Perhaps you meant to do
>   ./hello
> which means (both in bash and csh) run the script hello in the current 
> working directory (.), rather than looking for it in the list of directories 
> in $PATH
> 
>                                                                               
>         Noam
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
