I guess this is not OpenMPI related anymore. I can repeat the essential problem interactively:
% echo $SHELL
/bin/csh
% echo $SHLVL
1
% cat hello
echo Hello
% /bin/bash hello
Hello
% /bin/csh hello
Hello
% . hello
/bin/.: Permission denied

I think I need to hope the administrator can fix it. Sorry for the bother...

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti
Sent: Monday, April 07, 2014 3:27 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

On 07.04.2014 at 22:04, Blosch, Edwin L wrote:

> I am submitting a job for execution under SGE. My default shell is /bin/csh.

Where do you get this: in SGE, or on the interactive command line?

> The script that is submitted has #!/bin/bash at the top. The script runs on
> the 1st node allocated to the job. The script runs a Python wrapper that
> ultimately issues the following mpirun command:
>
> /apps/local/test/openmpi/bin/mpirun --machinefile mpihosts.914 -np 48 -x
> LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 --mca btl ^tcp --mca
> shmem_mmap_relocate_backing_file -1 --bind-to-core --bycore --mca
> orte_rsh_agent /usr/bin/rsh --mca plm_rsh_disable_qrsh 1
> /apps/local/test/solver/bin/solver_openmpi -cycles 50 -ri restart.0 -i
> flow.inp >& output
>
> Just so there's no confusion, OpenMPI is built without support for SGE. It
> should be using rsh to launch.
>
> There are 4 nodes involved (each 12 cores, 48 processes total). In the
> output file, I see 3 sets of messages as shown below. I assume I am seeing 1
> set of messages for each of the 3 remote nodes where processes need to be
> launched:
>
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/falcon2014/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
> Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.

This really looks like csh is trying to interpret bash commands.
If SGE's queue is set up with "shell_start_mode posix_compliant", the first line of the script is not treated in a special way. In that case you can change the shell only with "-S /bin/bash". Alternatively, redefine the queue to have "shell_start_mode unix_behavior" set, which gives the expected behavior when starting a script. Side effect: the shell is no longer started as a login shell. See `man sge_conf`, "login_shells", for details.

BTW: is it intentional that you don't want a tight integration?

-- Reuti

> These look like errors you get when csh is trying to parse commands intended
> for bash.
>
> Does anyone know what may be going on here?
>
> Thanks,
>
> Ed
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
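
As a rough sketch of the "-S /bin/bash" suggestion in the reply above: the job script below embeds the option as an SGE directive and then reports which shell is actually interpreting it. The job name, the `report_shell` helper and the messages are hypothetical, chosen only for illustration; the real job script and queue configuration may differ.

```shell
#!/bin/bash
# Hypothetical SGE job script sketch; names and messages are illustrative only.
# Under "shell_start_mode posix_compliant" the #! line above is not honored,
# so the interpreter is requested explicitly with -S.
#$ -S /bin/bash
#$ -N shell_check
#$ -cwd

# Report which shell is interpreting this script.
# BASH_VERSION is set by bash itself; other Bourne-type shells leave it unset.
report_shell() {
    if [ -n "${BASH_VERSION:-}" ]; then
        echo "bash"
    else
        echo "other"
    fi
}

echo "Interpreting shell: $(report_shell)"
```

The same request can be made at submission time with `qsub -S /bin/bash job.sh`, where `job.sh` is a placeholder name. If csh ends up running this script anyway, it should stop with a syntax error at the function definition rather than print the message, which is itself a sign that the -S request did not take effect.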