Slurm is almost certainly calling execve() with the path to a copy of this 
script as an argument eventually, so yes, the tcsh shebang on the first line 
will be noticed by the Linux kernel, which will invoke tcsh to handle the 
contents. Slurm doesn't have to honor it since the kernel will. Slurm usually 
makes a pass through a copy of the script to replace any %j instances in the 
#SBATCH lines with the job ID, etc., before it runs it, but that's neither 
here nor there for you. The recommendations to pass -x and -v to tcsh are 
probably your best debugging options at this point. My vote is with the others 
who think that the environment inside the script is likely screwed up. 
Throwing in a printenv and saving its output can't hurt.
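
A minimal sketch of those two suggestions combined, assuming a tcsh batch script (the job name, output pattern, and filenames here are placeholders for illustration, not from the thread):

```shell
#!/bin/tcsh -xv
# Hypothetical sbatch script for debugging.
# -v echoes each input line as it is read; -x echoes each command after
# substitution, so the job's output file shows exactly what ran.
#SBATCH --job-name=debug-env
#SBATCH --output=debug-%j.out

# Save the job's environment for later comparison with an interactive shell.
printenv > env-$SLURM_JOB_ID.txt
```

Comparing the saved env-*.txt against `printenv` from a login shell should show whether the GUI-submitted job is missing PATH entries, MPI variables, or module settings.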

Bill. 

-- 
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445
 
 

On 3/22/19, 11:41 AM, "slurm-users on behalf of Reuti" 
<slurm-users-boun...@lists.schedmd.com on behalf of re...@staff.uni-marburg.de> 
wrote:

    
    > Am 22.03.2019 um 16:20 schrieb Prentice Bisbal <pbis...@pppl.gov>:
    > 
    > On 3/21/19 6:56 PM, Reuti wrote:
    >> Am 21.03.2019 um 23:43 schrieb Prentice Bisbal:
    >> 
    >>> Slurm-users,
    >>> 
    >>> My users here have developed a GUI application which serves as a GUI 
interface to various physics codes they use. From this GUI, they can submit 
jobs to Slurm. On Tuesday, we upgraded Slurm from 18.08.5-2 to 18.08.6-2, and a 
user has reported a problem when submitting Slurm jobs through this GUI app 
that do not occur when the same sbatch script is submitted from sbatch on the 
command-line.
    >>> 
    >>> […]
    >>> When I replaced the mpirun command with an equivalent srun command, 
everything works as desired, so the user can get back to work and be productive.
    >>> 
    >>> While srun is a suitable workaround, and is arguably the correct way to 
run an MPI job, I'd like to understand what is going on here. Any idea what is 
going wrong, or additional steps I can take to get more debug information?
    >> Was an alias for `mpirun` introduced? It may shadow the real 
application: even though `which mpirun` returns the correct path, the real 
binary would never be executed.
    >> 
    >> $ type mpirun
    >> $ alias mpirun
    >> 
    >> may tell in the jobscript.
    >> 
    > Unfortunately, the script is in tcsh,
    
    Oh, I didn't notice this – correct.
    
    
    > so the 'type' command doesn't work since,
    
    Is it really running in `tcsh`? The commands look generic and would be 
available in various shells. Does SLURM honor the first line of the script 
and/or use a default? In Bash, a function could shadow `mpirun` too.
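
    An illustration (in bash, not from the thread) of how a function can 
shadow `mpirun` while `which` still reports the real binary:

```shell
# Define a function that shadows any mpirun on the PATH.
mpirun() { echo "shadowed: function ran instead of the binary"; }

type mpirun    # bash reports: "mpirun is a function"
which mpirun   # still prints the path of the real binary, if one is installed
mpirun         # invokes the function, not the binary
```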
    
    (I'm more used to GridEngine, where this can be configured in both ways how 
to start the scripts.)
    
    In tcsh, I could see a defined "jobcmd" alias having some effect here.
    
    -- Reuti
    
    
    >  it's a bash built-in function. I did use the 'alias' command to see all 
the defined aliases, and mpirun and mpiexec are not aliased. Any other ideas?
    > 
    > Prentice