The srun man page says:

 

When initiating remote processes srun will propagate the current working 
directory, unless --chdir=<path> is specified, in which case path will become 
the working directory for the remote processes.

 

William

 

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Dean 
Schulze
Sent: 21 January 2020 19:27
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] sbatch sending the working directory from the controller 
to the node

 

I run this sbatch script from the controller:

=======================
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --mail-type=NONE    # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --time=00:05:00     # Time limit hrs:min:sec
#SBATCH --output=test_job_%j.log   # Standard output and error log

pwd; hostname; date
=======================


The node gets the directory that sbatch was executed from on the controller and 
tries to write the output file to that directory, which doesn't exist on the 
node. The node's slurmd.log shows this error:

[2020-01-21T11:25:36.389] [7.batch] error: Could not open stdout file 
/home/dean/src/slurm.example.scripts/serial_test_7.log: No such file or 
directory


If I change the sbatch script's --output to a fully qualified path in a 
directory that exists on the node

    --output=/home/nodeuser/serial_test_%j.log

the output file is written to that directory, but it contains this error, which 
shows that the Slurm node is trying to execute the job in the directory that 
sbatch was run from on the controller:

slurmstepd: error: couldn't chdir to `/home/dean/src/slurm.example.scripts': No 
such file or directory: going to /tmp instead
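
To confirm which working directory a remote step actually inherits, a quick 
sketch (using /tmp below only because it is a directory that exists on the node):

=======================
# Run from the submit host: prints the working directory the remote task inherits
srun --ntasks=1 pwd

# The same, but with an explicit working directory on the node
srun --ntasks=1 --chdir=/tmp pwd
=======================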


The sbatch docs say nothing about why the node gets the working directory from 
the controller. Why would Slurm send the node a directory that may not exist 
there and expect it to use it?

 

What's the right way to specify the --output directory in an sbatch script?

 

Thanks.



