So there is a --chdir option for sbatch as well. It implies that the same path has to exist on every node, which is something to keep in mind when setting up a Slurm cluster.
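A quick way to see the behavior is to pin the working directory explicitly. The script below is only a minimal sketch: /tmp is used because it exists on every node, and chdir_test is a made-up job name:

=======================
#!/bin/bash
#SBATCH --job-name=chdir_test
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --chdir=/tmp                     # working directory on the node; must exist there
#SBATCH --output=/tmp/chdir_test_%j.log  # absolute path, also resolved on the node

pwd; hostname; date                      # pwd should print /tmp
=======================

The same option can also be given on the command line (sbatch --chdir=/tmp job.sh), which overrides any #SBATCH line in the script.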
On Tue, Jan 21, 2020 at 12:58 PM William Brown <will...@signalbox.org.uk> wrote:

> The srun man page says:
>
>     When initiating remote processes srun will propagate the current
>     working directory, unless --chdir=<path> is specified, in which case
>     path will become the working directory for the remote processes.
>
> William
>
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Dean Schulze
> Sent: 21 January 2020 19:27
> To: Slurm User Community List <slurm-users@lists.schedmd.com>
> Subject: [slurm-users] sbatch sending the working directory from the controller to the node
>
> I run this sbatch script from the controller:
>
> =======================
> #!/bin/bash
> #SBATCH --job-name=test_job
> #SBATCH --mail-type=NONE          # Mail events (NONE, BEGIN, END, FAIL, ALL)
> #SBATCH --ntasks=1
> #SBATCH --mem=1gb
> #SBATCH --time=00:05:00           # Time limit hrs:min:sec
> #SBATCH --output=test_job_%j.log  # Standard output and error log
>
> pwd; hostname; date
> =======================
>
> The node gets the directory that sbatch was executed from on the
> controller and tries to write the output file to that directory, which
> doesn't exist on the node. The node's slurmd.log shows this error:
>
> [2020-01-21T11:25:36.389] [7.batch] error: Could not open stdout file
> /home/dean/src/slurm.example.scripts/serial_test_7.log: No such file or
> directory
>
> If I change the script's --output to a fully qualified directory that
> exists on the node
>
> --output=/home/nodeuser/serial_test_%j.log
>
> the output file is written to that directory, but it includes this
> error, showing that the node is trying to execute the job in the
> directory that sbatch was run from on the controller:
>
> slurmstepd: error: couldn't chdir to
> `/home/dean/src/slurm.example.scripts': No such file or directory:
> going to /tmp instead
>
> The sbatch docs say nothing about why the node gets the pwd from the
> controller. Why would Slurm send a node a directory that may not exist
> there and expect it to use it?
>
> What's the right way to specify the --output directory in an sbatch
> script?
>
> Thanks.
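For what it's worth, the usual fix for the script quoted above is to make both the working directory and the output path absolute paths that exist on the compute nodes. A sketch, with /home/nodeuser standing in for whatever directory you actually have on the nodes:

=======================
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --mail-type=NONE         # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --time=00:05:00          # Time limit hrs:min:sec
#SBATCH --chdir=/home/nodeuser   # must exist on the node; avoids the "going to /tmp instead" fallback
#SBATCH --output=/home/nodeuser/test_job_%j.log  # absolute, so it no longer depends on the submit directory

pwd; hostname; date
=======================

On clusters where home directories are shared between the controller and the compute nodes (the common setup), the default of propagating the submission directory just works; the errors above appear only because that path is missing on the node.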