Hi Maria,

Have you tried adding the -u flag (which requests unbuffered output) to your srun command?
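For reference, a sketch of what your withsteps.sh would look like with the flag added (untested here, since it needs a Slurm cluster to run):

```
#!/bin/bash
# withsteps.sh, with srun asked not to buffer the task's stdout/stderr,
# so each echo from loop.sh should reach the job's output file as it happens
srun -u ./loop.sh
```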
https://slurm.schedmd.com/srun.html#OPT_unbuffered

Your description sounds like buffering, so this might help.

Thanks,
-Sean

On Tue, Feb 9, 2021 at 6:49 PM Maria Semple <ma...@rstudio.com> wrote:

> Hello all,
>
> I've noticed an odd behaviour with job steps in some Slurm environments.
> When a script is launched directly as a job, the output is written to file
> immediately. When the script is launched as a step in a job, output is
> written in ~30 second chunks. This doesn't happen in all Slurm
> environments, but if it happens in one, it seems to always happen. For
> example, on my local development cluster, which is a single node on Ubuntu
> 18, I don't experience this. On a large CentOS 7 based cluster, I do.
>
> Below is a simple reproducible example:
>
> loop.sh:
> #!/bin/bash
> for i in {1..100}
> do
>   echo $i
>   sleep 1
> done
>
> withsteps.sh:
> #!/bin/bash
> srun ./loop.sh
>
> From the command line, running sbatch loop.sh followed by tail -f
> slurm-<job #>.out prints the job output in smaller chunks, which appears
> to be related to file system buffering or the time it takes for the tail
> process to notice that the file has updated. Running cat on the file
> every second shows that the output is in the file immediately after it is
> emitted by the script.
>
> If you run sbatch withsteps.sh instead, tail-ing or repeatedly cat-ing
> the output file will show that the job output is written in chunks of
> 30-35 lines.
>
> I'm hoping this is something that can be worked around, perhaps via an
> OS setting, the way Slurm was compiled, or a Slurm setting.
>
> --
> Thanks,
> Maria