Hello all,

I've noticed an odd behaviour with job steps in some Slurm environments. When a script is launched directly as a job, its output is written to the output file immediately. When the same script is launched as a step within a job, output is written in roughly 30-second chunks. This doesn't happen in all Slurm environments, but in the environments where it does happen, it happens consistently. For example, on my local development cluster, a single node running Ubuntu 18, I don't see it; on a large CentOS 7 based cluster, I do.
Below is a simple reproducible example.

loop.sh:

    #!/bin/bash
    for i in {1..100}
    do
        echo $i
        sleep 1
    done

withsteps.sh:

    #!/bin/bash
    srun ./loop.sh

From the command line, running sbatch loop.sh followed by tail -f slurm-<job #>.out prints the job output in small chunks, which appears to come down to file system buffering, or simply the time it takes the tail process to notice that the file has been updated. Running cat on the file every second shows that each line is in the file immediately after the script emits it.

If you run sbatch withsteps.sh instead, tail-ing or repeatedly cat-ing the output file shows that the job output arrives in chunks of 30-35 lines.

I'm hoping this is something that is possible to work around, whether via an OS setting, the way Slurm was compiled, or a Slurm setting.
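In case it helps narrow things down, below is the variant I plan to test next. It is only a guess at a workaround: it assumes srun's -u/--unbuffered option, which is documented to disable line buffering of a step's stdout, is what matters here, and I haven't yet verified that on the affected cluster.

withsteps-unbuffered.sh (untested variant):

    #!/bin/bash
    # -u / --unbuffered asks srun not to line-buffer the step's stdout.
    # Guess, not a confirmed fix: if the ~30-second chunking disappears
    # with this flag, the buffering is happening in the step I/O path
    # rather than in the file system.
    srun --unbuffered ./loop.sh

If --unbuffered does change the behaviour, that would at least rule out the file system and point at step I/O handling instead.

--
Thanks,
Maria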