Hi Maria,

Have you tried adding the -u flag (which requests unbuffered output) to your srun command?
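For reference, a sketch of what your withsteps.sh would look like with the flag added (untested here, since it needs a Slurm cluster to run):

```
#!/bin/bash
# withsteps.sh, with srun asked not to buffer the task's stdout/stderr,
# so each echo from loop.sh should reach the job's output file as it happens
srun -u ./loop.sh
```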
https://slurm.schedmd.com/srun.html#OPT_unbuffered

Your description sounds like buffering, so this might help.

Thanks,
-Sean

On Tue, Feb 9, 2021 at 6:49 PM Maria Semple <ma...@rstudio.com> wrote:

> Hello all,
>
> I've noticed an odd behaviour with job steps in some Slurm environments.
> When a script is launched directly as a job, the output is written to file
> immediately. When the script is launched as a step in a job, output is
> written in ~30 second chunks. This doesn't happen in all Slurm
> environments, but if it happens in one, it seems to always happen. For
> example, on my local development cluster, which is a single node on Ubuntu
> 18, I don't experience this. On a large CentOS 7 based cluster, I do.
>
> Below is a simple reproducible example:
>
> loop.sh:
> #!/bin/bash
> for i in {1..100}
> do
>   echo $i
>   sleep 1
> done
>
> withsteps.sh:
> #!/bin/bash
> srun ./loop.sh
>
> From the command line, running sbatch loop.sh followed by tail -f
> slurm-<job #>.out prints the job output in smaller chunks, which appears
> to be related to file system buffering or the time it takes for the tail
> process to notice that the file has updated. Running cat on the file
> every second shows that the output is in the file immediately after it is
> emitted by the script.
>
> If you run sbatch withsteps.sh instead, tail-ing or repeatedly cat-ing
> the output file will show that the job output is written in chunks of
> 30-35 lines.
>
> I'm hoping this is something that can be worked around, perhaps via an
> OS setting, the way Slurm was compiled, or a Slurm setting.
>
> --
> Thanks,
> Maria