On 06/15/2022 05:25 PM, Ward Poelmans wrote:
> Hi Guillaume,
>
> On 15/06/2022 16:59, Guillaume De Nayer wrote:
>>
>> Perhaps I misunderstand the Slurm documentation...
>>
>> I thought that the --exclusive option used in combination with sbatch
>> will reserve the whole node (40 cores) for the job (submitted with
>> sbatch). This part is working fine. I can check it with sacct.
>>
>> Then, this job starts subtasks on the reserved 40 cores with srun.
>> Therefore I'm using "-n1 -c1" in combination with "srun". I thought that
>> it was possible to use the reserved cores inside this job using srun.
>
> You're correct. --exclusive will give you all cores on the nodes but
> only as much memory as requested.
>
>
>> The following slightly modified job without --exclusive and with
>> --ntasks=2 leads to a similar problem: only one srun is running at a
>> time. The second starts directly after the first one finishes.
>>
>> #!/bin/bash
>> #SBATCH --job-name=test_multi_prog_srun
>> #SBATCH --ntasks=2
>> #SBATCH --partition=short
>> #SBATCH --time=02:00:00
>>
>> srun -vvv --exact -n1 -c1 sleep 20 > srun1.log 2>&1 &
>> srun -vvv --exact -n1 -c1 sleep 30 > srun2.log 2>&1 &
>> wait
>
> This should work... It works on our cluster. Are you sure they don't run
> in parallel?
Yes, I'm pretty sure that it does not run in parallel: sacct shows me only
one subtask "RUNNING". Then, when this subtask is marked as "COMPLETED",
the second one appears and is marked "RUNNING". Moreover, if I connect
directly to the node, only one "sleep" process is running.

OK. If it works on your cluster, I perhaps have a problem in my Slurm
config. Which version of Slurm are you running on your cluster? And can
you share your slurm.conf?

> We usually recommend to use gnu parallel or xargs like:
>
> xargs -P $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact sleep 30

OK. I will install GNU parallel and also test your xargs command.

Thanks a lot!
Guillaume
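
PS: For the archive, this is the kind of wrapper I plan to test. It is
only a sketch, untested on our cluster; the commands file "cmds.txt" is
just an illustrative name, and it assumes GNU xargs (for -a/-d/-P) or
GNU parallel is available on the node:

#!/bin/bash
#SBATCH --job-name=test_parallel_srun
#SBATCH --ntasks=2
#SBATCH --partition=short
#SBATCH --time=02:00:00

# One shell command per line; the file name is only an example.
printf 'sleep 20\nsleep 30\n' > cmds.txt

# xargs variant: run at most $SLURM_NTASKS srun steps concurrently,
# handing each line to its own one-task, one-core step.
xargs -a cmds.txt -d '\n' -I{} -P "$SLURM_NTASKS" \
    srun -N1 -n1 -c1 --exact bash -c {}

# GNU parallel variant of the same idea:
# parallel -j "$SLURM_NTASKS" srun -N1 -n1 -c1 --exact bash -c {} :::: cmds.txt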