Is there a reason to run them as a single job?
It may be easier to just have 2 separate jobs of 16 cores each.
If one program has to wait for the other, you can express that by
adding a dependency to the job submission.
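As a rough sketch of that approach (not tested on your cluster), the two submissions could look something like the following. The helper names submit16 and submit_pair are made up for illustration, program1 and program2 are the placeholders from the original message, and afterok is only one of the dependency types sbatch accepts, so adjust to taste.

```shell
# submit16 and submit_pair are hypothetical helper names, not part of Slurm.

# Submit one command as its own job: 1 node, 1 task, 16 CPUs for that task.
# Usage: submit16 COMMAND [extra sbatch options...]
submit16() {
    cmd=$1
    shift
    # --parsable makes sbatch print only the job ID (followed by ";cluster"
    # on multi-cluster setups) so the caller can capture it.
    sbatch --parsable --nodes=1 --ntasks=1 --cpus-per-task=16 "$@" --wrap="$cmd"
}

# Submit program1, then program2 so that it starts only after program1
# has finished successfully (assumption: that is the dependency wanted).
submit_pair() {
    jobid1=$(submit16 program1) || return 1
    # Strip a possible ";cluster" suffix before using the ID.
    submit16 program2 --dependency="afterok:${jobid1%%;*}"
}
```

Calling submit_pair on the login node would submit both jobs. If the two programs are independent, calling submit16 program1 and submit16 program2 without the --dependency option should be enough.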
Brian Andrus
On 7/25/2020 2:50 AM, Даниил Вахрамеев wrote:
Hi everyone!
I have a SLURM cluster with several nodes, each with 16 vCPUs. I
tried to run the following batch script:
    #SBATCH --nodes 2
    #SBATCH --ntasks 2
    #SBATCH -c 16
    srun --exclusive --nodes=1 program1 &
    srun --exclusive --nodes=1 program2 &
    wait
|program1| and |program2| each need 16 CPUs. I expected that 2 nodes
with 32 cores in total would be allocated, and that |program1| would
run on the first node and |program2| on the second, but I got the
following error message:
    srun: error: Unable to create step for job 364966: Requested node configuration is not available
If I use only the |--nodes| and |--ntasks| options, sbatch allocates
2 nodes with 2 CPUs, and if I use the |--nodes| and |-c| options, I
get a message that |--ntasks| should be defined.
If I set |--ntasks=1|, SLURM sets the number of nodes to 1.
How can I run these two programs in one batch job, each on its own
node with 16 vCPUs?
------
Kind regards,
Daniil Vakhrameev