Dear all,

 We are using Slurm 18.08.6. We have 12 nodes with 4 GPUs each and 21 CPU-only nodes, organized into 3 partitions:
  gpu: GPU nodes only,
  cpu: CPU nodes only,
  longjobs: all nodes.
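
Simplified, the partition layout in slurm.conf looks roughly like this (node names and PriorityTier values are illustrative, not our exact config):

  NodeName=gpu[01-12] Gres=gpu:4 ...
  NodeName=cpu[01-21] ...
  PartitionName=gpu      Nodes=gpu[01-12]            PriorityTier=10
  PartitionName=cpu      Nodes=cpu[01-21]            PriorityTier=10
  PartitionName=longjobs Nodes=gpu[01-12],cpu[01-21] PriorityTier=1 PreemptMode=SUSPEND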

Jobs in longjobs have the lowest priority and can be preempted by suspension. Our goal is to also allow GPU nodes to be used for backfilling CPU jobs. The problem is with CPU jobs that require a lot of memory: such jobs can block GPU jobs in the queue, because suspended jobs do not release their memory, so GPU jobs will not start even when free GPUs are available.
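
Preemption is configured along these lines (a sketch, exact values may differ):

  PreemptType=preempt/partition_prio
  PreemptMode=SUSPEND,GANG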

My question is: is there any partition or node option that allows limiting TRES memory, but only on specific nodes? The goal is that jobs in the longjobs partition with high memory requirements would start only on CPU nodes, while GPU nodes would run only GPU jobs (without a memory limit) and CPU jobs below the memory limit.
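
As far as I know, MaxMemPerNode can be set on a partition, e.g. (illustrative value):

  PartitionName=longjobs Nodes=gpu[01-12],cpu[01-21] MaxMemPerNode=64000

but that limit would also apply to longjobs jobs running on the CPU nodes, which is not what we want; we only want to cap the memory of longjobs jobs on the GPU nodes.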

Or, put differently: is there any way to reserve some memory on the GPU nodes for jobs in the gpu partition only, so that it cannot be used by jobs in the longjobs partition?

Thanks in advance,
Daniel Vecerka, CTU Prague


