Hi,
I'd like to allow job suspension in my cluster without the "penalty" of RAM utilization. The jobs are sometimes very big and can require ~100GB of memory per node. Suspending such a job usually means almost nothing else can run on the same node, except very small memory jobs. Currently the solution is requeue preemption, with or without checkpointing. I never want running jobs to use swap - I'd rather get OOM-killed than have a job use swap while it is running.
Is there a way to tell Slurm to allocate swap and use it only for suspending, to allow preemption without terminating the jobs?
The nodes have on the order of a TB of local disk each, and most jobs never use any of it (relying on shared storage instead), so local disk space is usually not a concern.
Using swap to store suspended jobs, while slow to freeze and thaw, seems to me a better localized solution than checkpoint-and-requeue: the job can resume "immediately" (minus disk I/O time) once the high-priority job finishes. But if I'm mistaken, please enlighten me.
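For reference, the suspend-style preemption I have in mind would be configured roughly as below. This is only a sketch - the partition names and priority values are made up, and I haven't tested this combination:

    # slurm.conf (sketch)
    PreemptType=preempt/partition_prio
    PreemptMode=SUSPEND,GANG
    PartitionName=low  Nodes=node[01-10] PriorityTier=1  Default=YES
    PartitionName=high Nodes=node[01-10] PriorityTier=10

    # cgroup.conf (sketch)
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes
    AllowedSwapSpace=0   # no extra swap allowance for jobs

As I understand it, this gives suspend/resume preemption via gang scheduling, but says nothing about moving a suspended job's pages out to swap - which is exactly the gap I'm asking about.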
I was wondering whether simply configuring a large swap device in Linux while setting AllowedSwapSpace=0 in cgroup.conf would work, but I suspect two problems:
1. Even while suspended, the job still remains under its cgroup limits.
2. Which process gets swapped out is non-deterministic from my point of view - I'm not sure the kernel will swap out the suspended job rather than the new job, at least in its early stages.

Thanks in advance,
--Dani_L.