That's pretty slick.  We just have test, gpu_test, and remotedesktop partitions set up for those purposes.

The real trick is making sure you have sufficient spare capacity that you can deliberately idle for these purposes.  If we were a smaller shop with less hardware, I wouldn't be able to set aside as much for this.  In that case I would likely go the route of a single server with OverSubscribe.

You could try to do it with an active partition and no deliberately idle resources, but then you will want to make sure that your small jobs are really small and won't impact larger work.  I don't necessarily recommend that.  A single node with OverSubscribe should be sufficient, and if you can't spare a single node, a VM would do the job.
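
For anyone wanting to try that single-node route, a minimal slurm.conf sketch could look like the following (node name, sizes, and the FORCE:4 share count are all hypothetical):

    # Hypothetical fragment: one node set aside for interactive shells.
    # FORCE:4 lets up to four jobs share each allocated core, so mostly
    # idle sessions pack tightly onto the node.
    NodeName=int001 CPUs=32 RealMemory=128000 State=UNKNOWN
    PartitionName=interactive Nodes=int001 DefaultTime=02:00:00 MaxTime=08:00:00 OverSubscribe=FORCE:4 State=UP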

-Paul Edmon-

On 6/11/2020 9:28 AM, Renfro, Michael wrote:
That’s close to what we’re doing, but without dedicated nodes. We have three 
back-end partitions (interactive, any-interactive, and gpu-interactive), but 
the users typically don’t have to consider that, due to our job_submit.lua 
plugin.

All three partitions have a default of 2 hours, 1 core, and 2 GB RAM; users
can request more cores and RAM, though not as much as a batch job (we used
https://hpcbios.readthedocs.io/en/latest/HPCBIOS_05-05.html as a starting
point).
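
In slurm.conf those defaults might be expressed along these lines (partition and node names are placeholders; DefMemPerCPU is in MB, and one core is already Slurm's built-in default):

    # Hypothetical fragment: 2-hour, 2 GB per core defaults.
    PartitionName=any-interactive Nodes=node[001-040] DefaultTime=02:00:00 DefMemPerCPU=2048 State=UP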

If a GPU is requested, the job goes into the gpu-interactive partition and is
limited to 16 cores per node (we have 28 cores per GPU node, but GPU jobs can’t
keep them all busy).

If fewer than 12 cores per node are requested, the job goes into the
any-interactive partition and can be handled by any of our GPU or non-GPU
nodes.

If more than 12 cores per node are requested, the job goes into the interactive
partition and is handled only by non-GPU nodes.
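
A stripped-down job_submit.lua in that spirit might look like the sketch below. To be clear, this is not Michael's actual plugin: the no-batch-script test for interactive jobs, the use of min_cpus for the per-node core count, and sending exactly 12 cores to the interactive partition are all assumptions, and field names vary between Slurm versions (tres_per_node replaced gres in 19.05):

    -- Hypothetical sketch of the routing rules described above.
    function slurm_job_submit(job_desc, part_list, submit_uid)
       -- Leave batch jobs and explicit partition requests alone.
       if job_desc.script ~= nil or job_desc.partition ~= nil then
          return slurm.SUCCESS
       end
       -- Unset numeric fields arrive as NO_VAL; assume one core then.
       local cpus = job_desc.min_cpus
       if cpus == nil or cpus == slurm.NO_VAL then cpus = 1 end
       -- tres_per_node carries "gres:gpu:N" requests in Slurm 19.05+.
       if job_desc.tres_per_node ~= nil and
          string.match(job_desc.tres_per_node, "gpu") then
          job_desc.partition = "gpu-interactive"
       elseif cpus < 12 then
          job_desc.partition = "any-interactive"
       else
          job_desc.partition = "interactive"
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end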

I haven’t needed to put a QOS on the interactive partitions, but that’s not a bad idea.

On Jun 11, 2020, at 8:19 AM, Paul Edmon <ped...@cfa.harvard.edu> wrote:

Generally the way we've solved this is to set aside a specific set of
nodes in a partition for interactive sessions.  We deliberately scale
the resources so that user jobs will always start immediately, and
we also set a QoS on the partition so that no one user can
dominate the partition.
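
For the record, such a QOS can be created with sacctmgr and attached to the partition in slurm.conf; the limits below are placeholders rather than our actual numbers:

    # Hypothetical per-user caps so no single user can hog the partition.
    sacctmgr add qos interactive
    sacctmgr modify qos interactive set MaxTRESPerUser=cpu=8,mem=32G MaxJobsPerUser=2
    # slurm.conf: PartitionName=interactive ... QOS=interactive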

-Paul Edmon-

On 6/11/2020 8:49 AM, Loris Bennett wrote:
Hi Manuel,

"Holtgrewe, Manuel" <manuel.holtgr...@bihealth.de> writes:

Hi,

Is there a way to make interactive logins, where users will use almost no resources,
"always succeed"?

In most of these interactive sessions, users will have mostly idle shells running and do 
some batch job submissions. Is there a way to allocate "infinite virtual cpus" 
on each node that can only be allocated to
interactive jobs?

I have never done this, but setting "OverSubscribe" in the appropriate
place might be what you are looking for.

   https://slurm.schedmd.com/cons_res_share.html

Personally, however, I would be a bit wary of doing this.  What if
someone does start a multithreaded process on purpose or by accident?

Wouldn't just using cgroups on your login node achieve what you want?
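
On a systemd-based login node that could be done, for example, with a drop-in applied to every user slice (values hypothetical; MemoryMax needs cgroups v2, while older cgroups v1 systems would use MemoryLimit instead):

    # /etc/systemd/system/user-.slice.d/50-limits.conf
    [Slice]
    CPUQuota=200%
    MemoryMax=8G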

Cheers,

Loris

