That’s close to what we’re doing, but without dedicated nodes. We have three 
back-end partitions (interactive, any-interactive, and gpu-interactive), but 
the users typically don’t have to consider that, due to our job_submit.lua 
plugin.

All three partitions default to 2 hours, 1 core, and 2 GB RAM, but users can 
request more cores and RAM (though not as much as a batch job — we used 
https://hpcbios.readthedocs.io/en/latest/HPCBIOS_05-05.html as a starting 
point).
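In slurm.conf, defaults like those could be expressed roughly as follows — the node lists and the `any-interactive` node overlap here are illustrative placeholders, not our actual config:

```
# Illustrative partition definitions -- node names are placeholders.
# DefaultTime/DefMemPerCPU give the 2-hour / 2 GB-per-core defaults;
# 1 core is already Slurm's default allocation.
PartitionName=interactive     Nodes=cpu[01-10]            DefaultTime=02:00:00 DefMemPerCPU=2048
PartitionName=any-interactive Nodes=cpu[01-10],gpu[01-04] DefaultTime=02:00:00 DefMemPerCPU=2048
PartitionName=gpu-interactive Nodes=gpu[01-04]            DefaultTime=02:00:00 DefMemPerCPU=2048 MaxCPUsPerNode=16
```

`MaxCPUsPerNode=16` on the GPU partition enforces the 16-core cap at the partition level, independent of anything the submit plugin does.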

If a GPU is requested, the job goes into the gpu-interactive partition and is 
limited to 16 cores per node (our GPU nodes have 28 cores each, but GPU jobs 
can’t keep them all busy).

If fewer than 12 cores per node are requested, the job goes into the 
any-interactive partition and can be handled by any of our GPU or non-GPU 
nodes.

If more than 12 cores per node are requested, the job goes into the interactive 
partition and is handled only by a non-GPU node.
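The routing described above lives in our job_submit.lua; a stripped-down sketch follows. The partition names match ours, but the rest is illustrative — in particular, using a nil `job_desc.script` to spot interactive (salloc/srun) jobs, `min_cpus` as a stand-in for cores-per-node, and the GPU-detection fields are simplifications of what the real plugin checks:

```lua
-- Simplified sketch of our job_submit.lua routing.  Field names like
-- job_desc.gres / job_desc.tres_per_node vary across Slurm versions.
function slurm_job_submit(job_desc, part_list, submit_uid)
   -- Only reroute interactive jobs; batch jobs have a script attached.
   if job_desc.script ~= nil then
      return slurm.SUCCESS
   end

   local cores = job_desc.min_cpus or 1
   local gres  = (job_desc.gres or "") .. (job_desc.tres_per_node or "")

   if gres:match("gpu") then
      -- GPU requested: route to the GPU partition; the 16-core cap is
      -- also enforced by the partition itself.
      job_desc.partition = "gpu-interactive"
   elseif cores < 12 then
      -- Small enough to run anywhere, GPU node or not.
      job_desc.partition = "any-interactive"
   else
      -- Larger CPU-only requests stay off the GPU nodes.
      job_desc.partition = "interactive"
   end

   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```

Since this runs inside slurmctld, it can only be exercised against a live Slurm installation, not in isolation.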

I haven’t needed to apply a QOS to the interactive partitions, but that’s not a 
bad idea.
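If we did add one, it would look something like this — the QOS name and the per-user limits are made-up values for illustration:

```
# Create a QOS capping any one user's footprint in the partition
# (limits here are placeholders):
#   sacctmgr add qos interactive set MaxTRESPerUser=cpu=8,mem=16G
#
# ...then attach it to the partition in slurm.conf:
PartitionName=interactive QOS=interactive ...
```

That mirrors what Paul describes below: no single user can dominate the interactive pool.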

> On Jun 11, 2020, at 8:19 AM, Paul Edmon <ped...@cfa.harvard.edu> wrote:
> 
> Generally the way we've solved this is to set aside a specific set of
> nodes in a partition for interactive sessions.  We deliberately scale
> the size of the resources so that users will always run immediately and
> we also set a QoS on the partition to make it so that no one user can
> dominate the partition.
> 
> -Paul Edmon-
> 
> On 6/11/2020 8:49 AM, Loris Bennett wrote:
>> Hi Manuel,
>> 
>> "Holtgrewe, Manuel" <manuel.holtgr...@bihealth.de> writes:
>> 
>>> Hi,
>>> 
>>> is there a way to make interactive logins where users will use almost no 
>>> resources "always succeed"?
>>> 
>>> In most of these interactive sessions, users will have mostly idle shells 
>>> running and do some batch job submissions. Is there a way to allocate 
>>> "infinite virtual cpus" on each node that can only be allocated to
>>> interactive jobs?
>> I have never done this but setting "OverSubscribe" in the appropriate
>> place might be what you are looking for.
>> 
>>   https://slurm.schedmd.com/cons_res_share.html
>> 
>> Personally, however, I would be a bit wary of doing this.  What if
>> someone does start a multithreaded process on purpose or by accident?
>> 
>> Wouldn't just using cgroups on your login node achieve what you want?
>> 
>> Cheers,
>> 
>> Loris
>> 
> 
