That's pretty slick. We just have test, gpu_test, and remotedesktop
partitions set up for those purposes.
The real trick is making sure you have sufficient spare capacity
that you can deliberately idle for these purposes. If we were a smaller
shop with less hardware, I wouldn't be able to set aside as much for
this. In that case I would likely go the route of a single server with
OverSubscribe.
You could try to do it with an active partition with no deliberately
idle resources, but then you will want to make sure that your small jobs
really are small and won't impact larger work. I don't necessarily
recommend that. A single node with OverSubscribe should be sufficient.
If you can't spare a single node, then a VM would do the job.
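[Editor's note: for reference, a minimal sketch of what a single oversubscribed interactive node might look like in slurm.conf. The node name, CPU/memory counts, and time limits are illustrative assumptions, not from this thread.]

```
# Hypothetical node set aside for interactive logins.
NodeName=login-int1 CPUs=32 RealMemory=128000 State=UNKNOWN

# OverSubscribe=FORCE:4 lets up to 4 jobs share each CPU, so mostly-idle
# interactive shells can "always succeed" without real contention.
PartitionName=interactive Nodes=login-int1 OverSubscribe=FORCE:4 \
    DefaultTime=02:00:00 MaxTime=08:00:00 State=UP
```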
-Paul Edmon-
On 6/11/2020 9:28 AM, Renfro, Michael wrote:
That’s close to what we’re doing, but without dedicated nodes. We have three
back-end partitions (interactive, any-interactive, and gpu-interactive), but
the users typically don’t have to consider that, due to our job_submit.lua
plugin.
All three partitions have a default of 2 hours, 1 core, 2 GB RAM, but users
could request more cores and RAM (but not as much as a batch job — we used
https://hpcbios.readthedocs.io/en/latest/HPCBIOS_05-05.html as a starting
point).
If a GPU is requested, the job goes into the gpu-interactive partition and is
limited to 16 cores per node (we have 28 cores per GPU node, but GPU jobs can't
keep them all busy).
If fewer than 12 cores per node are requested, the job goes into the
any-interactive partition and can be handled by any of our GPU or non-GPU
nodes.
If more than 12 cores per node are requested, the job goes into the
interactive partition and is handled only by non-GPU nodes.
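[Editor's note: a hedged sketch of how routing rules like these might look in a job_submit.lua. The function names follow the Slurm job_submit/lua plugin API; the GPU-detection field and the core threshold are assumptions, and the real plugin presumably also checks whether a job is interactive and validates limits.]

```lua
-- Sketch only, not the poster's actual plugin.
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Assumption: a GPU request shows up in the gres string here.
    if job_desc.gres ~= nil and string.find(job_desc.gres, "gpu") then
        job_desc.partition = "gpu-interactive"
    elseif (job_desc.min_cpus or 1) < 12 then
        job_desc.partition = "any-interactive"
    else
        job_desc.partition = "interactive"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```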
I haven’t needed to QOS the interactive partitions, but that’s not a bad idea.
On Jun 11, 2020, at 8:19 AM, Paul Edmon <ped...@cfa.harvard.edu> wrote:
Generally the way we've solved this is to set aside a specific set of
nodes in a partition for interactive sessions. We deliberately scale
the size of the resources so that users' jobs will always start
immediately, and we also set a QoS on the partition so that no single
user can dominate it.
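[Editor's note: a sketch of that kind of per-user cap via a QOS. The QOS name, limits, and node names are illustrative assumptions.]

```
# Create a QOS capping how much of the partition any one user can hold:
#   sacctmgr add qos interactive
#   sacctmgr modify qos interactive set MaxTRESPerUser=cpu=8,mem=16G

# Then attach it to the partition in slurm.conf:
PartitionName=interactive Nodes=int[01-04] QOS=interactive State=UP
```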
-Paul Edmon-
On 6/11/2020 8:49 AM, Loris Bennett wrote:
Hi Manuel,
"Holtgrewe, Manuel" <manuel.holtgr...@bihealth.de> writes:
Hi,
is there a way to make interactive logins where users will use almost no resources
"always succeed"?
In most of these interactive sessions, users will have mostly idle shells running and do
some batch job submissions. Is there a way to allocate "infinite virtual cpus"
on each node that can only be allocated to
interactive jobs?
I have never done this but setting "OverSubscribe" in the appropriate
place might be what you are looking for.
https://slurm.schedmd.com/cons_res_share.html
Personally, however, I would be a bit wary of doing this. What if
someone does start a multithreaded process on purpose or by accident?
Wouldn't just using cgroups on your login node achieve what you want?
Cheers,
Loris
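[Editor's note: on the cgroups-on-the-login-node suggestion — with a systemd-based login node, a per-user cap can be expressed as a drop-in for the user slice template. The file path mechanism is standard systemd; the limit values are illustrative assumptions.]

```
# /etc/systemd/system/user-.slice.d/10-limits.conf
# Applies to every user session's cgroup on the login node.
[Slice]
CPUQuota=200%
MemoryMax=8G
```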