Josef,
for us, we put a load balancer in front of the login nodes with session
affinity enabled, so each user lands on the same backend node every time.
Also, for interactive X sessions, users start a desktop session on the
node and then connect to it with VNC. This accommodates disconnects and
reconnects without losing the session.
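
The affinity part can be as simple as source-IP stickiness on the
balancer; a rough HAProxy-style sketch (node names and addresses are
made up):

    frontend login_in
        mode tcp
        bind *:22
        default_backend login_nodes

    backend login_nodes
        mode tcp
        balance source          # same client IP -> same login node
        server login01 192.0.2.11:22 check
        server login02 192.0.2.12:22 check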
Hello Slurm users,
I'm trying to write a check in our job_submit.lua script that enforces
relative resource limits, such as disallowing more than 4 CPUs or 48 GB
of memory per GPU. The QOS itself has a MaxTRESPerJob of
cpu=32,gres/gpu=8,mem=384G (roughly one full node), but we're looking to
prevent jobs from requesting CPUs or memory out of proportion to their
GPU count.
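
Roughly the kind of check I have in mind is sketched below; the limits,
the NO_VAL guard and the TRES string patterns are assumptions on my part
(field names and formats seem to vary between Slurm versions), so treat
it as a starting point rather than anything finished:

    -- hypothetical per-GPU limits; adjust to taste
    local MAX_CPUS_PER_GPU   = 4
    local MAX_MEM_GB_PER_GPU = 48

    -- unset numeric fields arrive as NO_VAL sentinels, not nil
    local function is_set(v)
        return v ~= nil and v < 0xFFFFFFF0
    end

    -- pull a GPU count out of a TRES string such as "gres:gpu:2",
    -- "gres/gpu:2" or "gres:gpu:a100:2" (format varies by version)
    local function gpu_count(tres)
        if tres == nil or tres == "" then
            return 0
        end
        local n = string.match(tres, "gpu:(%d+)")
               or string.match(tres, "gpu:[%w_]+:(%d+)")
        return tonumber(n) or 0
    end

    function slurm_job_submit(job_desc, part_list, submit_uid)
        -- only looks at --gres / --gpus-per-node style requests; a
        -- fuller version would also inspect tres_per_job / tres_per_task
        local gpus = gpu_count(job_desc.tres_per_node)
        if gpus == 0 then
            return slurm.SUCCESS    -- not a GPU job, nothing to enforce
        end

        if is_set(job_desc.min_cpus) and
           job_desc.min_cpus > gpus * MAX_CPUS_PER_GPU then
            slurm.log_user(string.format(
                "please request at most %d CPUs per GPU",
                MAX_CPUS_PER_GPU))
            return slurm.ERROR
        end

        -- pn_min_memory is MB per node; --mem-per-cpu jobs carry a flag
        -- bit that makes the value huge, so is_set() skips them here
        if is_set(job_desc.pn_min_memory) and
           job_desc.pn_min_memory > gpus * MAX_MEM_GB_PER_GPU * 1024 then
            slurm.log_user(string.format(
                "please request at most %dG of memory per GPU",
                MAX_MEM_GB_PER_GPU))
            return slurm.ERROR
        end

        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end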
On 26/2/24 12:27 am, Josef Dvoracek via slurm-users wrote:
What is the recommended way to run longer interactive jobs on your systems?
We provide NX for our users and also access via JupyterHub.
We also have high-priority QOSes intended for interactive use, so jobs
get a rapid response, but they are capped.
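
For illustration, caps of that sort sit on the QOS itself, along these
lines (the numbers here are only an example):

    sacctmgr add qos interactive Priority=1000 MaxWall=08:00:00 \
        MaxTRESPerUser=cpu=8,gres/gpu=1,mem=64G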