[slurm-users] Re: canonical way to run a longer shell/bash interactive job (instead of srun inside screen/tmux on the front-end)?

2024-02-27 Thread Brian Andrus via slurm-users
Josef, for us, we put a load balancer in front of the login nodes with session affinity enabled, which makes users land on the same backend node each time. Also, for interactive X sessions, users start a desktop session on the node and then use VNC to connect there. This accommodates disconnect

[slurm-users] Enforcing relative resource restrictions in submission script

2024-02-27 Thread Matthew R. Baney via slurm-users
Hello Slurm users, I'm trying to write a check in our job_submit.lua script that enforces relative resource requirements such as disallowing more than 4 CPUs or 48GB of memory per GPU. The QOS itself has a MaxTRESPerJob of cpu=32,gres/gpu=8,mem=384G (roughly one full node), but we're looking to pr
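A minimal, untested sketch of what such a check might look like, assuming an untyped --gres=gpu:N request shows up in job_desc.tres_per_node and that job_desc.min_cpus and job_desc.pn_min_memory (MB per node) carry the CPU and memory requests; the 4-CPU and 48 GB-per-GPU limits are the ones from the post. Typed GRES (e.g. gpu:a100:2), --gpus/--gpus-per-task, --mem-per-cpu, and --mem-per-gpu style requests would need extra parsing.

-- job_submit.lua fragment: reject jobs asking for too many CPUs or too much
-- memory relative to their GPU count. Field names and string formats are
-- assumptions and vary somewhat between Slurm versions.

local MAX_CPUS_PER_GPU   = 4
local MAX_MEM_PER_GPU_MB = 48 * 1024   -- 48 GB, expressed in MB

local function count_gpus(job_desc)
    local spec = job_desc.tres_per_node
    if spec == nil or spec == "" then
        return 0
    end
    -- Grab the trailing count from e.g. "gres:gpu:2" or "gres/gpu=2".
    local n = string.match(spec, "gpu[:=](%d+)")
    return tonumber(n) or 0
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    local gpus = count_gpus(job_desc)
    if gpus > 0 then
        -- Unset fields arrive as huge NO_VAL sentinels, so ignore implausible values.
        local cpus = job_desc.min_cpus
        if cpus ~= nil and cpus < 0xFFFFFFFE and cpus > gpus * MAX_CPUS_PER_GPU then
            slurm.log_user(string.format(
                "Requested %d CPUs for %d GPU(s); limit is %d CPUs per GPU",
                cpus, gpus, MAX_CPUS_PER_GPU))
            return slurm.ERROR
        end

        -- pn_min_memory is per-node memory in MB; if the user asked for
        -- per-CPU memory instead, this simple sketch does not catch it.
        local mem_mb = job_desc.pn_min_memory
        if mem_mb ~= nil and mem_mb < 0xFFFFFFFFFFFF and mem_mb > gpus * MAX_MEM_PER_GPU_MB then
            slurm.log_user(string.format(
                "Requested %d MB for %d GPU(s); limit is %d MB per GPU",
                mem_mb, gpus, MAX_MEM_PER_GPU_MB))
            return slurm.ERROR
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, submit_uid)
    return slurm.SUCCESS
end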

[slurm-users] Re: canonical way to run a longer shell/bash interactive job (instead of srun inside screen/tmux on the front-end)?

2024-02-27 Thread Chris Samuel via slurm-users
On 26/2/24 12:27 am, Josef Dvoracek via slurm-users wrote: What is the recommended way to run a longer interactive job on your systems? We provide NX for our users and also access via JupyterHub. We also have high-priority QOSes intended for interactive use for rapid response, but they are cap