[slurm-users] spreading jobs out across the cluster

2023-06-14 Thread Stephen Berg, Code 7309
I'm currently testing a new Slurm setup before converting an existing PBS/Torque grid over. Right now I've got 8 nodes in one partition, 48 cores on each. There's a second partition of older systems configured as 4-core nodes so the users can run some serial jobs. During some testing I've noticed ...
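The preview is cut off above, but the subject suggests the goal is to have Slurm spread jobs across the nodes rather than pack them onto the first free node. A minimal sketch of the usual knobs, assuming a cons_tres setup (partition and node names here are just examples):

    # slurm.conf: prefer the least-loaded nodes when picking resources
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory,CR_LLN

    # or only for one partition
    PartitionName=batch Nodes=node[01-08] LLN=YES State=UP

    # users can also ask for a single job to be spread out
    sbatch --spread-job --ntasks=8 job.sh

CR_LLN/LLN=YES change node selection cluster-wide or per partition, while --spread-job only affects the one submission.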

Re: [slurm-users] spreading jobs out across the cluster

2023-06-14 Thread Loris Bennett
Hi Stephen, "Stephen Berg, Code 7309" writes: > I'm currently testing a new slurm setup before converting an existing > pbs/torque grid over.  Right now I've got 8 nodes in one partition, 48 > cores on each.  There's a second partition of older systems configured > as 4 core nodes so the users ...

[slurm-users] Aborting a job from inside the prolog

2023-06-14 Thread Alexander Grund
Hi, We are doing some checking on the user's job inside the prolog script, and upon failure of those checks the job should be canceled. Our first approach with `scancel $SLURM_JOB_ID; exit 1` doesn't seem to work, as the (sbatch) job still gets re-queued. Is this possible at all (i.e. prevent ...
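For context, slurmd requeues a batch job (and drains the node) when the Prolog exits non-zero, which matches the behaviour described above. One workaround is to mark the job non-requeueable before cancelling it; the sketch below is untested, check_job is a placeholder for the site-specific validation, and it assumes the Prolog runs with enough privilege (root/SlurmdUser) to modify the job:

    #!/bin/bash
    # Prolog sketch: cancel the job instead of letting it be requeued.
    if ! check_job "$SLURM_JOB_ID"; then          # check_job is hypothetical
        # Forbid requeueing first, so the cancel cannot bounce the job back.
        scontrol update JobId="$SLURM_JOB_ID" Requeue=0
        scancel "$SLURM_JOB_ID"
        # Exit 0 so the node is not drained; a non-zero exit is what triggers
        # the requeue/hold behaviour described above. The job may still start
        # briefly before the cancellation is processed.
        exit 0
    fi
    exit 0

Whether the job step is fully suppressed this way depends on timing, so treat it as best effort rather than a guarantee.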

[slurm-users] Disable --no-allocate support for a node/SlurmD

2023-06-14 Thread Alexander Grund
Hi, we do some additional checking on a user and the batch script in a Prolog script. However, the `--no-allocate`/`-Z` option bypasses allocation and hence execution of the Prolog/Epilog. Is there a way to configure SlurmD to deny access to jobs without allocations, or more generally all interactive ...

Re: [slurm-users] Disable --no-allocate support for a node/SlurmD

2023-06-14 Thread René Sitt
Hello Alex, I'd suggest taking a look at Slurm's Lua plugins for this kind of problem: https://slurm.schedmd.com/cli_filter_plugins.html https://slurm.schedmd.com/job_submit_plugins.html As far as I understand it, cli_filter.lua is geared towards controlling the use of specific command-line options ...
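For completeness, both plugin types are enabled in slurm.conf; the lines below show the standard parameters for the Lua variants (the scripts themselves are then picked up from the Slurm configuration directory):

    # slurm.conf
    JobSubmitPlugins=lua     # job_submit.lua, runs on the slurmctld host
    CliFilterPlugins=lua     # cli_filter.lua, runs on the submitting client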

Re: [slurm-users] Disable --no-allocate support for a node/SlurmD

2023-06-14 Thread Alexander Grund
job_submit.lua allows you to view (and edit!) all job parameters that are known at submit time, including the option to refuse a configuration by returning `slurm.ERROR` instead of `slurm.SUCCESS`. The common way to filter for interactive jobs in job_submit.lua is to check whether job_desc.script is ...
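A minimal job_submit.lua along those lines might look like the sketch below (the exact nil/empty-string check for job_desc.script varies a bit between Slurm versions, so treat it as a starting point):

    -- job_submit.lua sketch: reject submissions without a batch script,
    -- i.e. interactive srun/salloc jobs.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.script == nil or job_desc.script == '' then
            slurm.log_user("interactive jobs are not allowed here")
            return slurm.ERROR
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end

    return slurm.SUCCESS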

Re: [slurm-users] Disable --no-allocate support for a node/SlurmD

2023-06-14 Thread René Sitt
Hi, Thanks for the suggestion. However, as I understand it, this requires additionally trusting the node those scripts run on, which is, I guess, the one running SlurmCtlD. The reason we are using Prolog scripts is that they run on the very node the job will be running on ...

[slurm-users] trying to configure preemption partitions and also non-preemption with OverSubscribe=FORCE

2023-06-14 Thread Kevin Broch
The general idea is to have priority batch partitions where preemption can occur for higher-priority jobs (suspending the lower-priority ones). There's also an interactive partition where users can run GUI tools that can't be preempted. This works fine up to the point where I would like to OverSubscribe ...
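The preview stops mid-word, but a partition layout for this kind of setup typically looks like the sketch below (node and partition names are invented); how OverSubscribe=FORCE interacts with gang/suspend preemption is exactly the open question in this thread:

    # slurm.conf sketch: partition-priority preemption with suspend,
    # plus an interactive partition whose jobs are never preempted
    PreemptType=preempt/partition_prio
    PreemptMode=SUSPEND,GANG

    PartitionName=high        Nodes=node[01-08] PriorityTier=100 PreemptMode=off
    PartitionName=low         Nodes=node[01-08] PriorityTier=10  PreemptMode=suspend
    PartitionName=interactive Nodes=node[01-08] PriorityTier=100 PreemptMode=off OverSubscribe=FORCE:1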