Re: [slurm-users] ActiveFeatures job submission

2022-02-09 Thread Paul Brunk
Hi Alexander: This is a great case for using Node Health Check (https://github.com/mej/nhc). We use this so that each node periodically runs an admin-selected set of tests (e.g. "is /work readable?"), and automatically Drains a node which fails any of them, and puts the reason in the node's Re
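The idea in the reply above (run a set of checks on each node, drain the node on the first failure, and record the reason) can be sketched as follows. This is only an illustration in Python, not how NHC itself is implemented or configured; the function names and the node name are hypothetical, and the sketch only builds the `scontrol` command rather than running it.

```python
import os

def check_readable(path):
    """Return None if `path` exists and is readable, else a short reason string."""
    if not os.path.exists(path):
        return f"{path} missing"
    if not os.access(path, os.R_OK):
        return f"{path} not readable"
    return None

def drain_command(node, reason):
    """Build (but do not run) an scontrol call that drains `node` with a reason."""
    return ["scontrol", "update", f"NodeName={node}",
            "State=DRAIN", f"Reason={reason}"]

def run_checks(node, paths):
    """Return a drain command for the first failing path, or None if all pass."""
    for path in paths:
        reason = check_readable(path)
        if reason is not None:
            return drain_command(node, reason)
    return None
```

A caller on a node might evaluate `run_checks("node01", ["/work"])` and pass a non-None result to `subprocess.run`; `node01` is a placeholder.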

Re: [slurm-users] Creating groups of nodes with exclusive access to a resources within a partition.

2022-02-09 Thread Paul Brunk
Hello Rich: You could create partitions "bulk_a", "bulk_b", "bulk_c" (names are arbitrary) which map onto those three groups of nodes and have the intended resource limits set at partition level. Then make job_submit lua cause all jobs submitted to "bulk" (or only the subset requesting a speci
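The preview above is cut off before it says where the "bulk" jobs are sent. One plausible reading, assumed here, is that they are redirected to the three per-group partitions, since Slurm accepts a comma-separated partition list and starts the job in whichever partition can run it first. A real job_submit plugin would be written in Lua; the Python below only sketches the routing decision, and `route_partition` is a hypothetical name.

```python
# Partition names taken from the reply; the routing rule itself is an assumption.
BULK_GROUPS = ("bulk_a", "bulk_b", "bulk_c")

def route_partition(requested):
    """Return the partition string a job should carry after submission."""
    if requested == "bulk":
        # Offer the job to every per-group partition.
        return ",".join(BULK_GROUPS)
    # Jobs aimed at any other partition are left unchanged.
    return requested
```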

Re: [slurm-users] What is the 'Root/Cluster association' level in Resource Limits document mean?

2022-02-09 Thread Paul Brunk
Hi: You can use e.g. 'sacctmgr show -s users', and you'll see each user's cluster association as one of the output columns. If the name were 'yourcluster', then you could do: sacctmgr modify cluster name=yourcluster set grpTres="node=8". == Paul Brunk, system administrator Georgia Advanced Resour

Re: [slurm-users] JobComp file not rotating

2022-02-09 Thread Paul Brunk
Hi Stuart: I've long seen similar on my server. If I don't intervene then I don't really get a per-day completed job log, but rather some empty ones and some that span days. I can live with this until I do something about it--the job completion entries are all there. I just can't infer from

Re: [slurm-users] sbatch - accept jobs above limits

2022-02-09 Thread Christopher Samuel
On 2/8/22 11:41 pm, Alexander Block wrote: I'm just discussing a familiar case with SchedMD right now (ticket 13309). But it seems that it is not possible with Slurm to submit jobs that request features/configuration that are not available at the moment of submission. Does --hold not allow t

Re: [slurm-users] sbatch - accept jobs above limits

2022-02-09 Thread Ryan Cox
Mike, You could potentially add a non-existent node (or nodes) to the configuration that has a million cores, petabytes of RAM, and all the features in the world.  Then it "exists" in Slurm.  I don't know if FUTURE would work, but if you can tolerate having a DOWN node in sinfo, that could wo

Re: [slurm-users] sbatch - accept jobs above limits

2022-02-09 Thread Brian Andrus
Just curious as to expectations out here. When /should/ Slurm immediately reject a job? Brian Andrus On 2/8/2022 11:41 PM, Alexander Block wrote: Hi Mike, I'm just discussing a familiar case with SchedMD right now (ticket 13309). But it seems that it is not possible with Slurm to submit jobs