On 23/2/23 2:55 am, David Laehnemann wrote:
> And consequently, would using `scontrol` thus be the better default option (as opposed to `sacct`) for repeated job status checks by a workflow management system?
Many others have commented on this, but using scontrol in this way is really, really bad because of the load it puts on slurmctld. Responding to the RPC requires (IIRC) taking read locks on internal data structures, and on a large, busy system this can seriously damage scheduling performance. Ours is one such system: we recently rolled Slurm job IDs over back to 1 after ~6 years of operation, and we run at over 90% occupancy most of the time.
On numerous occasions we've had to track down users abusing scontrol in this way and redirect them to sacct instead.
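
For a workflow manager, one way to keep the load down is to ask sacct about all outstanding jobs in a single call rather than issuing one query per job. The following is only a rough sketch of that idea, not something taken from any particular workflow system: the function names are made up for illustration, and the output parsing assumes the two-column `JobIDRaw|State` layout requested by the `--format` option below.

```python
import subprocess


def build_sacct_command(job_ids):
    """Build ONE sacct call covering every job ID (hypothetical helper)."""
    return [
        "sacct",
        "-X",                       # allocations only, skip individual job steps
        "--parsable2",              # '|'-delimited, no trailing delimiter
        "--noheader",               # no column header line
        "--format=JobIDRaw,State",  # assumed two-column layout
        "-j", ",".join(str(j) for j in job_ids),
    ]


def parse_sacct_output(text):
    """Turn 'JobIDRaw|State' lines into a {job_id: state} dict."""
    states = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        job_id, _, state = line.partition("|")
        # States such as "CANCELLED by 1234" carry a suffix; keep the first word.
        words = state.split()
        states[job_id.strip()] = words[0] if words else ""
    return states


def poll_job_states(job_ids, runner=subprocess.run):
    """Query the state of all given jobs with a single sacct invocation."""
    result = runner(
        build_sacct_command(job_ids),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sacct_output(result.stdout)
```

A caller would run `poll_job_states(pending_ids)` once per polling cycle, with a generous sleep between cycles, so the number of queries stays constant no matter how many jobs the workflow has in flight.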
We already use Slurm's cli_filter functionality to impose a form of rate limiting on RPCs from other commands, but unfortunately scontrol is not covered by that.
All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA