Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Chris Samuel
On 27/2/23 03:34, David Laehnemann wrote: Hi Chris, hi Sean, Hiya! thanks also (and thanks again) for chiming in. No worries. Quick follow-up question: Would `squeue` be a better fall-back command than `scontrol` from the perspective of keeping `slurmctld` responsive? Sadly not, whils

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Chris Samuel
On 27/2/23 06:53, Brian Andrus wrote: Sorry, I had to share that this is very much like "Are we there yet?" on a road trip with kids 😄 Slurm is trying to drive. Oh I love this analogy! Whereas sacct is like talking to the navigator. The navigator does talk to the driver to give dir

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread David Laehnemann
Hi Brian, thanks for your ideas. Follow-up questions, because further digging through the docs didn't get me anywhere definitive on this: > IMHO, the true solution is that if a job's info NEEDS to be updated that often, have the job itself report what it is doing (but NOT via slurm commands). The
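The suggestion quoted above (have the job report its own progress, without calling Slurm commands) does not name a mechanism. One possible reading, sketched here purely as an assumption, is a small status file on a filesystem shared between the job and whatever tool is watching it. The function names `write_status` and `read_status` and the JSON layout are hypothetical and not taken from the thread.

```python
import json
import os
import tempfile
import time


def write_status(path, state, detail=""):
    """Called from inside the job: record what the job is doing right now."""
    record = {"state": state, "detail": detail, "updated": time.time()}
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the target, so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".status-")
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle)
    os.replace(tmp_path, path)


def read_status(path):
    """Called by the watcher: return the last record, or None if none exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```

Reading such a file costs a filesystem access rather than an RPC to `slurmctld`. It cannot tell the watcher that a job was killed before it could update the file, so an occasional scheduler query would presumably still be needed as a fallback.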

Re: [slurm-users] priority access and QoS

2023-02-27 Thread Jason Simms
Hello all, I haven't found any guidance that seems to be the current "better practice," but this does seem to be a common use case. I imagine there are multiple ways to accomplish this goal. For example, you could assuredly do it with QoS, but you can likely also accomplish this with some other we

Re: [slurm-users] priority access and QoS

2023-02-27 Thread Styrk, Daryl
Marko, I’m in a similar situation. We have many Accounts with dedicated hardware and recently ran into a situation where a user with dedicated hardware submitted hundreds of jobs and they overflowed into the community hardware, which caused an unexpected backlog. I believe QoS will help us with that as w

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Bas van der Vlies
We have many jupyterhub jobs on our cluster that also do a lot of job queries. We could adjust the query interval, but what I did instead is have one process query all the jobs with `squeue --json`, and the jupyterhub query script looks in this output, instead of every jupyterhub job querying the batch system. I
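A minimal sketch of the lookup side of this approach, assuming the text of a single `squeue --json` call has already been captured by the one polling process. It assumes the output has a top-level `jobs` list whose entries carry `job_id` and `job_state`; the exact layout differs between Slurm releases, so check it against your own output. The helper names and the sample data are hypothetical.

```python
import json


def index_job_states(squeue_json_text):
    """Build a job_id -> state mapping from the text of one `squeue --json` call."""
    data = json.loads(squeue_json_text)
    states = {}
    for job in data.get("jobs", []):
        state = job.get("job_state")
        # Assumption: some releases report a list of state flags,
        # others a plain string. Normalise both to a string.
        if isinstance(state, list):
            state = ",".join(state)
        states[job["job_id"]] = state
    return states


def lookup_state(states, job_id, default="UNKNOWN"):
    """Answer a per-job query from the shared mapping, without a new RPC."""
    return states.get(job_id, default)


# Hypothetical, heavily trimmed sample of the captured output.
SAMPLE = (
    '{"jobs": ['
    '{"job_id": 101, "job_state": "RUNNING"},'
    '{"job_id": 102, "job_state": ["PENDING"]}'
    "]}"
)

states = index_job_states(SAMPLE)
print(lookup_state(states, 101))
print(lookup_state(states, 999))
```

A job missing from the mapping has most likely left the queue, which is where a less frequent accounting lookup could take over.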

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Ümit Seren
As a side note: In Slurm 23.x a new rate limiting feature for client RPC calls was added: (see this commit: https://github.com/SchedMD/slurm/commit/674f118140e171d10c2501444a0040e1492f4eab#diff-b4e84d09d9b1d817a964fb78baba0a2ea6316bfc10c1405329a95ad0353ca33e ) This would give operators the ability

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Davide DelVento
> > And if you are seeing a workflow management system causing trouble on > > your system, probably the most sustainable way of getting this resolved > > is to file issues or pull requests with the respective project, with > > suggestions like the ones you made. For snakemake, a second good point >

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Loris Bennett
Hi David, David Laehnemann writes: > Dear Ward, if used correctly (and that is a big caveat for any method for interacting with a cluster system), snakemake will only submit as many jobs as can fit within the resources of the cluster at one point of time (or however much resources you

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Brian Andrus
Sorry, I had to share that this is very much like "Are we there yet?" on a road trip with kids :) Slurm is trying to drive. Any communication to slurmctld will involve an RPC call (sinfo, squeue, scontrol, etc). You can see how many with sdiag. Too many RPC calls will cause failures. Asking slu

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread David Laehnemann
Dear Ward, if used correctly (and that is a big caveat for any method for interacting with a cluster system), snakemake will only submit as many jobs as can fit within the resources of the cluster at one point of time (or however much resources you tell snakemake that it can use). So unless there

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Ward Poelmans
On 24/02/2023 18:34, David Laehnemann wrote: Those queries then should not have to happen too often, although do you have any indication of a range for when you say "you still wouldn't want to query the status too frequently." Because I don't really, and would probably opt for some compromise of

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread David Laehnemann
Hi Chris, hi Sean, thanks also (and thanks again) for chiming in. Quick follow-up question: Would `squeue` be a better fall-back command than `scontrol` from the perspective of keeping `slurmctld` responsive? From what I can see in the general overview of how slurm works ( https://slurm.schedmd.