Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Chris Samuel
On 27/2/23 03:34, David Laehnemann wrote: Hi Chris, hi Sean, Hiya! thanks also (and thanks again) for chiming in. No worries. Quick follow-up question: Would `squeue` be a better fall-back command than `scontrol` from the perspective of keeping `slurmctld` responsive? Sadly not, whils

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Chris Samuel
On 27/2/23 06:53, Brian Andrus wrote: Sorry, I had to share that this is very much like "Are we there yet?" on a road trip with kids 😄 Slurm is trying to drive. Oh I love this analogy! Whereas sacct is like talking to the navigator. The navigator does talk to the driver to give dir

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread David Laehnemann
Hi Brian, thanks for your ideas. Follow-up questions, because further digging through the docs didn't get me anywhere definitive on this: IMHO, the true solution is that if a job's info NEEDS updated that often, have the job itself report what it is doing (but NOT via slurm commands). The

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Bas van der Vlies
We have many jupyterhub jobs on our cluster that also do a lot of job queries. We could adjust the query interval, but what I did instead is have one process query all the jobs with `squeue --json`, and the jupyterhub query script looks in this output, instead of every jupyterhub job querying the batch system. I
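
A minimal Python sketch of the idea above: one component fetches `squeue --json` for all jobs and every status lookup is answered from that cached copy. This is an in-process variant, not the script described in the message. The names `parse_squeue_json` and `SqueueCache` are hypothetical, and the JSON field names (`jobs`, `job_id`, `job_state`) are assumptions that should be checked against the Slurm version in use.

```python
import json
import subprocess
import time


def parse_squeue_json(text):
    """Map job id -> state from `squeue --json` text.

    Assumption: a top-level "jobs" list whose entries carry "job_id"
    and "job_state"; verify this layout on your Slurm version.
    """
    states = {}
    for job in json.loads(text).get("jobs", []):
        state = job.get("job_state")
        # Assumption: some versions may report the state as a list of flags.
        if isinstance(state, list):
            state = ",".join(state)
        states[job["job_id"]] = state
    return states


class SqueueCache:
    """Refresh the full job list at most once per `max_age` seconds."""

    def __init__(self, max_age=60.0, fetch=None):
        self.max_age = max_age
        # `fetch` is injectable so the cache can be exercised without Slurm.
        self.fetch = fetch or self._run_squeue
        self._states = {}
        self._stamp = None

    @staticmethod
    def _run_squeue():
        result = subprocess.run(
            ["squeue", "--json"], capture_output=True, text=True, check=True
        )
        return result.stdout

    def state(self, job_id):
        now = time.monotonic()
        if self._stamp is None or now - self._stamp > self.max_age:
            self._states = parse_squeue_json(self.fetch())
            self._stamp = now
        # None means the job is not (or no longer) in the queue.
        return self._states.get(job_id)
```

With this shape, any number of status lookups within `max_age` seconds results in a single `squeue` call, so the load on `slurmctld` no longer grows with the number of jobs being watched.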

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Ümit Seren
As a side note: In Slurm 23.x a new rate limiting feature for client RPC calls was added: (see this commit: https://github.com/SchedMD/slurm/commit/674f118140e171d10c2501444a0040e1492f4eab#diff-b4e84d09d9b1d817a964fb78baba0a2ea6316bfc10c1405329a95ad0353ca33e ) This would give operators the ability
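
To illustrate what rate limiting of client requests means in general, here is a small Python sketch of a token bucket, one common rate-limiting technique. It is a generic illustration only: the snippet above does not describe how Slurm implements the feature, and the class name `TokenBucket` and its parameters are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: each request costs one token.

    This is not Slurm code; it only shows the general technique.
    """

    def __init__(self, size, refill_rate):
        self.size = size                # maximum burst of requests
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(size)       # start with a full bucket
        self.last = 0.0                 # time of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` is accepted."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the bucket size.
        self.tokens = min(self.size, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A client that polls in a tight loop exhausts its bucket and gets rejected until tokens refill, while a client that polls at a modest interval is never throttled.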

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Davide DelVento
> > And if you are seeing a workflow management system causing trouble on > > your system, probably the most sustainable way of getting this resolved > > is to file issues or pull requests with the respective project, with > > suggestions like the ones you made. For snakemake, a second good point >

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Loris Bennett
Hi David, David Laehnemann writes: > Dear Ward, > > if used correctly (and that is a big caveat for any method for > interacting with a cluster system), snakemake will only submit as many > jobs as can fit within the resources of the cluster at one point of > time (or however much resources you

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Brian Andrus
Sorry, I had to share that this is very much like "Are we there yet?" on a road trip with kids :) Slurm is trying to drive. Any communication to slurmctld will involve an RPC call (sinfo, squeue, scontrol, etc). You can see how many with sdiag. Too many RPC calls will cause failures. Asking slu

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread David Laehnemann
Dear Ward, if used correctly (and that is a big caveat for any method for interacting with a cluster system), snakemake will only submit as many jobs as can fit within the resources of the cluster at one point of time (or however much resources you tell snakemake that it can use). So unless there

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Ward Poelmans
On 24/02/2023 18:34, David Laehnemann wrote: Those queries then should not have to happen too often, although do you have any indication of a range for when you say "you still wouldn't want to query the status too frequently." Because I don't really, and would probably opt for some compromise of

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread David Laehnemann
Hi Chris, hi Sean, thanks also (and thanks again) for chiming in. Quick follow-up question: Would `squeue` be a better fall-back command than `scontrol` from the perspective of keeping `slurmctld` responsive? From what I can see in the general overview of how slurm works ( https://slurm.schedmd.

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-25 Thread Chris Samuel
On 23/2/23 2:55 am, David Laehnemann wrote: And consequently, would using `scontrol` thus be the better default option (as opposed to `sacct`) for repeated job status checks by a workflow management system? Many others have commented on this, but use of scontrol in this way is really really b

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-24 Thread Sean Maxwell
Hi David, Those queries then should not have to happen too often, although do you have any indication of a range for when you say "you still wouldn't want to query the status too frequently." Because I don't really, and would probably opt for some compromise of every 30 seconds or so. Eve

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-24 Thread David Laehnemann
Hi Sean, Thanks again for all the feedback! I'll definitely try to implement batch queries, then. Both for the default `sacct` query and for the fallback `scontrol` query. Also see here: https://github.com/snakemake/snakemake/pull/2136#issuecomment-1443295051 Those queries then should not have t
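
A rough Python sketch of how a batched default query with a batched fallback could be structured. It is not snakemake's code: `batch_status`, `primary`, and `fallback` are hypothetical names, and the two callables stand in for whatever wraps the `sacct` and `scontrol` calls.

```python
import subprocess


def batch_status(job_ids, primary, fallback):
    """Ask `primary` for all ids in one call; ask `fallback` only for the rest.

    Both callables take a list of job ids and return {job_id: state}.
    """
    states = {}
    try:
        states.update(primary(job_ids))
    except (OSError, subprocess.CalledProcessError):
        # The primary command is missing or failed; rely on the fallback.
        pass
    missing = [job_id for job_id in job_ids if job_id not in states]
    if missing:
        states.update(fallback(missing))
    return states
```

The fallback is called at most once per polling round and only for the ids the primary did not report, which keeps the number of requests independent of the number of jobs.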

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Sean Maxwell
Hi David, On Thu, Feb 23, 2023 at 10:50 AM David Laehnemann wrote: > But from your comment I understand that handling these queries in > batches would be less work for slurmdbd, right? So instead of querying > each jobid with a separate database query, it would do one database > query for the wh
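
The batching discussed here can be sketched in Python as a single `sacct` call that covers a whole list of job ids. The helper names are hypothetical, and the choice of output columns (`JobID,State`) is an assumption about what a workflow manager would need.

```python
import subprocess


def build_sacct_command(job_ids):
    """One sacct invocation covering every job id in the batch."""
    return [
        "sacct",
        "--jobs", ",".join(str(job_id) for job_id in job_ids),
        "--format", "JobID,State",
        "--noheader",
        "--parsable2",    # pipe-separated fields, no trailing delimiter
        "--allocations",  # one line per job, without job steps
    ]


def parse_sacct_output(text):
    """Turn 'jobid|state' lines into a {jobid: state} dict."""
    states = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        job_id, state = line.split("|", 1)
        # A state may carry a suffix (e.g. "CANCELLED by 1000"); keep the first word.
        states[job_id] = state.split()[0] if state.strip() else ""
    return states


def query_states(job_ids):
    """Run the batched query; requires sacct on PATH and a reachable slurmdbd."""
    result = subprocess.run(
        build_sacct_command(job_ids), capture_output=True, text=True, check=True
    )
    return parse_sacct_output(result.stdout)
```

Calling `query_states([101, 102, 103])` issues one request for three jobs instead of three separate requests.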

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread David Laehnemann
Hi Sean, yes, this is exactly what snakemake currently does. I didn't write that code, but from my previous debugging, I think handling one job at a time was simply the logic of the general executor for cluster systems, and makes things like querying via scontrol as a fallback easier to handle. Bu

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Sean Maxwell
Hi David, On Thu, Feb 23, 2023 at 8:51 AM David Laehnemann wrote: > Quick follow-up question: do you have any indication of the rate of job > status checks via sacct that slurmdbd will gracefully handle (per > second)? Or any suggestions how to roughly determine such a rate for a > given cluster

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Loris Bennett
Hi David, David Laehnemann writes: [snip (16 lines)] > P.S.: @Loris and @Noam: Exactly, snakemake is a software distinct from > slurm that you can use to orchestrate large analysis workflows---on > anything from a desktop or laptop computer to all kinds of cluster / > cloud systems. In the case

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread David Laehnemann
Hi Sean, hi everybody, thanks a lot for the quick insights! My takeaway is: sacct is the better default for putting in lots of job status checks after all, as it will not impact the slurmctld scheduler. Quick follow-up question: do you have any indication of the rate of job status checks via sac

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Sean Maxwell
Hi David, scontrol - interacts with slurmctld using RPC, so it is faster, but requests put load on the scheduler itself. sacct - interacts with slurmdbd, so it doesn't place additional load on the scheduler. There is a balance to reach, but the scontrol approach is riskier and can start to interf

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
On Feb 23, 2023, at 7:40 AM, Loris Bennett <loris.benn...@fu-berlin.de> wrote: Hi David, David Laehnemann <david.laehnem...@hhu.de> writes: by a workflow management system? I am probably being a bit naive, but I would have thought that the batch system should just be able to start

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Loris Bennett
Hi David, David Laehnemann writes: > Dear Slurm users and developers, > > TL;DR: > Do any of you know if `scontrol` status checks of jobs are always > expected to be quicker than `sacct` job status checks? Do you have any > comparative timings between the two commands? > And consequently, would

[slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread David Laehnemann
Dear Slurm users and developers, TL;DR: Do any of you know if `scontrol` status checks of jobs are always expected to be quicker than `sacct` job status checks? Do you have any comparative timings between the two commands? And consequently, would using `scontrol` thus be the better default option