Re: [slurm-users] snakemake and slurm in general

2023-02-23 Thread Loris Bennett
Hi David, David Laehnemann writes: > Hi Loris, > > I gave this a new subject, as this has nothing to do with my original > question. > > Maybe this is what you were looking for in the snakemake documentation: > > https://snakemake.readthedocs.io/en/latest/executing/grouping.html#job-grouping > >

Re: [slurm-users] snakemake and slurm in general

2023-02-23 Thread Loris Bennett
Hi David, (Thanks for changing the subject to something more appropriate). David Laehnemann writes: > Yes, but only to an extent. The linked conversation ends with this: > >>> Do you have any best practice about setting MaxJobCount to a proper > number? > >> That depends upon your workload. You

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-23 Thread Analabha Roy
Howdy, and thanks for the warm welcome, On Fri, 24 Feb 2023 at 07:31, Doug Meyer wrote: > Hi, > > Did you configure your node definition with the outputs of slurmd -C? > Ignore boards. Don't know if it is still true but several years ago > declaring boards made things difficult. > > $ slurmd -C

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-23 Thread Doug Meyer
Hi, Did you configure your node definition with the outputs of slurmd -C? Ignore boards. Don't know if it is still true but several years ago declaring boards made things difficult. Also, if you have hyperthreaded AMD or Intel processors, your partition declaration should be oversubscribe:2 Start w

Re: [slurm-users] snakemake and slurm in general

2023-02-23 Thread David Laehnemann
Yes, but only to an extent. The linked conversation ends with this: >> Do you have any best practice about setting MaxJobCount to a proper number? > That depends upon your workload. You could probably set MaxJobCount to at least 5 with most systems (assuming you have at least a few gigabytes

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Sean Maxwell
Hi David, On Thu, Feb 23, 2023 at 10:50 AM David Laehnemann wrote: > But from your comment I understand that handling these queries in > batches would be less work for slurmdbd, right? So instead of querying > each jobid with a separate database query, it would do one database > query for the wh

Re: [slurm-users] snakemake and slurm in general

2023-02-23 Thread Ole Holm Nielsen
On 2/23/23 17:07, David Laehnemann wrote: In addition, there are very clear limits to how many jobs slurm can handle in its queue, see for example this discussion: https://bugs.schedmd.com/show_bug.cgi?id=2366 My 2 cents: Slurm's job limits are configurable, see this Wiki page: https://wiki.fys

[slurm-users] snakemake and slurm in general

2023-02-23 Thread David Laehnemann
Hi Loris, I gave this a new subject, as this has nothing to do with my original question. Maybe this is what you were looking for in the snakemake documentation: https://snakemake.readthedocs.io/en/latest/executing/grouping.html#job-grouping You can basically bundle groups of (snakemake) jobs t

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread David Laehnemann
Hi Sean, yes, this is exactly what snakemake currently does. I didn't write that code, but from my previous debugging, I think handling one job at a time was simply the logic of the general executor for cluster systems, and makes things like querying via scontrol as a fallback easier to handle. Bu

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Sean Maxwell
Hi David, On Thu, Feb 23, 2023 at 8:51 AM David Laehnemann wrote: > Quick follow-up question: do you have any indication of the rate of job > status checks via sacct that slurmdbd will gracefully handle (per > second)? Or any suggestions how to roughly determine such a rate for a > given cluster

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Loris Bennett
Hi David, David Laehnemann writes: [snip (16 lines)] > P.S.: @Loris and @Noam: Exactly, snakemake is a software distinct from > slurm that you can use to orchestrate large analysis workflows---on > anything from a desktop or laptop computer to all kinds of cluster / > cloud systems. In the case

[slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-23 Thread Analabha Roy
Hi folks, I have a single-node "cluster" running Ubuntu 20.04 LTS with the distribution packages for slurm (slurm-wlm 19.05.5). Slurm only ran one job on the node at a time with the default configuration, leaving all other jobs pending. This happened even if that one job only requested like a few c

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread David Laehnemann
Hi Sean, hi everybody, thanks a lot for the quick insights! My takeaway is: sacct is the better default for putting in lots of job status checks after all, as it will not impact the slurmctld scheduler. Quick follow-up question: do you have any indication of the rate of job status checks via sac

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Sean Maxwell
Hi David, scontrol - interacts with slurmctld using RPC, so it is faster, but requests put load on the scheduler itself. sacct - interacts with slurmdbd, so it doesn't place additional load on the scheduler. There is a balance to reach, but the scontrol approach is riskier and can start to interf

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
On Feb 23, 2023, at 7:40 AM, Loris Bennett wrote: Hi David, David Laehnemann writes: by a workflow management system? I am probably being a bit naive, but I would have thought that the batch system should just be able to start

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Loris Bennett
Hi David, David Laehnemann writes: > Dear Slurm users and developers, > > TL;DR: > Do any of you know if `scontrol` status checks of jobs are always > expected to be quicker than `sacct` job status checks? Do you have any > comparative timings between the two commands? > And consequently, would

[slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread David Laehnemann
Dear Slurm users and developers, TL;DR: Do any of you know if `scontrol` status checks of jobs are always expected to be quicker than `sacct` job status checks? Do you have any comparative timings between the two commands? And consequently, would using `scontrol` thus be the better default option
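
The batching idea raised in the sacct vs. scontrol thread above (one query covering many job IDs instead of one query per job) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not snakemake's actual code: the helper names `build_sacct_command`, `parse_sacct_output` and `query_job_states` are hypothetical, and the sketch assumes the `sacct` options `-X`, `-n`, `-P`, `--format` and `-j` with a comma-separated job list.

```python
import subprocess


def build_sacct_command(job_ids):
    """Build a single sacct call that covers all given job IDs."""
    return [
        "sacct",
        "-X",  # one line per job allocation, no job steps
        "-n",  # no header line
        "-P",  # parsable output, fields separated by '|'
        "--format=JobID,State",
        "-j", ",".join(str(j) for j in job_ids),
    ]


def parse_sacct_output(text):
    """Map job ID -> state from 'JobID|State' lines."""
    states = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        job_id, state = line.split("|", 1)
        # States such as "CANCELLED by 1234" carry a suffix; keep the first word.
        words = state.split()
        states[job_id] = words[0] if words else ""
    return states


def query_job_states(job_ids):
    """Run the batched query (assumes sacct and slurmdbd are available)."""
    result = subprocess.run(
        build_sacct_command(job_ids),
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_output(result.stdout)
```

A caller would collect the IDs of all jobs it is tracking and call `query_job_states` once per polling interval, so slurmdbd sees one request rather than one per job. What polling interval is appropriate depends on the cluster and is not settled in the previews above.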