Re: [slurm-users] speed / efficiency of sacct vs. scontrol

Sean Maxwell Thu, 23 Feb 2023 06:49:48 -0800

Hi David,

On Thu, Feb 23, 2023 at 8:51 AM David Laehnemann <[email protected]>
wrote:


> Quick follow-up question: do you have any indication of the rate of job
> status checks via sacct that slurmdbd will gracefully handle (per
> second)? Or any suggestions how to roughly determine such a rate for a
> given cluster system?
>

I looked at your PR for context, and this line of snakemake looks
problematic (I know this isn't part of your PR, it is part of the original
code)
https://github.com/snakemake/snakemake/commit/a0f04bab08113196fe1616a621bd6bf20fc05688#diff-d1b47826c1fc35806df72508e2f5e7f1d0424f9b2f7b9124810b051f5fe97f1bL296
:

sacct_cmd = f"sacct -P -n --format=JobIdRaw,State -j {jobid}"

Since jobid is an int, this looks like snakemake will individually probe
each Slurm job it has launched. If snakemake were using batch logic to
gather status for all your running jobs with one call to sacct, then you
could probably set the interval low. But it looks like it is going to probe
each job individually by ID, so it will make as many RPC calls as there are
jobs in the pipeline each time it checks the status.
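
To make the batch idea concrete, here is a minimal Python sketch of what a
single sacct call for many jobs could look like. It relies on sacct's -j
option accepting a comma-separated list of job IDs. The function names
(build_sacct_cmd, parse_sacct_output, query_states) are hypothetical and are
not taken from snakemake, and skipping job-step rows such as "123.batch" is
my assumption about what a caller would want.

```python
import subprocess


def build_sacct_cmd(jobids):
    """Build one sacct command covering every job ID (hypothetical helper)."""
    # sacct -j accepts a comma-separated job list, so one call covers all jobs.
    job_list = ",".join(str(j) for j in jobids)
    return ["sacct", "-P", "-n", "--format=JobIdRaw,State", "-j", job_list]


def parse_sacct_output(text):
    """Map job ID -> state from pipe-delimited 'sacct -P -n' output."""
    states = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        jobid, state = line.split("|", 1)
        # Assumption: only job-level rows matter, so skip steps like "123.batch".
        if "." in jobid:
            continue
        # States such as "CANCELLED by 1000" are reduced to their first word.
        words = state.split()
        states[jobid] = words[0] if words else ""
    return states


def query_states(jobids):
    """Run a single sacct call for all jobs and return their states."""
    result = subprocess.run(
        build_sacct_cmd(jobids), capture_output=True, text=True, check=True
    )
    return parse_sacct_output(result.stdout)
```

With something along these lines, the number of sacct invocations per status
check stays at one regardless of how many jobs are in the pipeline.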

I could be wrong, but this is how I read the code without going any
further upstream.

Best,

-Sean
