Hi David,

On Thu, Feb 23, 2023 at 8:51 AM David Laehnemann <david.laehnem...@hhu.de> wrote:
> Quick follow-up question: do you have any indication of the rate of job
> status checks via sacct that slurmdbd will gracefully handle (per
> second)? Or any suggestions how to roughly determine such a rate for a
> given cluster system?

I looked at your PR for context, and this line of snakemake looks problematic (I know this isn't part of your PR, it is part of the original code):

https://github.com/snakemake/snakemake/commit/a0f04bab08113196fe1616a621bd6bf20fc05688#diff-d1b47826c1fc35806df72508e2f5e7f1d0424f9b2f7b9124810b051f5fe97f1bL296

    sacct_cmd = f"sacct -P -n --format=JobIdRaw,State -j {jobid}"

Since jobid is an int, it looks like snakemake will individually probe each Slurm job it has launched. If snakemake were using batch logic to gather the status of all your running jobs with one call to sacct, then you could probably set the interval low. But it looks like it is going to probe each job individually by ID, so it will make as many RPC calls as there are jobs in the pipeline each time it checks the status.

I could be wrong, but this is how I evaluated the code without going further upstream.

Best,

-Sean
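
P.S. To make the "batch logic" idea above concrete, here is a minimal Python sketch. It relies on sacct accepting a comma-separated list of job IDs for -j, so one call can cover every job being tracked. This is not snakemake's code: the function names (build_sacct_cmd, parse_sacct_output, query_job_states) are hypothetical, and I am assuming the same -P -n --format=JobIdRaw,State options as in the line quoted above.

```python
import subprocess
from typing import Dict, Iterable, List


def build_sacct_cmd(jobids: Iterable[int]) -> List[str]:
    """Build a single sacct invocation covering all job IDs (hypothetical helper)."""
    id_list = ",".join(str(j) for j in jobids)
    return ["sacct", "-P", "-n", "--format=JobIdRaw,State", "-j", id_list]


def parse_sacct_output(output: str, jobids: Iterable[int]) -> Dict[int, str]:
    """Map each requested job ID to its state from '|'-separated sacct output.

    Lines whose first field is not one of the requested IDs (for example
    job steps such as '101.batch') are ignored. The state string is kept
    verbatim.
    """
    wanted = {str(j) for j in jobids}
    states: Dict[int, str] = {}
    for line in output.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 2:
            continue
        raw_id, state = fields[0], fields[1]
        if raw_id in wanted:
            states[int(raw_id)] = state
    return states


def query_job_states(jobids: Iterable[int]) -> Dict[int, str]:
    """Query the state of all jobs with one sacct call instead of one per job."""
    jobids = list(jobids)
    if not jobids:
        # Nothing to ask about, so do not call sacct at all.
        return {}
    result = subprocess.run(
        build_sacct_cmd(jobids), capture_output=True, text=True, check=True
    )
    return parse_sacct_output(result.stdout, jobids)
```

A caller would collect the IDs of all jobs it is still waiting on and call query_job_states(ids) once per polling interval, so the number of sacct invocations no longer grows with the number of jobs in the pipeline.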