Hi Sean,

Thanks again for all the feedback!

I'll definitely try to implement batch queries, then, both for the
default `sacct` query and for the fallback `scontrol` query. Also see
here:
https://github.com/snakemake/snakemake/pull/2136#issuecomment-1443295051

Those queries then should not have to happen too often. But do you have
a concrete range in mind when you say "you still wouldn't want to query
the status too frequently"? I don't, really, and would probably opt for
some compromise of every 30 seconds or so.
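Roughly, I'm imagining something like the following on the snakemake
side. This is just a sketch of the batched polling idea, not actual
snakemake code; the job ids, the poll interval, and the set of "done"
states are placeholders:

    import subprocess
    import time

    POLL_INTERVAL = 30  # seconds; the compromise mentioned above

    def query_job_states(job_ids):
        """Fetch the states of all given jobs with a single sacct call."""
        cmd = [
            "sacct", "-X", "-P", "-n",
            "--format=JobIdRaw,State",
            "-j", ",".join(job_ids),
        ]
        result = subprocess.run(
            cmd, capture_output=True, text=True, check=True
        )
        states = {}
        for line in result.stdout.splitlines():
            if not line:
                continue
            job_id, state = line.split("|", 1)
            # sacct can report e.g. "CANCELLED by <uid>"; keep the first word
            states[job_id] = state.split()[0]
        return states

    # hypothetical ids of the jobs snakemake is currently waiting on
    active_jobs = {"4711", "4712", "4713"}

    while active_jobs:
        # one batched RPC per interval, instead of one call per job
        for job_id, state in query_job_states(sorted(active_jobs)).items():
            if state in {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}:
                active_jobs.discard(job_id)
        if active_jobs:
            time.sleep(POLL_INTERVAL)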
One thing I didn't understand from your email is the part about job
names, as the command I gave doesn't use job names for its query:

    sacct -X -P -n --format=JobIdRaw,State -j <jobid_1>,<jobid_2>,...

Instead, it just uses the JobId, and isn't that guaranteed to be unique
at any point in time? Or were you meaning to say that JobId can be
non-unique? That would indeed spell trouble on a different level, and
make status checks much more complicated...

cheers,
david


On Thu, 2023-02-23 at 11:59 -0500, Sean Maxwell wrote:
> Hi David,
>
> On Thu, Feb 23, 2023 at 10:50 AM David Laehnemann
> <david.laehnem...@hhu.de> wrote:
>
> > But from your comment I understand that handling these queries in
> > batches would be less work for slurmdbd, right? So instead of
> > querying each jobid with a separate database query, it would do
> > one database query for the whole list? Is that really easier for
> > the system, or would it end up doing a call for each jobid,
> > anyway?
>
> From the perspective of avoiding RPC flood, it is much better to use
> a batch query. That said, if you have an extremely large number of
> jobs in the queue, you still wouldn't want to query the status too
> frequently.
>
> > And just to be as clear as possible, a call to sacct would then
> > look like this:
> >
> > sacct -X -P -n --format=JobIdRaw,State -j <jobid_1>,<jobid_2>,...
>
> That would be one way to do it, but I think there are other
> approaches that might be better. For example, there is no
> requirement for the job name to be unique. So if the snakemake
> pipeline has a configurable instance name="foo", and snakemake was
> configured to specify its own name as the job name when submitting
> jobs (e.g. sbatch -J foo ...), then the query for all jobs in the
> pipeline is simply:
>
> sacct --name=foo
>
> > Because we can of course rewrite the respective code section, so
> > any insight on how to do this job accounting more efficiently (and
> > better tailored to how Slurm does things) is appreciated.
>
> I appreciate that you are interested in improving the integration to
> make it more performant. We are seeing an increase in meta-scheduler
> use at our site, so this is a worthwhile problem to tackle.
>
> Thanks,
>
> -Sean
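P.S.: Just to make sure I understood your job name suggestion
correctly: on our side, the submission and the batch query would then
pair up roughly like this? Again just a sketch; WORKFLOW_NAME and
job_script.sh are placeholders for a configurable instance name and an
actual job script:

    import subprocess

    # hypothetical per-workflow instance name; would need to be configurable
    WORKFLOW_NAME = "foo"

    # submit every job of this workflow instance under the shared name
    # (equivalent to: sbatch -J foo job_script.sh)
    subprocess.run(
        ["sbatch", "-J", WORKFLOW_NAME, "job_script.sh"], check=True
    )

    # later: one query returns the state of every job in this workflow,
    # without having to track the individual job ids at all
    result = subprocess.run(
        ["sacct", "--name", WORKFLOW_NAME, "-X", "-P", "-n",
         "--format=JobIdRaw,State"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)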