On 27/2/23 03:34, David Laehnemann wrote:
Hi Chris, hi Sean,
Hiya!
thanks also (and thanks again) for chiming in.
No worries.
Quick follow-up question:
Would `squeue` be a better fall-back command than `scontrol` from the
perspective of keeping `slurmctld` responsive?
Sadly not, whilst
On 27/2/23 06:53, Brian Andrus wrote:
Sorry, I had to share that this is very much like "Are we there yet?" on
a road trip with kids 😄
Slurm is trying to drive.
Oh I love this analogy!
Whereas sacct is like talking to the navigator. The navigator
does talk to the driver to give directions
Hi Brian,
thanks for your ideas. Follow-up questions, because further digging
through the docs didn't get me anywhere definitive on this:
> IMHO, the true solution is that if a job's info NEEDS updated that
> often, have the job itself report what it is doing (but NOT via
> slurm commands). The
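As a rough sketch of that self-reporting idea (the status directory, the
payload script and the format below are all hypothetical; adapt them to your
site), the batch script itself can write its state to a file on a shared
filesystem, and whatever is watching the job reads that file instead of
issuing any Slurm commands:

    #!/bin/bash
    #SBATCH --job-name=selfreport-example

    # Hypothetical status file on a shared filesystem; the monitoring side
    # polls this file instead of calling squeue/scontrol/sacct.
    STATUS_FILE="$HOME/job-status/${SLURM_JOB_ID}.status"
    mkdir -p "$(dirname "$STATUS_FILE")"

    echo "RUNNING $(date +%s)" > "$STATUS_FILE"

    ./run_analysis.sh   # placeholder for the actual payload of the job
    rc=$?

    if [ "$rc" -eq 0 ]; then
        echo "COMPLETED $(date +%s)" > "$STATUS_FILE"
    else
        echo "FAILED $rc $(date +%s)" > "$STATUS_FILE"
    fi
    exit "$rc"

Reading the status then costs a single file read on the shared filesystem and
never touches slurmctld.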
Hello all,
I haven't found any guidance that seems to be the current "better
practice," but this does seem to be a common use case. I imagine there are
multiple ways to accomplish this goal. For example, you could assuredly do
it with QoS, but you can likely also accomplish this with some other
we
Marko,
I’m in a similar situation. We have many Accounts with dedicated hardware and
recently ran into a situation where a user with dedicated hardware submitted
hundreds of jobs that overflowed into the community hardware, causing an
unexpected backlog. I believe QoS will help us with that as w
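For what it's worth, the QoS route looks roughly like the following in
`sacctmgr` (the QoS name, account name and limit values are made up; pick the
limits that actually match your policy from the sacctmgr documentation):

    # Create a QoS that caps how many jobs a single user may have submitted
    # and running at any one time.
    sacctmgr add qos name=dedicated_cap
    sacctmgr modify qos name=dedicated_cap set MaxSubmitJobsPerUser=200 MaxJobsPerUser=50

    # Attach the QoS to the account that owns the dedicated hardware.
    sacctmgr modify account name=dedicated_acct set qos=dedicated_cap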
We have many JupyterHub jobs on our cluster that also do a lot of job
queries. We could adjust the query interval, but what I did is have one
process query all the jobs with `squeue --json`, and the JupyterHub query
script looks its job up in that output,
instead of every JupyterHub job querying the batch system itself. I
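In case it helps anyone copying this pattern, a minimal sketch could look like
the code below (paths, interval and the jq filter are assumptions; `squeue
--json` needs a reasonably recent Slurm, and the exact JSON layout differs
between releases):

    # One central process refreshes a shared snapshot of all jobs:
    while true; do
        squeue --json > /tmp/squeue-snapshot.json.tmp &&
            mv /tmp/squeue-snapshot.json.tmp /tmp/squeue-snapshot.json
        sleep 30
    done

    # Each JupyterHub query script then reads the snapshot instead of calling
    # squeue itself:
    jq --argjson id "$SLURM_JOB_ID" \
        '.jobs[] | select(.job_id == $id) | .job_state' \
        /tmp/squeue-snapshot.json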
As a side note:
In Slurm 23.x a new rate-limiting feature for client RPC calls was added
(see this commit:
https://github.com/SchedMD/slurm/commit/674f118140e171d10c2501444a0040e1492f4eab#diff-b4e84d09d9b1d817a964fb78baba0a2ea6316bfc10c1405329a95ad0353ca33e).
This would give operators the ability
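If I read the commit right, this is switched on through SlurmctldParameters,
along the lines of the snippet below (parameter names as introduced in the
23.02 release notes; double-check the slurm.conf man page of your actual
release before relying on them):

    # slurm.conf: enable per-user RPC rate limiting in slurmctld (Slurm >= 23.02).
    # rl_refill_rate and rl_bucket_size control how many RPCs per user are
    # accepted before additional calls are deferred.
    SlurmctldParameters=rl_enable,rl_refill_rate=10,rl_bucket_size=50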
> > And if you are seeing a workflow management system causing trouble on
> > your system, probably the most sustainable way of getting this resolved
> > is to file issues or pull requests with the respective project, with
> > suggestions like the ones you made. For snakemake, a second good point
>
Hi David,
David Laehnemann writes:
> Dear Ward,
>
> if used correctly (and that is a big caveat for any method for
> interacting with a cluster system), snakemake will only submit as many
> jobs as can fit within the resources of the cluster at one point of
> time (or however much resources you
Sorry, I had to share that this is very much like "Are we there yet?" on
a road trip with kids :)
Slurm is trying to drive. Any communication to slurmctld will involve an
RPC call (sinfo, squeue, scontrol, etc). You can see how many with sdiag.
Too many RPC calls will cause failures. Asking slu
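For anyone who wants to see this on their own cluster, the counters come from
`sdiag`:

    # Dump slurmctld statistics; the RPC sections break the traffic down by
    # message type and by user, which makes chatty clients easy to spot.
    sdiag

    # As a Slurm operator/administrator you can zero the counters first to
    # measure a fresh window:
    sdiag --reset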
Dear Ward,
if used correctly (and that is a big caveat for any method for
interacting with a cluster system), snakemake will only submit as many
jobs as can fit within the resources of the cluster at one point of
time (or however much resources you tell snakemake that it can use). So
unless there
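Concretely, the knobs for that on the snakemake side are its job and polling
throttles, something along these lines (the executor flag depends on the
snakemake major version, e.g. `--executor slurm` with the slurm executor
plugin in snakemake >= 8, `--slurm`/`--cluster` in older 7.x releases):

    # Cap how much of the cluster snakemake may occupy and how hard it polls:
    snakemake --executor slurm \
        --jobs 100 \
        --max-jobs-per-second 1 \
        --max-status-checks-per-second 0.5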
On 24/02/2023 18:34, David Laehnemann wrote:
Those queries then should not have to happen too often, although: do you
have any indication of a range for when you say "you still wouldn't
want to query the status too frequently"? Because I don't really, and
would probably opt for some compromise of
Hi Chris, hi Sean,
thanks also (and thanks again) for chiming in.
Quick follow-up question:
Would `squeue` be a better fall-back command than `scontrol` from the
perspective of keeping `slurmctld` responsive? From what I can see in
the general overview of how slurm works
(https://slurm.schedmd.
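For concreteness, the per-job checks being weighed here look roughly as
follows (the job id is just a placeholder); of the three, only the `sacct`
variant talks to slurmdbd instead of slurmctld:

    # Ask for the state of one job in three different ways:
    squeue --noheader --format=%T --jobs=12345677
    scontrol show job 12345677 | grep -o 'JobState=[A-Z]*'
    sacct --noheader --format=State --jobs=12345677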