Hi Pablo,
I did something similar a while back, and my problem was that probing the
Slurm API too often was causing problems for Slurm.
Didn't you encounter a similar problem?
Please let me know
Thanks
Oren
On Tue, Sep 24, 2024 at 4:50 PM Pablo Collado Soto via slurm-users <slurm-users@lists.schedmd.com> wrote:
The low priority jobs definitely can’t “fit in” before the high priority jobs
would start, but I don’t think that should matter. The idle nodes are incapable
of running the high priority jobs, ever. I would expect Slurm to assign those
nodes the highest priority jobs that they are capable of running.
You might need to do some tuning on your backfill loop as that loop
should be the one that backfills in those lower priority jobs. I would
also look to see if those lower priority jobs will actually fit in prior
to the higher priority job running; they may not.
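The knobs for that kind of tuning live in SchedulerParameters in slurm.conf; something along these lines (the values are purely illustrative, not recommendations):
# slurm.conf -- illustrative backfill tuning only
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_interval=30,bf_window=2880,bf_max_job_test=1000,bf_max_job_user=50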
-Paul Edmon-
On 9/24/24 2:19 PM:
Do you have backfill scheduling [1] enabled? If so, what settings are in place?
And the lower-priority jobs will be eligible for backfill only if they don’t
delay the start of the higher priority jobs.
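If you're not sure what is currently in effect, the active scheduler settings and the backfill loop's statistics can be checked with standard tools, e.g.:
scontrol show config | grep -iE 'schedulertype|schedulerparameters'
sdiag    # the "Backfilling stats" section shows how the backfill loop is behaving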
So what kind of resources and time does a given array job require? Odds are,
they
I experimented a bit and think I have figured out the problem but not the
solution.
We use multifactor priority, with the job account as the primary factor. Right now
one project has much higher priority due to a deadline. Those are the jobs that
are pending with “Resources”. They cannot run on the idle nodes.
In theory, if jobs are pending with “Priority”, one or more other jobs will be
pending with “Resources”.
So a few questions:
1. What are the “Resources” jobs waiting on, resource-wise?
2. When are they scheduled to start? (See the squeue sketch below.)
3. Can your array jobs backfill into the idle resources and finish before the “Resources” jobs are due to start?
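For question 2, squeue can report the expected start time and pending reason directly; the format string here is just one possible layout:
squeue --state=PD --start -o "%.12i %.9P %.10Q %.20S %.12r"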
Just following up on my own message in case someone else is trying to figure
out RawUsage and Fair Share.
I ran some additional tests, except that I ran jobs for 10 min instead of 1
min. The procedure was:
1. Set the accounting stats to update every minute in slurm.conf (the resulting usage can then be checked with sshare, as sketched below):
PriorityCalcPeriod=1
2.
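The per-account and per-user numbers can be read back with sshare, e.g. (standard sshare flags; <account> is a placeholder):
sshare -l -a             # all users, long format: RawUsage, EffectvUsage, FairShare, ...
sshare -l -A <account>   # restrict the listing to one account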
Hi,
On our cluster we have some jobs that are queued even though there are
available nodes to run on. The listed reason is "priority" but that doesn't
really make sense to me. Slurm isn't picking another job to run on those nodes;
it's just not running anything at all. We do have a quite heterogeneous cluster.
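The per-job pending reason and the per-factor priority breakdown can be inspected with squeue and sprio; the format string below is only one way to lay it out:
squeue --state=PD -o "%.12i %.9P %.8u %.10Q %.12r"
sprio -l    # age / fairshare / jobsize / partition / qos components per job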
Ok, that example helped. Max of 200G on a single node, per user (not job). No
limits on how many jobs and nodes they can use...just a limit of 200G per node
per user.
And in that case, it's out of my realm of experience. 🙂 I'm relatively
confident there IS a way...but I don't know it offhand.
> "So if they submit a 2 nd job, that job can start but will have to go onto
> another node, and will again be restricted to 200G? So they can start as many
> jobs as there are nodes, and each job will be restricted to using 1 node and
> 200G of memory?"
Yes that's it. We already have MaxNodes
Ah, sorry, I didn't catch that from your first post (though you did say it).
So, you are trying to limit the user to no more than 200G of memory on a single
node? So if they submit a 2nd job, that job can start but will have to go onto
another node, and will again be restricted to 200G? So they can start as many
jobs as there are nodes, and each job will be restricted to using 1 node and
200G of memory?
Thank you for your answer.
To test it I tried:
sacctmgr update qos normal set maxtresperuser=cpu=2
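# note: MaxTRESPerUser is the user's total within this QOS, not a per-node cap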
# Then in slurm.conf
PartitionName=test […] qos=normal
But then if I submit several 1-cpu jobs, only two start and the others stay
pending, even though I have several nodes available. So it seems the limit
applies to the user's total across the partition rather than per node.
You have the right idea.
On that same page, you'll find MaxTRESPerUser, as a QOS parameter.
You can create a QOS with the restrictions you'd like, and then in the
partition definition, you give it that QOS. The QOS will then apply its
restrictions to any jobs that use that partition.
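A minimal sketch of that setup ("puser" is just a placeholder QOS name, and AccountingStorageEnforce has to include limits/qos for the limit to be enforced):
sacctmgr add qos puser
sacctmgr modify qos puser set MaxTRESPerUser=cpu=2   # or whatever TRES limit you need
# slurm.conf
PartitionName=test […] QOS=puser
AccountingStorageEnforce=limits,qos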
Rob
Hi all,
I recently wrote a Slurm input plugin [0] for Telegraf [1].
I just wanted to let the community know so that you can use it if you'd
find that useful.
Maybe its existence can also be included in the documentation somewhere?
Anyway, thanks a ton for all your work!
Hello,
We are looking for a method to limit the TRES used by each user on a per-node
basis. For example, we would like to limit the total memory allocation of jobs
from a user to 200G per node.
There is MaxTRESPerNode
(https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but
unfortunately it is a per-job limit rather than a per-user one.
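For reference, that option is set on a QOS (or association) roughly like this (a sketch only; as noted, it caps each job's per-node usage, not a user's combined usage on a node):
sacctmgr modify qos normal set MaxTRESPerNode=mem=204800   # 200G expressed in MB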