[slurm-users] Re: SLURM Telegraf Plugin

2024-09-24 Thread Oren Shani via slurm-users
Hi Pablo, I did something similar a while back, and my problem was that probing the Slurm API too often was causing problems for Slurm. Didn't you encounter a similar problem? Please let me know. Thanks, Oren
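If polling frequency is the culprit, note that Telegraf lets a single input run on a slower cadence than the global agent interval. A minimal telegraf.conf sketch, assuming the plugin follows the usual Telegraf conventions (the per-input interval override is a standard Telegraf feature; the plugin name "slurm" matches Pablo's repository):

    [agent]
      interval = "60s"             # global default for all inputs

    [[inputs.slurm]]
      ## Standard per-plugin override: poll the Slurm API far less often
      ## than the other inputs to reduce load on the controller.
      interval = "300s"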

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Long, Daniel S. via slurm-users
The low-priority jobs definitely can’t “fit in” before the high-priority jobs would start, but I don’t think that should matter. The idle nodes are incapable of running the high-priority jobs, ever. I would expect Slurm to assign those nodes the highest-priority jobs that they are capable of running…

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Paul Edmon via slurm-users
You might need to do some tuning on your backfill loop, as that loop should be the one that backfills in those lower-priority jobs. I would also look to see whether those lower-priority jobs will actually fit in prior to the higher-priority job running; they may not. -Paul Edmon-
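For context, the backfill loop is tuned through SchedulerParameters in slurm.conf. A sketch of the knobs most often involved in this kind of tuning (the values are illustrative, not recommendations; see the SchedulerParameters documentation):

    SchedulerType=sched/backfill
    # bf_window should cover the longest allowed walltime (in minutes);
    # bf_max_job_test bounds how many pending jobs each backfill pass considers;
    # bf_continue lets a pass resume after releasing locks.
    SchedulerParameters=bf_continue,bf_interval=30,bf_window=2880,bf_max_job_test=1000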

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Renfro, Michael via slurm-users
Do you have backfill scheduling [1] enabled? If so, what settings are in place? Also, the lower-priority jobs will be eligible for backfill only if they don’t delay the start of the higher-priority jobs. So what kind of resources and time does a given array job require? Odds are, they…
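A quick way to answer the "what settings are in place" question is to dump the live configuration from the controller:

    scontrol show config | grep -i -E 'schedulertype|schedulerparameters'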

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Long, Daniel S. via slurm-users
I experimented a bit and think I have figured out the problem, but not the solution. We use multifactor priority, with the job’s account as the primary factor. Right now one project has much higher priority due to a deadline. Those are the jobs that are pending with “Resources”. They cannot run on the…
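For readers following along, a multifactor setup where account-level fairshare dominates looks roughly like this slurm.conf excerpt (the weights are illustrative, not the poster's actual values):

    PriorityType=priority/multifactor
    PriorityWeightFairshare=100000   # account share dominates the ranking
    PriorityWeightAge=1000
    PriorityWeightJobSize=0
    PriorityWeightPartition=0
    PriorityWeightQOS=0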

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Renfro, Michael via slurm-users
In theory, if jobs are pending with “Priority”, one or more other jobs will be pending with “Resources”. So a few questions: 1. What are the “Resources” jobs waiting on, resource-wise? 2. When are they scheduled to start? 3. Can your array jobs backfill into the idle resources and finish…
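All three questions can be answered with standard squeue queries, for example (the job ID is a placeholder):

    # Pending reason plus requested nodes, CPUs, and memory per job:
    squeue -t PENDING -o "%.12i %.10u %.6D %.6C %.10m %.12r"
    # The scheduler's expected start time for a specific pending job:
    squeue --start -j <jobid>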

[slurm-users] Re: Setting up fairshare accounting

2024-09-24 Thread tluchko via slurm-users
Just following up on my own message in case someone else is trying to figure out RawUsage and FairShare. I ran some additional tests, except that I ran jobs for 10 min instead of 1 min. The procedure was: 1. Set the priority calculations to update every minute in slurm.conf: PriorityCalcPeriod=1. 2. …
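For anyone reproducing this kind of test, the numbers in question can be watched directly with sshare (standard format fields; RawUsage accumulates in TRES-seconds, which for CPU-only jobs is effectively CPU-seconds):

    sshare -a -o Account,User,RawShares,NormShares,RawUsage,EffectvUsage,FairShare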

[slurm-users] Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Long, Daniel S. via slurm-users
Hi, On our cluster we have some jobs that are queued even though there are available nodes to run on. The listed reason is "priority", but that doesn't really make sense to me. Slurm isn't picking another job to run on those nodes; it's just not running anything at all. We do have a quite heterogeneous…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Groner, Rob via slurm-users
Ok, that example helped. Max of 200G on a single node, per user (not job). No limits on how many jobs and nodes they can use... just a limit of 200G per node per user. And in that case, it's out of my realm of experience. 🙂 I'm relatively confident there IS a way... but I don't know it offhand…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Guillaume COCHARD via slurm-users
> "So if they submit a 2nd job, that job can start but will have to go onto
> another node, and will again be restricted to 200G? So they can start as many
> jobs as there are nodes, and each job will be restricted to using 1 node and
> 200G of memory?"
Yes, that's it. We already have MaxNodes…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Groner, Rob via slurm-users
Ah, sorry, I didn't catch that from your first post (though you did say it). So, you are trying to limit the user to no more than 200G of memory on a single node? So if they submit a 2nd job, that job can start but will have to go onto another node, and will again be restricted to 200G? So the…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Guillaume COCHARD via slurm-users
Thank you for your answer. To test it I tried:

    sacctmgr update qos normal set maxtresperuser=cpu=2
    # Then in slurm.conf
    PartitionName=test […] qos=normal

But then if I submit several 1-cpu jobs, only two start and the others stay pending, even though I have several nodes available. So it seems…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Groner, Rob via slurm-users
You have the right idea. On that same page, you'll find MaxTRESPerUser as a QOS parameter. You can create a QOS with the restrictions you'd like, and then in the partition definition you give it that QOS. The QOS will then apply its restrictions to any jobs that use that partition. Rob
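A minimal sketch of that recipe (the QOS and partition names are made up for illustration; note that, as the test above found, MaxTRESPerUser caps a user's total across the whole partition rather than per node):

    sacctmgr add qos memcap
    sacctmgr modify qos memcap set MaxTRESPerUser=mem=200G
    # In slurm.conf, attach the QOS to the partition:
    PartitionName=compute Nodes=node[01-16] QOS=memcap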

[slurm-users] SLURM Telegraf Plugin

2024-09-24 Thread Pablo Collado Soto via slurm-users
Hi all, I recently wrote a SLURM input plugin [0] for Telegraf [1]. I just wanted to let the community know so that you can use it if you'd find it useful. Maybe its existence can also be included in the documentation somewhere? Anyway, thanks a ton for…
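For anyone wanting to try it, enabling a Telegraf input is normally just a block in telegraf.conf. A sketch under the assumption that the plugin is registered as "slurm" and scrapes slurmrestd over its REST interface (the option name url is a guess; the plugin's README [0] is authoritative):

    [[inputs.slurm]]
      ## Base URL of the slurmrestd endpoint to scrape (assumed option name).
      url = "http://127.0.0.1:6820"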

[slurm-users] Max TRES per user and node

2024-09-24 Thread Guillaume COCHARD via slurm-users
Hello, We are looking for a method to limit the TRES used by each user on a per-node basis. For example, we would like to limit the total memory allocation of jobs from a user to 200G per node. There is MaxTRESPerNode (https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but unfortunately…
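For reference, MaxTRESPerNode is a per-job limit: it caps what a single job may allocate on any one node, not a user's aggregate across all of their jobs on that node, which is presumably why it falls short here. Setting it looks like this (the QOS name is illustrative):

    sacctmgr modify qos normal set MaxTRESPerNode=mem=200G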