[slurm-users] Re: SLURM Telegraf Plugin

2024-09-24 Thread Oren Shani via slurm-users
Hi Pablo, I did something similar a while back, and my problem was that probing the Slurm API too often was causing problems for Slurm. Didn't you encounter a similar problem? Please let me know. Thanks, Oren
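If polling frequency is the culprit, note that Telegraf lets a single input run on a slower cadence than the global agent interval. A minimal telegraf.conf sketch, assuming the plugin follows the usual Telegraf conventions (the per-input interval override is a standard Telegraf feature; the plugin name "slurm" matches Pablo's repository):

    [agent]
      interval = "60s"             # global default for all inputs

    [[inputs.slurm]]
      ## Standard per-plugin override: poll the Slurm API far less often
      ## than the other inputs to reduce load on the controller.
      interval = "300s"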

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Long, Daniel S. via slurm-users
The low-priority jobs definitely can’t “fit in” before the high-priority jobs would start, but I don’t think that should matter. The idle nodes are incapable of running the high-priority jobs, ever. I would expect Slurm to assign those nodes the highest-priority jobs that they are capable of running…

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Paul Edmon via slurm-users
You might need to do some tuning on your backfill loop, as that loop should be the one that backfills in those lower-priority jobs. I would also look to see whether those lower-priority jobs will actually fit in prior to the higher-priority job running; they may not. -Paul Edmon-
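For context, the backfill loop is tuned through SchedulerParameters in slurm.conf. A sketch of the knobs most often involved in this kind of tuning (the values are illustrative, not recommendations; see the SchedulerParameters documentation):

    SchedulerType=sched/backfill
    # bf_window should cover the longest allowed walltime (in minutes);
    # bf_max_job_test bounds how many pending jobs each backfill pass considers;
    # bf_continue lets a pass resume after releasing locks.
    SchedulerParameters=bf_continue,bf_interval=30,bf_window=2880,bf_max_job_test=1000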

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Renfro, Michael via slurm-users
Do you have backfill scheduling [1] enabled? If so, what settings are in place? Also, the lower-priority jobs will be eligible for backfill only if they don’t delay the start of the higher-priority jobs. So what kind of resources and time does a given array job require? Odds are, they…
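A quick way to answer the "what settings are in place" question is to dump the live configuration from the controller:

    scontrol show config | grep -i -E 'schedulertype|schedulerparameters'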

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Long, Daniel S. via slurm-users
I experimented a bit and think I have figured out the problem, but not the solution. We use multifactor priority, with the job’s account as the primary factor. Right now one project has much higher priority due to a deadline. Those are the jobs that are pending with “Resources”. They cannot run on the…
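For readers following along, a multifactor setup where account-level fairshare dominates looks roughly like this slurm.conf excerpt (the weights are illustrative, not the poster's actual values):

    PriorityType=priority/multifactor
    PriorityWeightFairshare=100000   # account share dominates the ranking
    PriorityWeightAge=1000
    PriorityWeightJobSize=0
    PriorityWeightPartition=0
    PriorityWeightQOS=0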

[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Renfro, Michael via slurm-users
In theory, if jobs are pending with “Priority”, one or more other jobs will be pending with “Resources”. So a few questions: 1. What are the “Resources” jobs waiting on, resource-wise? 2. When are they scheduled to start? 3. Can your array jobs backfill into the idle resources and finish…
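All three questions can be answered with standard squeue queries, for example (the job ID is a placeholder):

    # Pending reason plus requested nodes, CPUs, and memory per job:
    squeue -t PENDING -o "%.12i %.10u %.6D %.6C %.10m %.12r"
    # The scheduler's expected start time for a specific pending job:
    squeue --start -j <jobid>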

[slurm-users] Re: Setting up fairshare accounting

2024-09-24 Thread tluchko via slurm-users
Just following up on my own message in case someone else is trying to figure out RawUsage and FairShare. I ran some additional tests, except that I ran jobs for 10 min instead of 1 min. The procedure was: 1. Set the priority calculations to update every minute in slurm.conf: PriorityCalcPeriod=1. 2. …
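For anyone reproducing this kind of test, the numbers in question can be watched directly with sshare (standard format fields; RawUsage accumulates in TRES-seconds, which for CPU-only jobs is effectively CPU-seconds):

    sshare -a -o Account,User,RawShares,NormShares,RawUsage,EffectvUsage,FairShare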

[slurm-users] Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Long, Daniel S. via slurm-users
Hi, On our cluster we have some jobs that are queued even though there are available nodes to run on. The listed reason is "priority", but that doesn't really make sense to me. Slurm isn't picking another job to run on those nodes; it's just not running anything at all. We do have a quite heterogeneous…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Groner, Rob via slurm-users
Ok, that example helped. Max of 200G on a single node, per user (not job). No limits on how many jobs and nodes they can use... just a limit of 200G per node per user. And in that case, it's out of my realm of experience. 🙂 I'm relatively confident there IS a way... but I don't know it offhand…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Guillaume COCHARD via slurm-users
> "So if they submit a 2nd job, that job can start but will have to go onto
> another node, and will again be restricted to 200G? So they can start as many
> jobs as there are nodes, and each job will be restricted to using 1 node and
> 200G of memory?"
Yes, that's it. We already have MaxNodes…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Groner, Rob via slurm-users
Ah, sorry, I didn't catch that from your first post (though you did say it). So, you are trying to limit the user to no more than 200G of memory on a single node? So if they submit a 2nd job, that job can start but will have to go onto another node, and will again be restricted to 200G? So the…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Guillaume COCHARD via slurm-users
Thank you for your answer. To test it I tried:

    sacctmgr update qos normal set maxtresperuser=cpu=2
    # Then in slurm.conf
    PartitionName=test […] qos=normal

But then if I submit several 1-cpu jobs, only two start and the others stay pending, even though I have several nodes available. So it seems…

[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Groner, Rob via slurm-users
You have the right idea. On that same page, you'll find MaxTRESPerUser as a QOS parameter. You can create a QOS with the restrictions you'd like, and then in the partition definition you give it that QOS. The QOS will then apply its restrictions to any jobs that use that partition. Rob
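A minimal sketch of that recipe (the QOS and partition names are made up for illustration; note that, as the test above found, MaxTRESPerUser caps a user's total across the whole partition rather than per node):

    sacctmgr add qos memcap
    sacctmgr modify qos memcap set MaxTRESPerUser=mem=200G
    # In slurm.conf, attach the QOS to the partition:
    PartitionName=compute Nodes=node[01-16] QOS=memcap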

[slurm-users] SLURM Telegraf Plugin

2024-09-24 Thread Pablo Collado Soto via slurm-users
Hi all, I recently wrote a SLURM input plugin [0] for Telegraf [1]. I just wanted to let the community know so that you can use it if you'd find it useful. Maybe its existence can also be included in the documentation somewhere? Anyway, thanks a ton for…
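For anyone wanting to try it, enabling a Telegraf input is normally just a block in telegraf.conf. A sketch under the assumption that the plugin is registered as "slurm" and scrapes slurmrestd over its REST interface (the option name url is a guess; the plugin's README [0] is authoritative):

    [[inputs.slurm]]
      ## Base URL of the slurmrestd endpoint to scrape (assumed option name).
      url = "http://127.0.0.1:6820"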

[slurm-users] Max TRES per user and node

2024-09-24 Thread Guillaume COCHARD via slurm-users
Hello, We are looking for a method to limit the TRES used by each user on a per-node basis. For example, we would like to limit the total memory allocation of jobs from a user to 200G per node. There is MaxTRESPerNode (https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but unfortunately…
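For reference, MaxTRESPerNode is a per-job limit: it caps what a single job may allocate on any one node, not a user's aggregate across all of their jobs on that node, which is presumably why it falls short here. Setting it looks like this (the QOS name is illustrative):

    sacctmgr modify qos normal set MaxTRESPerNode=mem=200G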