[slurm-users] SLURM Telegraf Plugin

2024-09-24 Thread Pablo Collado Soto via slurm-users
Hi all,

I recently wrote a SLURM input plugin [0] for Telegraf [1].

I just wanted to let the community know so that you can use it if you'd
find that useful.

Maybe its existence can also be included in the documentation somewhere?

Anyway, thanks a ton for your time,

Pablo Collado Soto

References:
0: 
https://github.com/influxdata/telegraf/tree/master/plugins/inputs/slurm
1: https://www.influxdata.com/time-series-platform/telegraf/
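
For anyone who wants to try it: the plugin pulls its data from a running
slurmrestd instance, and a single stanza in telegraf.conf should be enough to
get started. A minimal sketch (the address is illustrative; see the README at
[0] for the full option list, including authentication and endpoint selection):

[[inputs.slurm]]
  ## URL of the slurmrestd instance to scrape (illustrative address)
  url = "http://127.0.0.1:6820"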

+------------------------------------------+
| Never let your sense of morals prevent   |
| you from doing what is right.            |
|           -- Salvor Hardin, "Foundation" |
+------------------------------------------+




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Groner, Rob via slurm-users
You have the right idea.

On that same page, you'll find MaxTRESPerUser, as a QOS parameter.

You can create a QOS with the restrictions you'd like, and then in the 
partition definition, you give it that QOS.  The QOS will then apply its 
restrictions to any jobs that use that partition.
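
A minimal sketch of that setup (QOS, node, and partition names are
placeholders, and it assumes limit enforcement is already on, e.g.
AccountingStorageEnforce includes "limits"). Note that, as Guillaume's test
elsewhere in the thread shows, MaxTRESPerUser applies across the whole QOS
rather than per node:

# create a QOS that carries the per-user limit
sacctmgr add qos mem200
sacctmgr modify qos mem200 set MaxTRESPerUser=mem=200G

# attach it to the partition in slurm.conf so it covers every job submitted there
PartitionName=test Nodes=node[1-3] QOS=mem200 State=UP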

Rob

From: Guillaume COCHARD via slurm-users 
Sent: Tuesday, September 24, 2024 9:30 AM
To: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Max TRES per user and node

Hello,

We are looking for a method to limit the TRES used by each user on a per-node 
basis. For example, we would like to limit the total memory allocation of jobs 
from a user to 200G per node.

There is MaxTRESPerNode
(https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but
unfortunately, this is a per-job limit, not per user.

Ideally, we would like to apply this limit on partitions and/or QoS. Does 
anyone know if this is possible and how to achieve it?

Thank you,

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Groner, Rob via slurm-users
Ok, that example helped.  Max of 200G on a single node, per user (not job).  No 
limits on how many jobs and nodes they can use...just a limit of 200G per node 
per user.

And in that case, it's out of my realm of experience.  🙂   I'm relatively 
confident there IS a way...but I don't know it offhand.  I was thinking maybe 
some combination of QOS, partition, and account limits.

Rob


From: Guillaume COCHARD 
Sent: Tuesday, September 24, 2024 10:58 AM
To: Groner, Rob 
Cc: slurm-users@lists.schedmd.com 
Subject: Re: Max TRES per user and node

> "So if they submit a 2nd job, that job can start but will have to go onto 
> another node, and will again be restricted to 200G?  So they can start as 
> many jobs as there are nodes, and each job will be restricted to using 1 node 
> and 200G of memory?"

Yes that's it. We already have MaxNodes=1 so a job can't be spread on multiple 
nodes.

To be more precise, the limit should be by user and not by job. To illustrate, 
let's imagine we have 3 empty nodes and a 200G/user/node limit. If a user 
submits 10 jobs, each requesting 100G of memory, there should be 2 jobs running 
on each worker and 4 jobs pending.

Guillaume


De: "Groner, Rob" 
Ă€: "Guillaume COCHARD" 
Cc: slurm-users@lists.schedmd.com
Envoyé: Mardi 24 Septembre 2024 16:37:34
Objet: Re: Max TRES per user and node

Ah, sorry, I didn't catch that from your first post (though you did say it).

So, you are trying to limit the user to no more than 200G of memory on a single 
node?  So if they submit a 2nd job, that job can start but will have to go onto 
another node, and will again be restricted to 200G?  So they can start as many 
jobs as there are nodes, and each job will be restricted to using 1 node and 
200G of memory? Or can they submit a job asking for 4 nodes, where they are 
limited to 200G on each node?  Or are they limited to a single node, no matter 
how many jobs?

Rob


From: Guillaume COCHARD 
Sent: Tuesday, September 24, 2024 10:09 AM
To: Groner, Rob 
Cc: slurm-users@lists.schedmd.com 
Subject: Re: Max TRES per user and node

Thank you for your answer.

To test it I tried:
sacctmgr update qos normal set maxtresperuser=cpu=2
# Then in slurm.conf
PartitionName=test […] qos=normal

But then if I submit several 1-cpu jobs only two start and the others stay 
pending, even though I have several nodes available. So it seems that 
MaxTRESPerUser is a QoS-wide limit, and doesn't limit TRES per user and per 
node but rather per user and QoS (or rather partition since I applied the QoS 
on the partition). Did I miss something?

Thanks again,
Guillaume


De: "Groner, Rob" 
Ă€: slurm-users@lists.schedmd.com, "Guillaume COCHARD" 

Envoyé: Mardi 24 Septembre 2024 15:45:08
Objet: Re: Max TRES per user and node

You have the right idea.

On that same page, you'll find MaxTRESPerUser, as a QOS parameter.

You can create a QOS with the restrictions you'd like, and then in the 
partition definition, you give it that QOS.  The QOS will then apply its 
restrictions to any jobs that use that partition.

Rob

From: Guillaume COCHARD via slurm-users 
Sent: Tuesday, September 24, 2024 9:30 AM
To: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Max TRES per user and node

Hello,

We are looking for a method to limit the TRES used by each user on a per-node 
basis. For example, we would like to limit the total memory allocation of jobs 
from a user to 200G per node.

There is MaxTRESPerNode
(https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but
unfortunately, this is a per-job limit, not per user.

Ideally, we would like to apply this limit on partitions and/or QoS. Does 
anyone know if this is possible and how to achieve it?

Thank you,




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Renfro, Michael via slurm-users
Do you have backfill scheduling [1] enabled? If so, what settings are in place?

And the lower-priority jobs will be eligible for backfill only if they don't 
delay the start of the higher-priority jobs.

So what kind of resources and time does a given array job require? Odds are, 
they have a time request that conflicts with the scheduled start time for the 
high priority jobs.

[1] https://slurm.schedmd.com/sched_config.html#backfill
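
For reference, the currently active values can be dumped on the cluster with
something like:

scontrol show config | grep -E 'SchedulerType|SchedulerParameters|DefaultTime'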

From: Long, Daniel S. 
Date: Tuesday, September 24, 2024 at 1:20 PM
To: Renfro, Michael , slurm-us...@schedmd.com 

Subject: Re: Jobs pending with reason "priority" but nodes are idle


I experimented a bit and think I have figured out the problem but not the 
solution.

We use multifactor priority with the job account the primary factor. Right now 
one project has much higher priority due to a deadline. Those are the jobs that 
are pending with “Resources”. They cannot run on the idle nodes because they do 
not satisfy the resource requirements (don’t have GPUs). What I don’t 
understand is why slurm doesn’t schedule the lower priority jobs onto those 
nodes, since those jobs don’t require GPUs. It’s very unexpected behavior, to 
me. Is there an option somewhere I need to set?


From: "Renfro, Michael" 
Date: Tuesday, September 24, 2024 at 1:54 PM
To: Daniel Long , "slurm-us...@schedmd.com" 

Subject: Re: Jobs pending with reason "priority" but nodes are idle

In theory, if jobs are pending with “Priority”, one or more other jobs will be 
pending with “Resources”.

So a few questions:


  1.  What are the “Resources” jobs waiting on, resource-wise?
  2.  When are they scheduled to start?
  3.  Can your array jobs backfill into the idle resources and finish before 
the “Resources” jobs are scheduled to start?

From: Long, Daniel S. via slurm-users 
Date: Tuesday, September 24, 2024 at 11:47 AM
To: slurm-us...@schedmd.com 
Subject: [slurm-users] Jobs pending with reason "priority" but nodes are idle


Hi,

On our cluster we have some jobs that are queued even though there are 
available nodes to run on. The listed reason is “priority” but that doesn’t 
really make sense to me. Slurm isn’t picking another job to run on those nodes; 
it’s just not running anything at all. We do have a quite heterogeneous 
cluster, but as far as I can tell the queued jobs aren’t requesting anything 
that would preclude them from running on the idle nodes. They are array jobs, 
if that makes a difference.

Thanks for any help you all can provide.

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Paul Edmon via slurm-users
You might need to do some tuning on your backfill loop, as that loop 
should be the one that backfills in those lower-priority jobs.  I would 
also look to see whether those lower-priority jobs will actually fit in prior 
to the higher-priority job running; they may not.
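
A sketch of the kind of slurm.conf tuning meant here (the values are
illustrative, not recommendations; see the backfill section of
https://slurm.schedmd.com/sched_config.html for what each knob does):

SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_window=2880,bf_resolution=300,bf_max_job_test=1000,bf_max_job_user=50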


-Paul Edmon-

On 9/24/24 2:19 PM, Long, Daniel S. via slurm-users wrote:


I experimented a bit and think I have figured out the problem but not 
the solution.


We use multifactor priority with the job account the primary factor. 
Right now one project has much higher priority due to a deadline. 
Those are the jobs that are pending with “Resources”. They cannot run 
on the idle nodes because they do not satisfy the resource 
requirements (don’t have GPUs). What I don’t understand is why slurm 
doesn’t schedule the lower priority jobs onto those nodes, since those 
jobs don’t require GPUs. It’s very unexpected behavior, to me. Is 
there an option somewhere I need to set?


*From: *"Renfro, Michael" 
*Date: *Tuesday, September 24, 2024 at 1:54 PM
*To: *Daniel Long , 
"slurm-us...@schedmd.com" 

*Subject: *Re: Jobs pending with reason "priority" but nodes are idle

In theory, if jobs are pending with “Priority”, one or more other jobs 
will be pending with “Resources”.


So a few questions:

 1. What are the “Resources” jobs waiting on, resource-wise?
 2. When are they scheduled to start?
 3. Can your array jobs backfill into the idle resources and finish
before the “Resources” jobs are scheduled to start?

*From: *Long, Daniel S. via slurm-users 
*Date: *Tuesday, September 24, 2024 at 11:47 AM
*To: *slurm-us...@schedmd.com 
*Subject: *[slurm-users] Jobs pending with reason "priority" but nodes 
are idle


Hi,

On our cluster we have some jobs that are queued even though there are 
available nodes to run on. The listed reason is “priority” but that 
doesn’t really make sense to me. Slurm isn’t picking another job to 
run on those nodes; it’s just not running anything at all. We do have 
a quite heterogeneous cluster, but as far as I can tell the queued 
jobs aren’t requesting anything that would preclude them from running 
on the idle nodes. They are array jobs, if that makes a difference.


Thanks for any help you all can provide.


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Max TRES per user and node

2024-09-24 Thread Guillaume COCHARD via slurm-users
Hello,

We are looking for a method to limit the TRES used by each user on a per-node 
basis. For example, we would like to limit the total memory allocation of jobs 
from a user to 200G per node.

There is MaxTRESperNode 
(https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but 
unfortunately, this is a per-job limit, not per user.

Ideally, we would like to apply this limit on partitions and/or QoS. Does 
anyone know if this is possible and how to achieve it?

Thank you,

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Long, Daniel S. via slurm-users
The low priority jobs definitely can’t “fit in” before the high priority jobs 
would start, but I don’t think that should matter. The idle nodes are incapable 
of running the high priority jobs, ever. I would expect slurm to assign those 
nodes the highest priority jobs that they are capable of running.


From: Paul Edmon via slurm-users 
Reply-To: Paul Edmon 
Date: Tuesday, September 24, 2024 at 2:26 PM
To: "slurm-users@lists.schedmd.com" 
Subject: [slurm-users] Re: Jobs pending with reason "priority" but nodes are 
idle


You might need to do some tuning on your backfill loop as that loop should be 
the one that backfills in those lower priority jobs.  I would also look to see 
if those lower priority jobs will actually fit in prior to the higher priority 
job running, they may not.

-Paul Edmon-
On 9/24/24 2:19 PM, Long, Daniel S. via slurm-users wrote:
I experimented a bit and think I have figured out the problem but not the 
solution.

We use multifactor priority with the job account the primary factor. Right now 
one project has much higher priority due to a deadline. Those are the jobs that 
are pending with “Resources”. They cannot run on the idle nodes because they do 
not satisfy the resource requirements (don’t have GPUs). What I don’t 
understand is why slurm doesn’t schedule the lower priority jobs onto those 
nodes, since those jobs don’t require GPUs. It’s very unexpected behavior, to 
me. Is there an option somewhere I need to set?


From: "Renfro, Michael" 
Date: Tuesday, September 24, 2024 at 1:54 PM
To: Daniel Long 
, 
"slurm-us...@schedmd.com" 

Subject: Re: Jobs pending with reason "priority" but nodes are idle

In theory, if jobs are pending with “Priority”, one or more other jobs will be 
pending with “Resources”.

So a few questions:


  1.  What are the “Resources” jobs waiting on, resource-wise?
  2.  When are they scheduled to start?
  3.  Can your array jobs backfill into the idle resources and finish before 
the “Resources” jobs are scheduled to start?

From: Long, Daniel S. via slurm-users 

Date: Tuesday, September 24, 2024 at 11:47 AM
To: slurm-us...@schedmd.com 

Subject: [slurm-users] Jobs pending with reason "priority" but nodes are idle


Hi,

On our cluster we have some jobs that are queued even though there are 
available nodes to run on. The listed reason is “priority” but that doesn’t 
really make sense to me. Slurm isn’t picking another job to run on those nodes; 
it’s just not running anything at all. We do have a quite heterogeneous 
cluster, but as far as I can tell the queued jobs aren’t requesting anything 
that would preclude them from running on the idle nodes. They are array jobs, 
if that makes a difference.

Thanks for any help you all can provide.





-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Groner, Rob via slurm-users
Ah, sorry, I didn't catch that from your first post (though you did say it).

So, you are trying to limit the user to no more than 200G of memory on a single 
node?  So if they submit a 2nd job, that job can start but will have to go onto 
another node, and will again be restricted to 200G?  So they can start as many 
jobs as there are nodes, and each job will be restricted to using 1 node and 
200G of memory? Or can they submit a job asking for 4 nodes, where they are 
limited to 200G on each node?  Or are they limited to a single node, no matter 
how many jobs?

Rob


From: Guillaume COCHARD 
Sent: Tuesday, September 24, 2024 10:09 AM
To: Groner, Rob 
Cc: slurm-users@lists.schedmd.com 
Subject: Re: Max TRES per user and node

Thank you for your answer.

To test it I tried:
sacctmgr update qos normal set maxtresperuser=cpu=2
# Then in slurm.conf
PartitionName=test […] qos=normal

But then if I submit several 1-cpu jobs only two start and the others stay 
pending, even though I have several nodes available. So it seems that 
MaxTRESPerUser is a QoS-wide limit, and doesn't limit TRES per user and per 
node but rather per user and QoS (or rather partition since I applied the QoS 
on the partition). Did I miss something?

Thanks again,
Guillaume


De: "Groner, Rob" 
Ă€: slurm-users@lists.schedmd.com, "Guillaume COCHARD" 

Envoyé: Mardi 24 Septembre 2024 15:45:08
Objet: Re: Max TRES per user and node

You have the right idea.

On that same page, you'll find MaxTRESPerUser, as a QOS parameter.

You can create a QOS with the restrictions you'd like, and then in the 
partition definition, you give it that QOS.  The QOS will then apply its 
restrictions to any jobs that use that partition.

Rob

From: Guillaume COCHARD via slurm-users 
Sent: Tuesday, September 24, 2024 9:30 AM
To: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Max TRES per user and node

Hello,

We are looking for a method to limit the TRES used by each user on a per-node 
basis. For example, we would like to limit the total memory allocation of jobs 
from a user to 200G per node.

There is MaxTRESPerNode
(https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but
unfortunately, this is a per-job limit, not per user.

Ideally, we would like to apply this limit on partitions and/or QoS. Does 
anyone know if this is possible and how to achieve it?

Thank you,



-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Guillaume COCHARD via slurm-users
> "So if they submit a 2 nd job, that job can start but will have to go onto 
> another node, and will again be restricted to 200G? So they can start as many 
> jobs as there are nodes, and each job will be restricted to using 1 node and 
> 200G of memory?" 

Yes that's it. We already have MaxNodes=1 so a job can't be spread on multiple 
nodes. 

To be more precise, the limit should be by user and not by job. To illustrate, 
let's imagine we have 3 empty nodes and a 200G/user/node limit. If a user 
submits 10 jobs, each requesting 100G of memory, there should be 2 jobs running 
on each worker and 4 jobs pending. 

Guillaume 


De: "Groner, Rob"  
Ă€: "Guillaume COCHARD"  
Cc: slurm-users@lists.schedmd.com 
Envoyé: Mardi 24 Septembre 2024 16:37:34 
Objet: Re: Max TRES per user and node 

Ah, sorry, I didn't catch that from your first post (though you did say it). 

So, you are trying to limit the user to no more than 200G of memory on a single 
node? So if they submit a 2nd job, that job can start but will have to go onto 
another node, and will again be restricted to 200G? So they can start as many 
jobs as there are nodes, and each job will be restricted to using 1 node and 
200G of memory? Or can they submit a job asking for 4 nodes, where they are 
limited to 200G on each node? Or are they limited to a single node, no matter 
how many jobs? 

Rob 


From: Guillaume COCHARD  
Sent: Tuesday, September 24, 2024 10:09 AM 
To: Groner, Rob  
Cc: slurm-users@lists.schedmd.com  
Subject: Re: Max TRES per user and node 
Thank you for your answer. 

To test it I tried: 
sacctmgr update qos normal set maxtresperuser=cpu=2 
# Then in slurm.conf 
PartitionName=test […] qos=normal 

But then if I submit several 1-cpu jobs only two start and the others stay 
pending, even though I have several nodes available. So it seems that 
MaxTRESPerUser is a QoS-wide limit, and doesn't limit TRES per user and per 
node but rather per user and QoS (or rather partition since I applied the QoS 
on the partition). Did I miss something? 

Thanks again, 
Guillaume 


De: "Groner, Rob"  
Ă€: slurm-users@lists.schedmd.com, "Guillaume COCHARD" 
 
Envoyé: Mardi 24 Septembre 2024 15:45:08 
Objet: Re: Max TRES per user and node 

You have the right idea. 

On that same page, you'll find MaxTRESPerUser, as a QOS parameter. 

You can create a QOS with the restrictions you'd like, and then in the 
partition definition, you give it that QOS. The QOS will then apply its 
restrictions to any jobs that use that partition. 

Rob 

From: Guillaume COCHARD via slurm-users  
Sent: Tuesday, September 24, 2024 9:30 AM 
To: slurm-users@lists.schedmd.com  
Subject: [slurm-users] Max TRES per user and node 
Hello, 

We are looking for a method to limit the TRES used by each user on a per-node 
basis. For example, we would like to limit the total memory allocation of jobs 
from a user to 200G per node. 

There is MaxTRESPerNode 
(https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but 
unfortunately, this is a per-job limit, not per user. 

Ideally, we would like to apply this limit on partitions and/or QoS. Does 
anyone know if this is possible and how to achieve it? 

Thank you, 




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Renfro, Michael via slurm-users
In theory, if jobs are pending with “Priority”, one or more other jobs will be 
pending with “Resources”.

So a few questions:


  1.  What are the “Resources” jobs waiting on, resource-wise?
  2.  When are they scheduled to start?
  3.  Can your array jobs backfill into the idle resources and finish before 
the “Resources” jobs are scheduled to start?
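
For what it's worth, something along these lines usually answers those three
questions (job ID and job name are placeholders):

# 1/2: what the highest-priority pending jobs are asking for and when Slurm
#      expects to start them
squeue --state=PENDING --sort=-p --format="%.10i %.9P %.8u %.2t %.10l %.6D %.20R %S"

# requested TRES and expected StartTime for one specific pending job
scontrol show job <jobid>

# 3: the time limit the array jobs request, to judge whether they can finish
#    before that start time
squeue --name=<array_job_name> --format="%.12i %.10l %.6D %R"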

From: Long, Daniel S. via slurm-users 
Date: Tuesday, September 24, 2024 at 11:47 AM
To: slurm-us...@schedmd.com 
Subject: [slurm-users] Jobs pending with reason "priority" but nodes are idle


Hi,

On our cluster we have some jobs that are queued even though there are 
available nodes to run on. The listed reason is “priority” but that doesn’t 
really make sense to me. Slurm isn’t picking another job to run on those nodes; 
it’s just not running anything at all. We do have a quite heterogeneous 
cluster, but as far as I can tell the queued jobs aren’t requesting anything 
that would preclude them from running on the idle nodes. They are array jobs, 
if that makes a difference.

Thanks for any help you all can provide.

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Long, Daniel S. via slurm-users
I experimented a bit and think I have figured out the problem but not the 
solution.

We use multifactor priority with the job account the primary factor. Right now 
one project has much higher priority due to a deadline. Those are the jobs that 
are pending with “Resources”. They cannot run on the idle nodes because they do 
not satisfy the resource requirements (don’t have GPUs). What I don’t 
understand is why slurm doesn’t schedule the lower priority jobs onto those 
nodes, since those jobs don’t require GPUs. It’s very unexpected behavior, to 
me. Is there an option somewhere I need to set?
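
(As an aside, sprio is a quick way to see how those multifactor weights
actually land on each pending job; the user filter below is just an example.)

# weighted priority components (age, fairshare, partition, QOS, ...) per job
sprio -l
sprio -l -u <user>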


From: "Renfro, Michael" 
Date: Tuesday, September 24, 2024 at 1:54 PM
To: Daniel Long , "slurm-us...@schedmd.com" 

Subject: Re: Jobs pending with reason "priority" but nodes are idle

In theory, if jobs are pending with “Priority”, one or more other jobs will be 
pending with “Resources”.

So a few questions:


  1.  What are the “Resources” jobs waiting on, resource-wise?
  2.  When are they scheduled to start?
  3.  Can your array jobs backfill into the idle resources and finish before 
the “Resources” jobs are scheduled to start?

From: Long, Daniel S. via slurm-users 
Date: Tuesday, September 24, 2024 at 11:47 AM
To: slurm-us...@schedmd.com 
Subject: [slurm-users] Jobs pending with reason "priority" but nodes are idle


Hi,

On our cluster we have some jobs that are queued even though there are 
available nodes to run on. The listed reason is “priority” but that doesn’t 
really make sense to me. Slurm isn’t picking another job to run on those nodes; 
it’s just not running anything at all. We do have a quite heterogeneous 
cluster, but as far as I can tell the queued jobs aren’t requesting anything 
that would preclude them from running on the idle nodes. They are array jobs, 
if that makes a difference.

Thanks for any help you all can provide.

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Jobs pending with reason "priority" but nodes are idle

2024-09-24 Thread Long, Daniel S. via slurm-users
Hi,

On our cluster we have some jobs that are queued even though there are 
available nodes to run on. The listed reason is "priority" but that doesn't 
really make sense to me. Slurm isn't picking another job to run on those nodes; 
it's just not running anything at all. We do have a quite heterogeneous 
cluster, but as far as I can tell the queued jobs aren't requesting anything 
that would preclude them from running on the idle nodes. They are array jobs, 
if that makes a difference.

Thanks for any help you all can provide.

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Max TRES per user and node

2024-09-24 Thread Guillaume COCHARD via slurm-users
Thank you for your answer. 

To test it I tried: 
sacctmgr update qos normal set maxtresperuser=cpu=2 
# Then in slurm.conf 
PartitionName=test […] qos=normal 

But then if I submit several 1-cpu jobs only two start and the others stay 
pending, even though I have several nodes available. So it seems that 
MaxTRESPerUser is a QoS-wide limit, and doesn't limit TRES per user and per 
node but rather per user and QoS (or rather partition since I applied the QoS 
on the partition). Did I miss something? 

Thanks again, 
Guillaume 


De: "Groner, Rob"  
Ă€: slurm-users@lists.schedmd.com, "Guillaume COCHARD" 
 
Envoyé: Mardi 24 Septembre 2024 15:45:08 
Objet: Re: Max TRES per user and node 

You have the right idea. 

On that same page, you'll find MaxTRESPerUser, as a QOS parameter. 

You can create a QOS with the restrictions you'd like, and then in the 
partition definition, you give it that QOS. The QOS will then apply its 
restrictions to any jobs that use that partition. 

Rob 

From: Guillaume COCHARD via slurm-users  
Sent: Tuesday, September 24, 2024 9:30 AM 
To: slurm-users@lists.schedmd.com  
Subject: [slurm-users] Max TRES per user and node 
Hello, 

We are looking for a method to limit the TRES used by each user on a per-node 
basis. For example, we would like to limit the total memory allocation of jobs 
from a user to 200G per node. 

There is MaxTRESPerNode 
(https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but 
unfortunately, this is a per-job limit, not per user. 

Ideally, we would like to apply this limit on partitions and/or QoS. Does 
anyone know if this is possible and how to achieve it? 

Thank you, 



-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Setting up fairshare accounting

2024-09-24 Thread tluchko via slurm-users
Just following up on my own message in case someone else is trying to figure 
out RawUsage and Fair Share.

I ran some additional tests, this time running jobs for 10 min instead of 1 
min. The procedure was:

1. Set the accounting stats to update every minute in slurm.conf

PriorityCalcPeriod=1

2. Reset the RawUsage stat

sacctmgr modify account luchko_group set RawUsage=0

3. Check the RawUsage every second

while sleep 1; do date; sshare -ao Account,User,RawShares,NormShares,RawUsage ; 
done > watch.out

4. Run a 10 min job. The billing per CPU is 1, so the total RawUsage should be 
60,000 and the RawUsage should increase by 6,000 each minute

sbatch --account=luchko_group --wrap="sleep 600" -p cpu -n 100

Scanning the output file, I can see that the RawUsage does update once every 
minute. Below are the updates. (I've removed irrelevant output.)

Tue Sep 24 10:14:24 AM PDT 2024
Account       User      RawShares  NormShares  RawUsage
------------  --------  ---------  ----------  --------
luchko_group  tluchko         100        0.50         0

Tue Sep 24 10:14:25 AM PDT 2024
luchko_group  tluchko         100        0.50      4099

Tue Sep 24 10:15:24 AM PDT 2024
luchko_group  tluchko         100        0.50     10099

Tue Sep 24 10:16:25 AM PDT 2024
luchko_group  tluchko         100        0.50     16099

Tue Sep 24 10:17:24 AM PDT 2024
luchko_group  tluchko         100        0.50     22098

Tue Sep 24 10:18:25 AM PDT 2024
luchko_group  tluchko         100        0.50     28097

Tue Sep 24 10:19:24 AM PDT 2024
luchko_group  tluchko         100        0.50     34096

Tue Sep 24 10:20:25 AM PDT 2024
luchko_group  tluchko         100        0.50     40094

Tue Sep 24 10:21:24 AM PDT 2024
luchko_group  tluchko         100        0.50     46093

Tue Sep 24 10:22:25 AM PDT 2024
luchko_group  tluchko         100        0.50     52091

Tue Sep 24 10:23:24 AM PDT 2024
luchko_group  tluchko         100        0.50     58089

Tue Sep 24 10:24:25 AM PDT 2024
luchko_group                 2000    0.133324     58087

Tue Sep 24 10:25:25 AM PDT 2024
luchko_group  tluchko         100        0.50     58085

So, the RawUsage does increase by the expected amount each minute, and the 
RawUsage does decay (I have the half-life set to 14 days). However, the update 
for the last part of a minute, which should be 1901, is not recorded. I suspect 
this is because the job is no longer running when the accounting update occurs.
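
One way to quantify that lost tail is to compare what the job should have added
(its billing TRES times its elapsed seconds, as reported by sacct) with the
final RawUsage delta in sshare; the job ID below is a placeholder:

# billing and elapsed time recorded for the job
sacct -j <jobid> --format=JobID,Elapsed,AllocTRES%60

# expected contribution here: 100 CPUs * 1.0 billing/CPU * 600 s = 60,000
sshare -A luchko_group -o Account,User,RawUsage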

For typical jobs that run for hours or days, this is a negligible error, but it 
does explain the results I got when I ran a 1 min job.

TRESRunMins is still not updating, but this is an inconvenience.

Tyler

Sent with [Proton Mail](https://proton.me/mail/home) secure email.

On Thursday, September 19th, 2024 at 8:47 PM, tluchko via slurm-users 
 wrote:

> Hello,
>
> I'm hoping someone can offer some suggestions.
>
> I went ahead started the database from scratch and reinitialized it to see if 
> that would help and to try and understand how RawUsage is calculated. I ran 
> two jobs of
>
> sbatch --account=luchko_group --wrap="sleep 60" -p cpu -n 100
>
> With the partition defined as
>
> PriorityFlags=MAX_TRES
> PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 
> State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>
> I expected each job to contribute 6000 to the RawUsage, however one job 
> contributed 3100 and the other 2800. And TRESRunMins stayed at 0 for all 
> categories.
>
> I'm at a loss as to what is going on.
>
> Thank you,
>
> Tyler
>
> Sent with [Proton Mail](https://proton.me/mail/home) secure email.
>
> On Tuesday, September 10th, 2024 at 9:03 PM, tluchko  
> wrote:
>
>> Hello,
>>
>> We have a new cluster and I'm trying to setup fairshare accounting. I'm 
>> trying to track CPU, MEM and GPU. It seems that billing for individual jobs 
>> is correct, but billing isn't being accumulated (TRESRunMin is always 0).
>>
>> In my slurm.conf, I think the relevant lines are
>>
>> AccountingStorageType=accounting_storage/slurmdbd
>> AccountingStorageTRES=gres/gpu
>> PriorityFlags=MAX_TRES
>>
>> PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 
>> State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>> PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 
>> State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>> I currently have one recently finished job and one running job. sacct gives
>>
>> $ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
>> JobID         JobName     ReqTRES                                    AllocTRES                                   TRESUsageInAve                                       TRESUsageInMax
>> 154           interacti+  billing=9,cpu=1,gres/gpu=1,mem=1G,node=1   billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
>> 154.interac+  interacti+                                             cpu=2,gres/gpu=1,mem=2G,node=1              cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+   cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+
>> 155           interacti+  billing=9,cpu=1,gres/gpu=1,mem=1G,node=1   billing=9,cpu=2,

[slurm-users] Re: SLURM Telegraf Plugin

2024-09-24 Thread Oren Shani via slurm-users
Hi Pablo,

I did something similar a while back and my problem was that probing the
Slurm API too often was causing problems for Slurm.
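
For what it's worth, Telegraf's standard per-plugin settings should let the
collection be throttled independently of the agent-wide default, which may help
with that; a sketch (address and interval are illustrative):

[[inputs.slurm]]
  url = "http://127.0.0.1:6820"
  ## poll slurmrestd less aggressively than the agent default
  interval = "120s"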

Didn't you encounter a similar problem?

Please let me know

Thanks

Oren

On Tue, Sep 24, 2024 at 4:50 PM Pablo Collado Soto via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi all,
>
> I recently wrote an SLURM input plugin [0] for Telegraf [1].
>
> I just wanted to let the community know so that you can use it if
> you'd
> find that useful.
>
> Maybe its existence can also be included in the documentation
> somewhere?
>
> Anyway, thanks a ton for your time,
>
> Pablo Collado Soto
>
> References:
> 0:
> https://github.com/influxdata/telegraf/tree/master/plugins/inputs/slurm
> 1: https://www.influxdata.com/time-series-platform/telegraf/
>
> +------------------------------------------+
> | Never let your sense of morals prevent   |
> | you from doing what is right.            |
> |           -- Salvor Hardin, "Foundation" |
> +------------------------------------------+
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com