[slurm-users] visualisation of JobComp and JobacctGather data with Grafana - screenshots, ideas?

2024-04-10 Thread Josef Dvoracek via slurm-users
Does anybody here have a nice visualization of JobComp and JobAcctGather
data in Grafana?


I save JobComp data in Elasticsearch and JobAcctGather data in InfluxDB,
and I am thinking about how to provide meaningful insights to $users.
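
For reference, the sort of collection setup I mean is roughly the following
(hostnames, index and database names are placeholders, not my actual config):

    # slurm.conf (sketch)
    JobCompType=jobcomp/elasticsearch
    JobCompLoc=http://elastic.example.org:9200/slurm
    JobAcctGatherType=jobacct_gather/cgroup
    JobAcctGatherFrequency=task=30
    AcctGatherProfileType=acct_gather_profile/influxdb

    # acct_gather.conf (sketch)
    ProfileInfluxDBHost=influx.example.org:8086
    ProfileInfluxDBDatabase=slurm_profile
    ProfileInfluxDBDefault=ALL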


Things I'd like to show: especially memory & CPU utilization, job result,
and possible harmful effects like OOMs...


Any screenshots, ideas, experience welcomed!

cheers

Josef





-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Avoiding fragmentation

2024-04-10 Thread Williams, Jenny Avis via slurm-users
Here are various options that might help reduce job fragmentation:

Turn up debugging on slurmctld and add DebugFlags such as TraceJobs,
SelectType, and Steps; see the example below. With debugging set high enough
you can see a good bit of the logic regarding node selection.
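
For example (log level and flag names as in the slurm.conf and scontrol man
pages; adjust to taste):

    # slurm.conf
    SlurmctldDebug=debug2
    DebugFlags=TraceJobs,SelectType,Steps

    # or at runtime, without restarting the controller
    scontrol setdebug debug2
    scontrol setdebugflags +TraceJobs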

  CR_LLN Schedule resources to jobs on the least loaded nodes (based upon
         the number of idle CPUs).  This is generally only recommended for
         an environment with serial jobs as idle resources will tend to be
         highly fragmented, resulting in parallel jobs being distributed
         across many nodes.  Note that node Weight takes precedence over how
         many idle resources are on each node.  Also see the partition
         configuration parameter LLN to use the least loaded nodes in
         selected partitions.

Explore node weights.  If your nodes are not identical, apply node weights
to sort them in the order in which you wish them to be selected. On the other
hand, even for homogeneous nodes you might try sets of weights so that,
within a given scheduling cycle, the scheduler considers a smaller group of
nodes of one weight before moving on to the nodes of the next weight (see the
sketch below). The number of nodes within a weight set might be no smaller
than 1/3 or 1/4 of the total partition size.  YMMV based on, for instance,
the ratio of serial jobs to MPI jobs, job length, etc. I have seen evidence
that node allocation progresses roughly this way.
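
For example (node names, counts and weights are purely illustrative):

    # slurm.conf: nodes with the lowest Weight are allocated first
    NodeName=node[001-032] CPUs=128 Weight=10
    NodeName=node[033-064] CPUs=128 Weight=20
    NodeName=node[065-096] CPUs=128 Weight=30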

Turn on backfill and educate users to specify both their job resource
requirements and the job runtime more accurately; this will allow backfill to
work more efficiently (see the example below). Note that backfill choices are
made within a given set of jobs within a partition.
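
Backfill is enabled with SchedulerType=sched/backfill, and a job script that
states a realistic runtime and memory footprint gives the backfill scheduler
something to work with. A sketch (the resource values and executable are only
illustrative):

    #!/bin/bash
    #SBATCH --ntasks=16
    #SBATCH --mem=32G
    #SBATCH --time=02:00:00   # realistic walltime, not the partition maximum
    srun ./my_app             # placeholder executable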


  CR_Pack_Nodes
         If a job allocation contains more resources than will be used for
         launching tasks (e.g. if whole nodes are allocated to a job), then
         rather than distributing a job's tasks evenly across its allocated
         nodes, pack them as tightly as possible on these nodes.  For
         example, consider a job allocation containing two entire nodes with
         eight CPUs each.  If the job starts ten tasks across those two nodes
         without this option, it will start five tasks on each of the two
         nodes.  With this option, eight tasks will be started on the first
         node and two tasks on the second node.  This can be superseded by
         "NoPack" in srun's "--distribution" option.  CR_Pack_Nodes only
         applies when the "block" task distribution method is used.

  pack_serial_at_end
         If used with the select/cons_res or select/cons_tres plugin, then
         put serial jobs at the end of the available nodes rather than using
         a best fit algorithm.  This may reduce resource fragmentation for
         some workloads.

  reduce_completing_frag
         This option is used to control how scheduling of resources is
         performed when jobs are in the COMPLETING state, which influences
         potential fragmentation.  If this option is not set then no jobs
         will be started in any partition when any job is in the COMPLETING
         state for less than CompleteWait seconds.  If this option is set
         then no jobs will be started in any individual partition that has a
         job in COMPLETING state for less than CompleteWait seconds.  In
         addition, no jobs will be started in any partition with nodes that
         overlap with any nodes in the partition of the completing job.  This
         option is to be used in conjunction with CompleteWait.
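
Putting the pieces above together, the relevant slurm.conf settings would
look roughly like this (a sketch, not a drop-in config; the CompleteWait
value is only an example):

    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory,CR_Pack_Nodes
    SchedulerType=sched/backfill
    SchedulerParameters=pack_serial_at_end,reduce_completing_frag
    CompleteWait=32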

-----Original Message-----
From: Gerhard Strangar via slurm-users  
Sent: Tuesday, April 9, 2024 12:53 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Avoiding fragmentation

Hi,

I'm trying to figure out how to deal with a mix of few- and many-CPU jobs.
By that I mean most jobs use 128 CPUs, but sometimes there are jobs with only
16. As soon as a job with only 16 is running, the scheduler splits the next
128-CPU jobs into 96+16 each, instead of assigning a full 128-CPU node to
them. Is there a way for the administrator to make the scheduler prefer full
nodes?
The existence of pack_serial_at_end makes me believe there is not, because
that is basically what I need, except that my "serial" jobs use 16 CPUs
instead of 1.

Gerhard

--
slurm-users mailing list

[slurm-users] Re: single node configuration

2024-04-10 Thread Steffen Grunewald via slurm-users
On Tue, 2024-04-09 at 11:07:32 -0700, Slurm users wrote:
> Hi everyone, I'm conducting some tests. I've just set up SLURM on the head
> node and haven't added any compute nodes yet. I'm trying to test it to
> ensure it's working, but I'm encountering an error: 'Nodes required for the
> job are DOWN, DRAINED, or reserved for jobs in higher priority partitions.
> 
> *[stsadmin@head ~]$ squeue*
>  JOBID PARTITION NAME USER ST   TIME  NODES
> NODELIST(REASON)
>  6   lab test_slu stsadmin PD   0:00  1 (Nodes
> required for job are DOWN, DRAINED or reserved for jobs in higher priority
> partitions)

What does "sinfo" tell you? Is there a running slurmd?

- S


-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs of a user are stuck in Completing stage for a long time and cannot cancel them

2024-04-10 Thread Paul Edmon via slurm-users
Usually to clear jobs like this you have to reboot the node they are on. 
That will then force the scheduler to clear them.
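
For example (node name is a placeholder):

    # drain the node, reboot it once idle, then return it to service
    scontrol reboot ASAP nextstate=resume node123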


-Paul Edmon-

On 4/10/2024 2:56 AM, archisman.pathak--- via slurm-users wrote:

> We are running a slurm cluster with version `slurm 22.05.8`. One of our
> users has reported that their jobs have been stuck at the completion stage
> for a long time. Referring to Slurm Workload Manager - Slurm Troubleshooting
> Guide we found that indeed the batchhost for the job was removed from the
> cluster, perhaps without draining it first.
>
> How do we cancel/delete the jobs ?
>
> * We tried scancel on the batch and individual job ids from both the user
> and from SlurmUser



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs of a user are stuck in Completing stage for a long time and cannot cancel them

2024-04-10 Thread Cutts, Tim via slurm-users
We have Weka filesystems on one of our clusters and saw this; we discovered
we had slightly misconfigured the Weka client, with the result that Weka's
and Slurm's cgroups were fighting with each other, which seemed to cause this
behaviour.  Fixing the Weka cgroups config improved the problem for us.  I
haven't heard anyone complain about it since.

Tim

--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca



From: Paul Edmon via slurm-users 
Date: Wednesday, 10 April 2024 at 14:46
To: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Re: Jobs of a user are stuck in Completing stage for a 
long time and cannot cancel them
Usually to clear jobs like this you have to reboot the node they are on.
That will then force the scheduler to clear them.

-Paul Edmon-

On 4/10/2024 2:56 AM, archisman.pathak--- via slurm-users wrote:
> We are running a slurm cluster with version `slurm 22.05.8`. One of our users 
> has reported that their jobs have been stuck at the completion stage for a 
> long time. Referring to Slurm Workload Manager - Slurm Troubleshooting Guide 
> we found that indeed the batchhost for the job was removed from the cluster, 
> perhaps without draining it first.
>
> How do we cancel/delete the jobs ?
>
> * We tried scancel on the batch and individual job ids from both the user and 
> from SlurmUser
>

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com



-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Upgrading nodes

2024-04-10 Thread Steve Berg via slurm-users
I just finished migrating a few dozen blade servers from Torque to Slurm.
They're all running Alma 8 currently with the Slurm that is available from
EPEL.  I do want to get it all upgraded to Alma 9 and the current version of
Slurm.  I've got one system set up as the slurmctld host running Alma 9, and
I grabbed the tarball and built RPMs for 9.x.  I've got a few questions about
the best path to proceed.


Can I use the Alma 9 system to build RPMs for Alma 8?  I'm sure I can rig up
an 8 system to build RPMs on, but thought I'd see if there is a way to do it
on the one 9 system.


My plan is to get the RPMs built for 8 and 9, update the slurmctld system to
the latest version of Slurm, then update all the nodes to the current slurmd
version.  Once that's done I should be able to reinstall individual nodes
with Alma 9 and the same version of slurmd.


Am I missing anything in that sequence?  I'm fairly confident that the users
aren't running any code that will notice the difference between a node
running 8 or 9, so that should be transparent to them.



--
//-Fixer of that which is broke-//
//-Home = sb...@mississippi.com-//
//- Sinners can repent, but stupid is forever. -//



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Upgrading nodes

2024-04-10 Thread Brian Andrus via slurm-users
Yes, you can build the EL8 RPMs on EL9; look at 'mock' to do so.  I did
something similar when I still had to support EL7.
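
A minimal sketch (the build user is hypothetical, and the .src.rpm is the
source RPM produced when you built from the tarball):

    dnf install mock
    usermod -a -G mock builder        # "builder" is a placeholder account
    mock -r almalinux-8-x86_64 --rebuild slurm-*.src.rpm
    # the resulting RPMs end up under /var/lib/mock/almalinux-8-x86_64/result/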


Your plan is fairly generic; the devil is in the details and in verifying
each step, but those are the basic bases you need to touch.


Brian Andrus


On 4/10/2024 1:48 PM, Steve Berg via slurm-users wrote:
> I just finished migrating a few dozen blade servers from Torque to Slurm.
> They're all running Alma 8 currently with the Slurm that is available from
> EPEL.  I do want to get it all upgraded to Alma 9 and the current version
> of Slurm.  I've got one system set up as the slurmctld host running Alma 9,
> and I grabbed the tarball and built RPMs for 9.x.  I've got a few questions
> about the best path to proceed.
>
> Can I use the Alma 9 system to build RPMs for Alma 8?  I'm sure I can rig
> up an 8 system to build RPMs on, but thought I'd see if there is a way to
> do it on the one 9 system.
>
> My plan is to get the RPMs built for 8 and 9, update the slurmctld system
> to the latest version of Slurm, then update all the nodes to the current
> slurmd version.  Once that's done I should be able to reinstall individual
> nodes with Alma 9 and the same version of slurmd.
>
> Am I missing anything in that sequence?  I'm fairly confident that the
> users aren't running any code that will notice the difference between a
> node running 8 or 9, so that should be transparent to them.





--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs of a user are stuck in Completing stage for a long time and cannot cancel them

2024-04-10 Thread archisman.pathak--- via slurm-users
Could you give more details about this and how you debugged it?

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs of a user are stuck in Completing stage for a long time and cannot cancel them

2024-04-10 Thread archisman.pathak--- via slurm-users
In our case, that node has been removed from the cluster and cannot be added
back right now (it is being used for some other work). What can we do in such
a case?

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Jobs of a user are stuck in Completing stage for a long time and cannot cancel them

2024-04-10 Thread Christopher Samuel via slurm-users

On 4/10/24 10:41 pm, archisman.pathak--- via slurm-users wrote:


> In our case, that node has been removed from the cluster and cannot be
> added back right now (it is being used for some other work). What can we
> do in such a case?


Mark the node as "DOWN" in Slurm; this is what we do when we get jobs caught
in this state (provided there's nothing else running on the node, in the case
of our shared nodes).
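
That is, something along the lines of (node name is a placeholder):

    scontrol update nodename=node123 state=down reason="stuck completing jobs"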


Best of luck!
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com