[slurm-users] Re: How to reinstall / reconfigure Slurm?

2024-04-08 Thread Shooktija S N via slurm-users
Follow up:
I was able to fix my problem by following the advice in this post, which
said that the GPU GRES can be configured manually (i.e. without
autodetection) by adding a line like this to gres.conf:
'NodeName=slurmnode Name=gpu File=/dev/nvidia0'
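
For anyone who finds this thread later, here is a minimal sketch of what that
manual configuration could look like for the three nodes described below,
assuming a single GPU per node exposed as /dev/nvidia0 (the Type field matches
the rtx4070 type used in slurm.conf):

# gres.conf on each node (replaces the AutoDetect=nvml line)
NodeName=server[1-3] Name=gpu Type=rtx4070 File=/dev/nvidia0

# corresponding GRES declaration already present in slurm.conf
GresTypes=gpu
NodeName=server[1-3] ... Gres=gpu:rtx4070:1

After editing gres.conf, restart slurmd on the nodes and run
'scontrol reconfigure' so the new GRES definition is picked up.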

On Wed, Apr 3, 2024 at 4:30 PM Shooktija S N  wrote:

> Hi,
>
> I am setting up Slurm on our lab's 3-node cluster and I have run into a
> problem while adding GPUs (each node has an NVIDIA 4070 Ti) as a GRES.
> There is an error at the 'debug' log level in slurmd.log saying that the
> GPU is file-less and is being removed from the final GRES list. According
> to some older posts on this forum, this error might be fixed by
> reinstalling or reconfiguring Slurm with the right flag (the '--with-nvml'
> flag according to this post).
>
> Line in /var/log/slurmd.log:
> [2024-04-03T15:42:02.695] debug:  Removing file-less GPU gpu:rtx4070 from
> final GRES list
>
> Does this error require me to either reinstall or reconfigure Slurm? What
> does 'reconfigure Slurm' actually mean?
> I'm about as clueless as a caveman with a smartphone when it comes to
> Slurm administration and Linux system administration in general. So, if you
> could, please explain it to me as simply as possible.
>
> slurm.conf without comment lines:
> ClusterName=DlabCluster
> SlurmctldHost=server1
> GresTypes=gpu
> ProctrackType=proctrack/linuxproc
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=root
> StateSaveLocation=/var/spool/slurmctld
> TaskPlugin=task/affinity,task/cgroup
> InactiveLimit=0
> KillWait=30
> MinJobAge=300
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> SchedulerType=sched/backfill
> SelectType=select/cons_tres
> JobCompType=jobcomp/none
> JobAcctGatherFrequency=30
> SlurmctldDebug=debug2
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=debug2
> SlurmdLogFile=/var/log/slurmd.log
> NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64
> ThreadsPerCore=2 State=UNKNOWN Gres=gpu:rtx4070:1
> PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> gres.conf (only one line):
> AutoDetect=nvml
>
> While installing CUDA, I know that NVML was installed because of this
> line in /var/log/cuda-installer.log:
> [INFO]: Installing: cuda-nvml-dev
>
> Thanks!
>
> PS: I could have added this as a continuation of this post, but for some
> reason I do not have permission to post to that group, so here I am
> starting a new thread.
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Elastic Computing: Is it possible to incentivize grouping power_up calls?

2024-04-08 Thread Xaver Stiensmeier via slurm-users

Dear slurm user list,

We make use of elastic cloud computing, i.e. node instances are created
on demand and are destroyed when they have not been used for a certain amount
of time. New instances are set up via Ansible. If more than one instance is
requested at exactly the same time, Slurm passes them to the resume script
together and a single Ansible call handles all of those instances.

However, more often than not, workflows request multiple instances
within the same second but not at exactly the same time. This leads to
multiple resume script calls and therefore to multiple Ansible calls,
which results in less readable log files, higher CPU consumption from
several Ansible processes running at once, and so on.

What I am looking for is an option that forces Slurm to wait for a certain
amount of time and then perform a single resume call for all instances
requested within that time frame (let's say 1 second).

Is this somehow possible?
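
If Slurm itself has no such option, I could imagine approximating it with a
small batching wrapper around our ResumeProgram, roughly like this (untested
sketch; the paths, the collection window, and the playbook name are just
placeholders):

#!/bin/bash
# Sketch of a batching ResumeProgram wrapper. Slurm calls ResumeProgram with a
# hostlist expression such as "node[1-3]". Every invocation appends its
# argument to a spool file; the first invocation to win the lock waits briefly
# so that calls arriving within the same window are merged into one Ansible run.
SPOOL=/var/spool/slurm/resume.pending
LOCK=/var/spool/slurm/resume.lock
WINDOW=2   # seconds to wait for further resume calls

echo "$1" >> "$SPOOL"

(
    flock -n 9 || exit 0                 # another invocation is already collecting
    sleep "$WINDOW"
    HOSTLIST=$(paste -sd, "$SPOOL")      # merge all queued hostlist expressions
    : > "$SPOOL"                         # simplistic: ignores the small append/truncate race
    scontrol show hostnames "$HOSTLIST" | sort -u > "/tmp/resume_hosts.$$"
    ansible-playbook -i "/tmp/resume_hosts.$$" setup_node.yml   # one call for the whole batch
) 9> "$LOCK"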

Best regards,
Xaver


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Slurm User Group 2024 Call for Papers

2024-04-08 Thread Victoria Hobson via slurm-users
Slurm User Group (SLUG) 2024 is set for September 12-13 at the
University of Oslo in Oslo, Norway.

Registration information and a high-level schedule can be found here:
https://slug24.splashthat.com/

We invite all interested attendees to submit a presentation abstract
to be given at SLUG. Presentation content can be in the form of a
tutorial, technical presentation or site report.

SLUG 2024 is sponsored and organized by the University of Oslo and
SchedMD. This international event is open to those who want to:

- Learn more about Slurm, a highly scalable resource manager and job scheduler
- Share their knowledge and experience with other users and administrators
- Get detailed information about the latest features and developments
- Share requirements and discuss future developments

Everyone who wants to present their own usage, developments, site
report, or tutorial about Slurm is invited to submit abstract details
here: https://forms.gle/N7bFo5EzwuTuKkBN7

Abstracts are due Friday, May 31st, and notifications of acceptance
will go out by Friday, June 14th.

--
Victoria Hobson
SchedMD LLC
Vice President of Marketing

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Elastic Computing: Is it possible to incentivize grouping power_up calls?

2024-04-08 Thread Brian Andrus via slurm-users

Xaver,

You may want to look at the ResumeRate option in slurm.conf:

   ResumeRate
   The rate at which nodes in power save mode are returned to normal
   operation by ResumeProgram. The value is a number of nodes per
   minute and it can be used to prevent power surges if a large number
   of nodes in power save mode are assigned work at the same time (e.g.
   a large job starts). A value of zero results in no limits being
   imposed. The default value is 300 nodes per minute.
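
For context, the related power saving settings live together in slurm.conf; a
trimmed example, with made-up values and script paths, might look like:

SuspendProgram=/usr/local/sbin/node_suspend.sh   # powers nodes down when idle
ResumeProgram=/usr/local/sbin/node_resume.sh     # powers nodes up (your Ansible/CLI logic)
SuspendTime=600        # seconds of idle time before a node is suspended
SuspendTimeout=60      # seconds allowed for a node to power down
ResumeTimeout=300      # seconds allowed for a node to boot and register
ResumeRate=300         # max nodes resumed per minute (0 = no limit)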

I have all our nodes in the cloud and they power down/deallocate when
idle for a while. I do not use Ansible to start them; I use the CLI
directly, so the only CPU usage is from that command. I do plan on
having Ansible run from the node itself to apply any hot-fixes/updates
relative to the base image. Running it from the node would avoid any
CPU spikes on the Slurm head node.


Just a possible path to look at.

Brian Andrus

On 4/8/2024 6:10 AM, Xaver Stiensmeier via slurm-users wrote:

Dear slurm user list,

We make use of elastic cloud computing, i.e. node instances are created
on demand and are destroyed when they have not been used for a certain amount
of time. New instances are set up via Ansible. If more than one instance is
requested at exactly the same time, Slurm passes them to the resume script
together and a single Ansible call handles all of those instances.

However, more often than not, workflows request multiple instances
within the same second but not at exactly the same time. This leads to
multiple resume script calls and therefore to multiple Ansible calls,
which results in less readable log files, higher CPU consumption from
several Ansible processes running at once, and so on.

What I am looking for is an option that forces Slurm to wait for a certain
amount of time and then perform a single resume call for all instances
requested within that time frame (let's say 1 second).

Is this somehow possible?

Best regards,
Xaver


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Avoiding fragmentation

2024-04-08 Thread Gerhard Strangar via slurm-users
Hi,

I'm trying to figure out how to deal with a mix of few- and many-CPU
jobs. By that I mean most jobs use 128 CPUs, but sometimes there are
jobs that use only 16. As soon as such a 16-CPU job is running, the
scheduler splits the next 128-CPU jobs into 96+16 across two nodes,
instead of assigning a full 128-CPU node to each. Is there a way for the
administrator to make the scheduler prefer filling whole nodes?
The existence of pack_serial_at_end makes me believe there is not,
because that is basically what I need, except that my 'serial' jobs use
16 CPUs instead of 1.
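
(For reference, the parameter I mean would be set in slurm.conf as, e.g.,

SchedulerParameters=pack_serial_at_end

but as far as I can tell it only affects one-CPU serial jobs, so it does not
cover my 16-CPU case.)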

Gerhard

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Avoiding fragmentation

2024-04-08 Thread Loris Bennett via slurm-users
Hi Gerhard,

Gerhard Strangar via slurm-users  writes:

> Hi,
>
> I'm trying to figure out how to deal with a mix of few- and many-CPU
> jobs. By that I mean most jobs use 128 CPUs, but sometimes there are
> jobs that use only 16. As soon as such a 16-CPU job is running, the
> scheduler splits the next 128-CPU jobs into 96+16 across two nodes,
> instead of assigning a full 128-CPU node to each. Is there a way for the
> administrator to make the scheduler prefer filling whole nodes?
> The existence of pack_serial_at_end makes me believe there is not,
> because that is basically what I need, except that my 'serial' jobs use
> 16 CPUs instead of 1.
>
> Gerhard

This may well not be relevant for your case, but we actively discourage
the use of full nodes for the following reasons:

  - When the cluster is full, which is most of the time, MPI jobs in
general will start much faster if they don't specify the number of
nodes and certainly don't request full nodes.  The overhead due to
the jobs being scattered across nodes is often much lower than the
additional waiting time incurred by requesting whole nodes.

  - When all the cores of a node are requested, all the memory of the
node becomes unavailable to other jobs, regardless of how much
memory is requested or indeed how much is actually used.  This holds
up jobs with low CPU but high memory requirements and thus reduces
the total throughput of the system.

These factors are important for us because we have a large number of
single-core jobs and almost all our users, whether running MPI or not,
significantly overestimate the memory requirements of their jobs.
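
As a rough illustration (the numbers are made up), we would rather see a job
request resources like

  #SBATCH --ntasks=128
  #SBATCH --mem-per-cpu=2G     # memory per core, not per node

than pin down whole machines with

  #SBATCH --nodes=1
  #SBATCH --ntasks=128
  #SBATCH --exclusive          # also blocks all of the node's memory

since the former lets the scheduler start the job as soon as 128 cores are
free anywhere.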

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com