After upgrading to version 23.11.3 we started to get slammed with the following
log messages from slurmctld
"error: validate_group: Could not find group with gid "
This spans a handful of groups and repeats constantly, drowning out just about
everything else. Attempting to do a lookup on the gr
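A quick sanity check here, assuming standard glibc NSS tooling and substituting a real GID from the log (node001 is just a placeholder hostname), is to compare resolution on the controller and on a compute node:

  # Ask NSS (files, LDAP, SSSD, ...) on the slurmctld host to resolve the GID
  getent group 12345

  # Same lookup on a compute node, to spot stale caches or differing configs
  ssh node001 getent group 12345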
Slurm version 23.02.07
If I have a QoS defined that has a set number of, say, GPU devices set in
the GrpTRES, is there an easy way to generate a list of how much of the
defined quota is allocated or, conversely, unallocated?
e.g.:
Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|
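One way to get at this, assuming a QoS named gpu-qos (adjust to your site), is to compare the configured limit with the live usage the controller tracks; as far as I know, scontrol show assoc_mgr prints the used TRES alongside each limit:

  # Configured GrpTRES limit for the QoS
  sacctmgr show qos gpu-qos format=Name,GrpTRES

  # Live per-QoS usage as tracked by slurmctld
  scontrol show assoc_mgr qos=gpu-qos flags=qos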
After using just Fairshare for over a year on our GPU cluster, we
have concluded that it is not achieving what we really want
among our groups, so we have decided to look at preemption.
What we want is for users to NOT have a #job/GPU maximum (if they are the
only person on the cluster t
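As a sketch of where we are heading, something like the following (the QoS names and times are placeholders, not a tested configuration):

  # slurm.conf fragment: QOS-based preemption
  PreemptType=preempt/qos
  PreemptMode=REQUEUE
  PreemptExemptTime=00:30:00

  # create a high-priority QoS that is allowed to preempt the default one
  sacctmgr add qos high Priority=100 Preempt=normal GraceTime=120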
Most of my ideas have revolved around creating file systems on-the-fly as part
of the job prolog and destroying them in the epilog. The issue with that
mechanism is that formatting a file system (e.g. mkfs.) can be
time-consuming. E.g. formatting your local scratch SSD as an LVM PV+VG and
all
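For concreteness, the sort of prolog I have in mind looks roughly like this (the volume group, size, and mount point are made up, and all error handling is omitted):

  #!/bin/bash
  # Prolog sketch: carve out and format a per-job scratch volume
  lvcreate -L 200G -n job_${SLURM_JOB_ID} scratchvg
  mkfs.xfs -q /dev/scratchvg/job_${SLURM_JOB_ID}
  mkdir -p /scratch/${SLURM_JOB_ID}
  mount /dev/scratchvg/job_${SLURM_JOB_ID} /scratch/${SLURM_JOB_ID}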
Hi Magnus,
I understand. Thanks a lot for your suggestion.
Best,
Tim
On 06.02.24 15:34, Hagdorn, Magnus Karl Moritz wrote:
Hi Tim,
in the end the InitScript didn't contain anything useful because
slurmd: error: _parse_next_key: Parsing error at unrecognized key:
InitScript
At this stage I g
Hi Tim,
in the end the InitScript didn't contain anything useful because
slurmd: error: _parse_next_key: Parsing error at unrecognized key:
InitScript
At this stage I gave up. This was with SLURM 23.02. My plan was to
set up the local scratch directory with XFS and then get the script to
apply a
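Roughly, the script would have done something like this (the path and the limit are placeholders; it assumes the scratch mount uses XFS project quotas and that the job id is available in the environment):

  #!/bin/bash
  # Sketch: place the per-job directory under an XFS project quota
  JOBDIR=/local/scratch/${SLURM_JOB_ID}
  mkdir -p "$JOBDIR"
  xfs_quota -x -c "project -s -p $JOBDIR $SLURM_JOB_ID" /local/scratch
  xfs_quota -x -c "limit -p bhard=100g $SLURM_JOB_ID" /local/scratch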
Hi Magnus,
thanks for your reply! If you can, would you mind sharing the InitScript
of your attempt at getting it to work?
Best,
Tim
On 06.02.24 15:19, Hagdorn, Magnus Karl Moritz wrote:
Hi Tim,
we are using the container/tmpfs plugin to map /tmp to a local NVMe
drive which works great. I d
Hi Tim,
we are using the container/tmpfs plugin to map /tmp to a local NVMe
drive which works great. I did consider setting up directory quotas. I
thought the InitScript [1] option should do the trick. Alas, I didn't
get it to work. If I remember correctly, slurm complained about the
option being p
Hi,
In our SLURM cluster, we are using the job_container/tmpfs plugin to
ensure that each user can use /tmp and it gets cleaned up after them.
Currently, we are mapping /tmp into the node's RAM, which means that the
cgroups make sure that users can only use a certain amount of storage
inside /
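For reference, our configuration is essentially the stock plugin setup; a minimal sketch (the BasePath is site-specific) looks like:

  # slurm.conf
  JobContainerType=job_container/tmpfs

  # job_container.conf
  AutoBasePath=true
  # BasePath currently lives on a tmpfs, hence the RAM usage
  BasePath=/var/spool/slurmd/jobtmp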
Hi Amjad,
Amjad Syed via slurm-users writes:
> Hello
>
> I have the following scenario:
> I need to submit a sequence of up to 400 jobs where the even jobs depend on
> the preceding odd job to finish and every odd job depends on the presence of
> a
> file generated by the preceding even job (a
Amjad Syed via slurm-users writes:
> I need to submit a sequence of up to 400 jobs where the even jobs depend on
> the preceding odd job to finish and every odd job depends on the presence
> of a file generated by the preceding even job (availability of the file for
> the first of those 400 jobs
Hello
I have the following scenario:
I need to submit a sequence of up to 400 jobs where the even jobs depend on
the preceding odd job to finish and every odd job depends on the presence
of a file generated by the preceding even job (availability of the file for
the first of those 400 jobs is gua
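To make the shape of it concrete, the submission loop I have in mind is roughly this (odd_step.sh and even_step.sh are placeholder batch scripts; the odd jobs would additionally wait for their input file inside the job):

  #!/bin/bash
  # Sketch: chain up to 400 jobs so each one starts after the previous one succeeds
  prev=""
  for i in $(seq 1 400); do
      if (( i % 2 )); then
          script=odd_step.sh
      else
          script=even_step.sh
      fi
      if [[ -z "$prev" ]]; then
          prev=$(sbatch --parsable "$script")
      else
          prev=$(sbatch --parsable --dependency=afterok:"$prev" "$script")
      fi
  done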