We experience the same issue.
SLURM 24.05.1 segfaults with squeue --json and with squeue --json=v0.0.41, but works
with squeue --json=v0.0.40.
From: Markus Köberl via slurm-users
Date: Wednesday, 3. July 2024 at 15:15
To: Joshua Randall
Cc: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: pr
Looks like the slurm user does not exist on the system.
Did you run slurmctld and slurmdbd as root before?
If you remove the two lines (User, Group), the services will start.
But it is recommended to create a dedicated slurm user for that:
https://slurm.schedmd.com/quickstart_admin.html#daemon
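For example, a dedicated system account can be created along these lines (a sketch;
the home directory, spool and log paths are site-specific placeholders):

    groupadd -r slurm
    useradd -r -g slurm -d /var/lib/slurm -s /sbin/nologin slurm
    # Make sure the directories slurmctld/slurmdbd write to are owned by it,
    # e.g. the StateSaveLocation and the log directory (paths are examples):
    chown -R slurm:slurm /var/spool/slurmctld /var/log/slurm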
Maybe also post the output of scontrol show job to check the other
resources allocated for the job.
On Thu, Jan 18, 2024, 19:22 Kherfani, Hafedh (Professional Services, TC) <
hafedh.kherf...@hpe.com> wrote:
> Hi Ümit, Troy,
>
>
>
> I removed the line “#SBATCH --gres=gpu:1”, and changed the sba
This line also has to be changed:
#SBATCH --gpus-per-node=4  →  #SBATCH --gpus-per-node=1
--gpus-per-node seems to be the new parameter that is replacing the --gres=
one, so you can remove the --gres line completely.
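For example, a minimal GPU batch script using --gpus-per-node (a sketch; job name,
walltime and the nvidia-smi check are only illustrative) could look like:

    #!/bin/bash
    #SBATCH --job-name=gpu-test
    #SBATCH --nodes=1
    #SBATCH --gpus-per-node=1    # one GPU on the node, replaces --gres=gpu:1
    #SBATCH --time=00:10:00

    srun nvidia-smi              # quick check that the GPU is visible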
Best
Ümit
From: slurm-users on behalf of Kherfani, Hafedh (Professional Servi
sed but
> this is not in the version we have deployed.
>
> Cheers,
>
> Laurence
> On 24.03.23 16:51, Ümit Seren wrote:
>
> Looks like you are missing the username field in the JWT token:
> https://github.com/SchedMD/slurm/blob/slurm-22-05-8-1/src/plugins/auth/jwt/aut
> does contain
> this parameter. I will continue to debug but any suggestions would be
> greatly appreciated.
>
> Cheers,
>
> Laurence
> On 23.03.23 11:42, Ümit Seren wrote:
>
> If you use AzureAD as your identity provider beware that their JWKS json
> doesn't contain the alg parameter.
If you use AzureAD as your identity provider beware that their JWKS json
doesn't contain the alg parameter.
We opened an issue: https://bugs.schedmd.com/show_bug.cgi?id=16168 and it
is confirmed.
As a workaround you can use this jq query to add the alg to the jwks json
that you get from AzureAD:
cu
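(The original command is cut off above; a sketch of such a jq filter, assuming the
keys are RS256 and the JWKS is fetched from your tenant's discovery endpoint and
written to a path of your choosing, might look like this:)

    curl -s "https://login.microsoftonline.com/<tenant>/discovery/v2.0/keys" \
      | jq '.keys |= map(. + {alg: "RS256"})' > /etc/slurm/jwks.json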
As a side note:
In Slurm 23.x a new rate-limiting feature for client RPC calls was added
(see this commit:
https://github.com/SchedMD/slurm/commit/674f118140e171d10c2501444a0040e1492f4eab#diff-b4e84d09d9b1d817a964fb78baba0a2ea6316bfc10c1405329a95ad0353ca33e
).
This would give operators the ability to throttle clients that send excessive RPC
calls to the controller.
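As a sketch, it is enabled through SlurmctldParameters in slurm.conf (the option
names and values below should be verified against the slurm.conf man page for your
release):

    # slurm.conf -- RPC rate limiting (Slurm 23.02+); values are examples
    SlurmctldParameters=rl_enable,rl_bucket_size=50,rl_refill_rate=10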
We had the same issue when we switched to the job_container plugin. We ended up
running cvmfs_config probe as part of the health check tool so that the
cvmfs repos stay mounted. However, after we switched on power saving we ran
into some race conditions (a job landed on a node before cvmfs was
mounted).
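A sketch of such a health-check fragment (the repository names are just examples)
could be:

    #!/bin/bash
    # Probe the CVMFS repositories so autofs keeps/brings them mounted;
    # fail the health check if one of them is not reachable.
    for repo in atlas.cern.ch cms.cern.ch; do
        if ! cvmfs_config probe "$repo" >/dev/null 2>&1; then
            echo "CVMFS repository $repo not mounted" >&2
            exit 1
        fi
    done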
We use power saving with our GPU nodes and they power up fine. They take a bit
longer to boot but that’s it.
What do you mean by "not waking up"?
Is the power-on script not being called?
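For reference, the power-up path is driven by the power-saving options in
slurm.conf; a minimal sketch (script paths and timeouts are placeholders):

    SuspendProgram=/usr/local/sbin/node_suspend.sh
    ResumeProgram=/usr/local/sbin/node_resume.sh
    SuspendTime=600        # idle seconds before a node is powered down
    ResumeTimeout=1800     # GPU nodes may need a generous boot timeout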
Best
Ümit
From: slurm-users on behalf of Loris Bennett
Date: Thursday, 13. October 2022 at 08:14
To: Slurm User
On Fri, Sep 16, 2022 at 3:43 PM Sebastian Potthoff <
s.potth...@uni-muenster.de> wrote:
> Hi Hermann,
>
> So you both are happily(?) ignoring this warning in the "Prolog and Epilog
> Guide",
> right? :-)
>
> "Prolog and Epilog scripts [...] should not call Slurm commands (e.g.
> squeue,
> scontrol, s
We did a couple of major and minor SLURM upgrades without draining the compute
nodes.
Once slurmdbd and slurmctld were updated to the new major version, we did a
package update on the compute nodes and restarted slurmd on them.
The existing running jobs continued to run fine and new jobs on the s
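As a sketch of that order (assuming systemd-managed daemons; the package update
step depends on your distribution):

    # 1. Upgrade and restart slurmdbd first
    systemctl stop slurmdbd      # upgrade the slurmdbd package, then:
    systemctl start slurmdbd
    # 2. Then slurmctld
    systemctl stop slurmctld     # upgrade the slurmctld package, then:
    systemctl start slurmctld
    # 3. Finally the compute nodes: update the packages and restart slurmd;
    #    running jobs keep running across the slurmd restart.
    systemctl restart slurmd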