We had similar issues with Slurm 23.11.1 (and 23.11.2). Jobs get stuck in
a completing state and slurmd daemons can't be killed because they are left
in a CLOSE-WAIT state. See my previous mail to the mailing list for the
details. And also https://bugs.schedmd.com/show_bug.cgi?id=18561 for
another
On 1/30/24 09:36, Fokke Dijkstra wrote:
We had similar issues with Slurm 23.11.1 (and 23.11.2). Jobs get stuck in
a completing state and slurmd daemons can't be killed because they are
left in a CLOSE-WAIT state. See my previous mail to the mailing list for
the details. And also https://bugs.sc
I built 23.02.7 and tried that and had the same problems.
BTW, I am using the slurm.spec rpm build method (built on Rocky 8 boxes
with NVIDIA 535.54.03 proprietary drivers installed).
The behavior I was seeing was one would start a GPU job. It was fine at
first but at some point the slurmste
This is scary news. I just updated to 23.11.1, but couldn't confirm the
problems described so far. I'll do some more extensive and intensive tests.
In case of disaster: Does anyone know how to roll back the DB, as some new DB
'objects' attributes are introduced in 23.11.1. I never had the chanc
This is definitely an NVML thing crashing slurmstepd. Here is what I find
doing an strace of the slurmstepd: [3681401.0] process at the point the
crash happens:
[pid 1132920] fcntl(10, F_SETFD, FD_CLOEXEC) = 0
[pid 1132920] read(10, "1132950 (bash) S 1132919 1132950"..., 511) = 339
[pid 11329
Hey folks -
The mailing list will be offline for about an hour as we upgrade the
host, upgrade the mailing list software, and change the mail
configuration around.
As part of these changes, the "From: " field will no longer be the
original sender, but instead use the mailing list ID itself.
Welcome to the updated list. Posting is re-enabled now.
- Tim
On 1/30/24 11:56, Tim Wickberg wrote:
Hey folks -
The mailing list will be offline for about an hour as we upgrade the
host, upgrade the mailing list software, and change the mail
configuration around.
As part of these changes,