We've been using Singularity for this for years without much issue. It
doesn't cover all use cases, but most applications work fine.
We have not implemented this yet:
https://slurm.schedmd.com/containers.html but I intend to investigate
it in the future. As of right now we just have the late
We are getting a few requests to support container workloads on our Slurm cluster;
I want to support these users' use cases, so I am looking into it now.
The problem for me is that I'm not very familiar with container runtimes, apart from
(regular, rootful) Docker... I see that any Slurm-compatible runtime
Thanks Ole,
this is very helpful. I was unaware of that issue. From the bug report it's
not clear to me whether it was just an sreport (display) issue, or whether the problem
was in the way the data was stored.
In fact, I am running 23.11.5, which I installed in April. The numbers I see
for the last few months
Hi all,
I have a cloud cluster running in GCP that seems to have gotten stuck
in a state where slurmctld will not start or stop compute nodes; it
just sits there with thousands of jobs in the queue and only a few
compute nodes up and running (out of thousands).
I can try to kick it by setting no
Hi Davide,
On 8/22/24 21:30, Davide DelVento via slurm-users wrote:
I am confused by the amount of Down and PLND Down reported by sreport.
According to it, our cluster would have had a significant amount of
downtime, which I know didn't happen (or, according to the documentation,
"time that slu
https://github.com/SchedMD/slurm/blob/ffae59d9df69aa42a090044b867be660be259620/src/plugins/openapi/v0.0.38/jobs.c#L136
but no longer in
https://github.com/SchedMD/slurm/blob/slurm-23.02/src/plugins/openapi/v0.0.39/jobs.c
Which underwent major revision
In the next openapi version
On 22/0