[slurm-users] Re: How to select a container runtime system?

2024-08-23 Thread Paul Edmon via slurm-users
We've been using Singularity for this for years without much issue. It doesn't cover all use cases, but most applications work fine. We have not implemented this yet: https://slurm.schedmd.com/containers.html but I intend to investigate it in the future. As of right now we just have the late

[slurm-users] How to select a container runtime system?

2024-08-23 Thread wdennis--- via slurm-users
We are getting a few calls to support container workloads on our Slurm cluster; I want to support these users' use cases, so I am looking into it now. The problem for me is that I'm not very familiar with container runtimes except for (regular rootful) Docker... I see that any Slurm-compatible runtime

[slurm-users] Re: Slurmdbd purge and reported downtime

2024-08-23 Thread Davide DelVento via slurm-users
Thanks Ole, this is very helpful. I was unaware of that issue. From the bug report it's not clear to me whether it was just an sreport (display) issue or whether the problem was in the way the data was stored. In fact, I am running 23.11.5, which I installed in April. The numbers I see for the last few months

[slurm-users] trying to figure out how to troubleshoot cloud node resume/suspend

2024-08-23 Thread Alex Chekholko via slurm-users
Hi all, I have a cloud cluster running in GCP that seems to have gotten stuck in a state where slurmctld will not start/stop compute nodes; it just sits there with thousands of jobs in the queue and only a few compute nodes up and running (out of thousands). I can try to kick it by setting no

[slurm-users] Re: Slurmdbd purge and reported downtime

2024-08-23 Thread Ole Holm Nielsen via slurm-users
Hi Davide, On 8/22/24 21:30, Davide DelVento via slurm-users wrote: I am confused by the reported amount of Down and PLND Down by sreport. According to it, our cluster would have had a significant amount of downtime, which I know didn't happen (or, according to the documentation "time that slu

[slurm-users] Re: REST API - get_user_environment

2024-08-23 Thread Daniel Letai via slurm-users
https://github.com/SchedMD/slurm/blob/ffae59d9df69aa42a090044b867be660be259620/src/plugins/openapi/v0.0.38/jobs.c#L136 but no longer in https://github.com/SchedMD/slurm/blob/slurm-23.02/src/plugins/openapi/v0.0.39/jobs.c which underwent major revision in the next openapi version. On 22/0