Hello,
It also appears that random jobs are being identified as using too much memory,
despite being well within limits.
For example, a job is currently running that requested 2048 MB per CPU, and all
of its processes are within that limit. Yet the job is still flagged as being
over the limit when it isn't. Please
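In case it helps to reproduce the check, what the job actually requested can be
confirmed with something like the following (12345 is a placeholder job ID):

  scontrol show job 12345 | grep -i mem

which should show the 2048 MB per-CPU request the job was submitted with.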
Hello,
Due to the recent CVE posted by Tim, we upgraded from SLURM 20.11.3 to
20.11.9.
Today I received a ticket from a user whose output files are filled with the
"slurmstepd: error: Exceeded job memory limit" message. However, the jobs are
still running, and it seems that the controller
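To compare the limit against what the job steps are actually using, something
along these lines should work (again a placeholder job ID; add .batch to look
at the batch step):

  sstat -j 12345 --format=JobID,MaxRSS,MaxVMSize

MaxRSS and MaxVMSize are the peak values the accounting plugin has sampled for
each step, which makes it easy to see whether anything really went past the
requested memory.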
Ghui,
It seems that things are working as they should.
You are allowing an account to become root inside the pod, and the pod is
considered a trusted environment by slurm (you are running munge inside it).
So as far as slurm is concerned, 'root' from a trusted environment is
submitting a job.
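If you want to confirm that from outside the pod, the owner slurm recorded for
the job can be checked with, e.g. (placeholder job ID):

  scontrol show job 12345 | grep UserId

which in this case should show root as the submitting user.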
Hi Hermann,
You're welcome; I'm looking forward to hearing some feedback from you.
Regarding the matrix integration, or any other for that matter, the gosl code
was written with extensibility in mind, meaning all the helper code required
to create a new connector is packaged and easily reusable.
If you
> I had config the right slurm and munge inside the container.
This is the reason.
Whoever has access to munge.key can effectively become root on the slurm cluster.
You should not disclose munge.key to containers.
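For reference, on the nodes the key should stay readable only by the munge
user, and it should never be baked into container images or bind-mounted into
pods you don't fully trust. A minimal sketch, assuming the default path:

  chown munge:munge /etc/munge/munge.key
  chmod 400 /etc/munge/munge.key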
cheers
josef
Hi,
On 18.05.22 08:25, Stephan Roth wrote:
> Personal note: I'm not sure what I'd choose as a successor to
> Singularity 3.8, yet. Thoughts are welcome.
I can recommend NVIDIA enroot/pyxis.
enroot does unprivileged sandboxes/containers; pyxis is the slurm SPANK
glue.
https://slurm.schedmd.com/
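In case an example helps: with pyxis loaded as a SPANK plugin, srun gains
container options, so something like the following (image name purely
illustrative) runs a command inside an unprivileged enroot container:

  srun --container-image=ubuntu:22.04 grep PRETTY_NAME /etc/os-release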