Hi Rob,
"Groner, Rob" writes:
> I'm trying to set up a specific partition where users can fight with the OS
> for dominance. The oversubscribe property sounds like what I want, as it says
> "More than one job can execute simultaneously on the same compute resource."
> That's exactly what I wa
On 1/19/23 5:01 am, Stefan Staeglich wrote:
> Hi,
Hiya,
> I'm wondering where the UnkillableStepProgram is actually executed. According
> to Mike it has to be available on every one of the compute nodes. This makes
> sense only if it is executed there.
That's right, it's only executed on compute nodes.
Hi all.
At my site, I configure the CPU and GPU resources as TRES
(https://slurm.schedmd.com/tres.html). Multiple jobs can co-run on the same
node.
The users want to know how many cores remain unallocated when they are
submitting jobs. This can help them choose which partition to use.
So is the
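For the question of how many cores are still unallocated per partition, one possible approach is to read the CPU state summary that sinfo prints with the %C format field (allocated/idle/other/total). The snippet below is only a sketch: the function names are made up for illustration, and it assumes that `sinfo -h -o "%P %C"` prints one partition name and one A/I/O/T field per line with plain integer counts.

```python
import subprocess
from collections import defaultdict


def parse_idle_cpus(sinfo_output):
    """Sum idle CPUs per partition from `sinfo -h -o "%P %C"` output.

    Assumed line format: "<partition> <allocated>/<idle>/<other>/<total>".
    The default partition is marked with a trailing "*", which is stripped.
    """
    idle = defaultdict(int)
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        partition, cpu_states = parts
        fields = cpu_states.split("/")
        if len(fields) != 4:
            continue  # not an A/I/O/T field
        idle[partition.rstrip("*")] += int(fields[1])
    return dict(idle)


def idle_cpus_per_partition():
    """Run sinfo (must be on PATH) and return idle CPU counts per partition."""
    result = subprocess.run(
        ["sinfo", "-h", "-o", "%P %C"],
        capture_output=True, text=True, check=True,
    )
    return parse_idle_cpus(result.stdout)
```

Users could call idle_cpus_per_partition() before submitting and pick the partition with the most idle cores. Note that idle cores alone do not account for memory, GPUs or other TRES limits.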
Just to hopefully close this out, I believe I was actually able to resolve this
in “user-land” rather than mucking with the database.
I was able to requeue the bad job IDs, and they went pending.
Then I updated the jobs to a time limit of 60.
Then I scancelled the jobs, and they returned to a cance
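The sequence described above (requeue, set a time limit of 60, then cancel) could be scripted roughly as follows. This is a sketch only: the helper names are hypothetical, it assumes plain numeric job IDs (no array or het-job syntax), and it defaults to a dry run that just prints the commands.

```python
import subprocess


def cleanup_commands(job_id):
    """Return the three commands, in order, for one stuck job."""
    job_id = str(int(job_id))  # assumption: plain numeric job ID
    return [
        ["scontrol", "requeue", job_id],
        ["scontrol", "update", f"JobId={job_id}", "TimeLimit=60"],
        ["scancel", job_id],
    ]


def run_cleanup(job_id, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in cleanup_commands(job_id):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Running run_cleanup(jobid) first with the default dry_run=True shows what would be executed before anything is changed.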
I'm trying to set up a specific partition where users can fight with the OS for
dominance. The oversubscribe property sounds like what I want, as it says
"More than one job can execute simultaneously on the same compute resource."
That's exactly what I want. I've set up a node with 48 CPU and o
Hi,
I'm wondering where the UnkillableStepProgram is actually executed. According
to Mike it has to be available on every one of the compute nodes. This makes
sense only if it is executed there.
But the man page slurm.conf of 21.08.x states:
UnkillableStepProgram
Must be execut
Hello Björn-Helge,
Thanks for reminding me about /sys/fs for checking OOM issues. I had already
lost sight of that again.
In this case, there are more steps involved (one for each srun call). I'm not
sure whether cgroup handles each separately, or just on a per-node basis. If the
latter ... why do I have