On 8/6/21 1:27 PM, Diego Zuccato wrote:
Hi.
Hi!
Might it be due to a timeout (maybe the killed job is creating a core file, or
is causing heavy swap usage)?
I will have to search for the culprit...
The problem is: why would the node be put into drain just because killing a
task failed? And how can I control or disable
this?
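On the timeout question: the slurm.conf parameter that appears to govern this is UnkillableStepTimeout. It sets how long (in seconds) Slurm waits, after signalling a job step's processes with SIGKILL, before deciding they are unkillable. At that point it runs UnkillableStepProgram (if one is configured) and drains the node, which would match the "Kill task failed" reason. The documented default is 60 seconds, so a job that is slow to die (for example, one writing a large core file or stuck in heavy swap) could plausibly exceed it. Below is a minimal sketch for checking the effective value from a slurm.conf file. It assumes one parameter per line and that the last assignment wins. The function name is hypothetical and is not part of any Slurm tool.

```python
import re

# Documented slurm.conf default for UnkillableStepTimeout, in seconds.
DEFAULT_UNKILLABLE_STEP_TIMEOUT = 60

# slurm.conf keys are case-insensitive; assumes "Key=Value" on its own line.
_KEY_RE = re.compile(r"(?i)unkillablesteptimeout\s*=\s*(\d+)\s*$")


def unkillable_step_timeout(conf_text: str) -> int:
    """Return the effective UnkillableStepTimeout from slurm.conf text.

    Hypothetical helper: strips '#' comments, and lets the last matching
    assignment win (an assumption about override order).
    """
    value = DEFAULT_UNKILLABLE_STEP_TIMEOUT
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = _KEY_RE.match(line)
        if match:
            value = int(match.group(1))
    return value


if __name__ == "__main__":
    sample = "# UnkillableStepTimeout=10\nUnkillableStepTimeout=180\n"
    print(unkillable_step_timeout(sample))
```

If the value turns out to be at the default, raising it (for example to a few minutes) is a common first step before concluding that processes are truly stuck in the kernel.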
Thank you!
Adrian
BYtE,
Diego
On 06/08/2021 09:02, Adrian Sevcenco wrote:
Having just implemented some triggers, I noticed this:
NODELIST    NODES  PARTITION  STATE     CPUS  S:C:T   MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
alien-0-47  1      alien*     draining  48    48:1:1  193324  214030    1       rack-0,4  Kill task failed
alien-0-56  1      alien*     drained   48    48:1:1  193324  214030    1       rack-0,4  Kill task failed
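Since the listing above already comes from trigger tooling, here is a minimal Python sketch that picks out nodes draining or drained with the reason "Kill task failed" and builds the matching `scontrol update NodeName=<node> State=RESUME` commands. It assumes the exact 11-column layout shown above, with every column non-empty and REASON last (it may contain spaces). The helper names are hypothetical. Resuming a node is only sensible after confirming that no leftover processes from the killed job remain on it.

```python
from typing import List

REASON = "Kill task failed"
N_COLUMNS = 11  # NODELIST ... AVAIL_FE, then REASON (which may contain spaces)


def kill_task_failed_nodes(sinfo_text: str) -> List[str]:
    """Return node names whose STATE starts with 'drain' and REASON matches."""
    nodes = []
    lines = sinfo_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(maxsplit=N_COLUMNS - 1)
        if len(fields) < N_COLUMNS:
            continue  # not in the assumed layout; ignore it
        name, state, reason = fields[0], fields[3], fields[-1]
        if state.startswith("drain") and reason == REASON:
            nodes.append(name)
    return nodes


def resume_commands(nodes: List[str]) -> List[str]:
    """Build scontrol commands to return the given nodes to service."""
    return [f"scontrol update NodeName={n} State=RESUME" for n in nodes]


if __name__ == "__main__":
    listing = (
        "NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON\n"
        "alien-0-47 1 alien* draining 48 48:1:1 193324 214030 1 rack-0,4 Kill task failed\n"
        "alien-0-56 1 alien* drained 48 48:1:1 193324 214030 1 rack-0,4 Kill task failed\n"
    )
    for cmd in resume_commands(kill_task_failed_nodes(listing)):
        print(cmd)
```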
I was wondering why a node is drained when killing a task fails, and how I can
disable this (I use cgroups).
Moreover, how can killing a task fail in the first place? (This is on Slurm 19.05.)
Thank you!
Adrian