Re: [slurm-users] draining nodes due to failed killing of task?

2021-08-08 Thread Bjørn-Helge Mevik
Adrian Sevcenco writes:
> Having just implemented some triggers, I just noticed this:
>
> NODELIST   NODES PARTITION STATE    CPUS S:C:T  MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
> alien-0-47 1     alien*    draining 48   48:1:1 193324 214030   1      rack-0,4 Kill task failed
> al
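For anyone wanting to script the same check, here is a minimal Python sketch that picks out nodes whose REASON is "Kill task failed" from output laid out like the quote above. It assumes the eleven whitespace-separated columns shown there, with REASON last (so it may contain spaces); the helper name `drained_by_kill_failure` is hypothetical, and how you capture the `sinfo` text is left open.

```python
from typing import List, Tuple

# Column names as they appear in the quoted sinfo output; REASON is last and may contain spaces.
COLUMNS = ["NODELIST", "NODES", "PARTITION", "STATE", "CPUS", "S:C:T",
           "MEMORY", "TMP_DISK", "WEIGHT", "AVAIL_FE", "REASON"]


def drained_by_kill_failure(sinfo_text: str) -> List[Tuple[str, str]]:
    """Return (node, state) pairs whose REASON mentions 'Kill task failed'.

    Hypothetical helper: assumes the whitespace-separated layout quoted above.
    """
    hits = []
    seen_header = False
    for line in sinfo_text.splitlines():
        # Split into at most len(COLUMNS) fields so a multi-word REASON stays whole.
        fields = line.split(maxsplit=len(COLUMNS) - 1)
        if not fields:
            continue
        if fields[0] == "NODELIST":
            seen_header = True  # data rows start after the header line
            continue
        if not seen_header or len(fields) < len(COLUMNS):
            continue  # skip any preamble (e.g. a timestamp) and incomplete rows
        row = dict(zip(COLUMNS, fields))
        if "Kill task failed" in row["REASON"]:
            hits.append((row["NODELIST"], row["STATE"]))
    return hits


if __name__ == "__main__":
    # The row from the quote above, used as sample input.
    SAMPLE = (
        "NODELIST   NODES PARTITION STATE    CPUS S:C:T  MEMORY TMP_DISK WEIGHT AVAIL_FE REASON\n"
        "alien-0-47 1     alien*    draining 48   48:1:1 193324 214030   1      rack-0,4 Kill task failed\n"
    )
    print(drained_by_kill_failure(SAMPLE))  # prints [('alien-0-47', 'draining')]
```

Splitting with a bounded `maxsplit` is the simplest way to keep the free-text REASON column intact without relying on fixed column widths.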

2021-08-08 Thread Christopher Samuel
On 8/7/21 11:47 pm, Adrian Sevcenco wrote:
> Yes, the jobs that are running have a file-saving step if they are killed, and depending on the target that saving can get stuck... I have to think of a way to take a snapshot of the processes when this happens.

Slurm does let you request a signal a cer
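One way a job can use such an early-warning signal is to stop working and save its state while there is still time, instead of saving only after it has been told to die. Below is a minimal Python sketch, assuming the job is launched with something like `--signal=USR1@300` (which asks Slurm to send SIGUSR1 roughly 300 seconds before the time limit; check the `sbatch` man page for the `B:` prefix if the process runs directly in the batch script rather than under `srun`). The functions `on_warning`, `save_checkpoint` and `run` and the checkpoint file name are hypothetical.

```python
import os
import signal
import time

stop_requested = False


def on_warning(signum, frame):
    # Only set a flag here; the (possibly slow) saving happens in the main loop.
    global stop_requested
    stop_requested = True


def save_checkpoint(path: str, step: int) -> None:
    # Hypothetical checkpoint: write to a temp file, then rename it into place,
    # so a save interrupted midway never leaves a half-written checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(step))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(max_steps: int, path: str = "checkpoint.txt") -> int:
    # SIGUSR1 is assumed to be the warning signal requested from Slurm.
    signal.signal(signal.SIGUSR1, on_warning)
    step = 0
    while step < max_steps and not stop_requested:
        time.sleep(0.01)  # stand-in for one unit of real work
        step += 1
    save_checkpoint(path, step)
    return step
```

This does not help if the save itself hangs on a stuck target; it only moves the save to a point where the job still has time left, rather than into the window after the kill signal.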