Adrian Sevcenco writes:
> Having just implemented some triggers, I just noticed this:
>
> NODELIST    NODES  PARTITION  STATE     CPUS  S:C:T   MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
> alien-0-47  1      alien*     draining  48    48:1:1  193324  214030    1       rack-0,4  Kill task failed
> al
On 8/7/21 11:47 pm, Adrian Sevcenco wrote:
> Yes, the jobs that are running have a file-saving step that runs if they
> are killed, and depending on the target, that saving can get stuck...
> I have to think of a way to take a process snapshot when this happens.
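One possible way to take the process snapshot mentioned above is the minimal Python sketch below. Python and the names `snapshot_processes`, `install_snapshot_handler` and `exit_after` are illustrative assumptions and are not part of this thread or of Slurm. It also assumes a Linux node with /proc and a job that can install its own signal handler. When the chosen signal arrives, it records the PID, state and command name of every process owned by the job's user. Processes in state D (uninterruptible sleep, usually blocked on I/O) cannot be killed until that I/O finishes, and that is one common way a node ends up draining with the reason "Kill task failed".

```python
import os
import signal
import sys
import time


def snapshot_processes(uid=None):
    """Return sorted (pid, state, comm) tuples read from /proc (Linux only).

    If uid is given, only processes owned by that user are included.
    """
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            if uid is not None and os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Format is "pid (comm) state ..."; comm may contain spaces or ')',
        # so locate it via the first '(' and the last ')'.
        lpar, rpar = stat.index("("), stat.rindex(")")
        procs.append((int(entry), stat[rpar + 2], stat[lpar + 1:rpar]))
    return sorted(procs)


def install_snapshot_handler(path, sig=signal.SIGTERM, exit_after=True):
    """Hypothetical helper: on `sig`, write the current user's processes to `path`.

    Installing a Python handler replaces the default action for `sig`, so
    afterwards we chain to any previously installed Python handler (e.g. the
    job's own save routine) or, failing that, exit if `exit_after` is True.
    """
    previous = signal.getsignal(sig)

    def handler(signum, frame):
        procs = snapshot_processes(uid=os.getuid())
        with open(path, "w") as out:
            out.write(f"# snapshot at {time.strftime('%Y-%m-%d %H:%M:%S')}, signal {signum}\n")
            for pid, state, comm in procs:
                # 'D' = uninterruptible sleep: not killable until its I/O completes
                flag = "  <-- uninterruptible" if state == "D" else ""
                out.write(f"{pid}\t{state}\t{comm}{flag}\n")
        if callable(previous):
            previous(signum, frame)
        elif exit_after:
            sys.exit(128 + signum)

    signal.signal(sig, handler)
```

In a job, calling `install_snapshot_handler("/path/to/snapshot.txt")` early (the path is a placeholder) would write the snapshot when SIGTERM arrives at cancellation or time limit. How long a stuck save can keep running before Slurm escalates to SIGKILL depends on the cluster's KillWait setting, so the snapshot is only useful if it is written well within that window.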
Slurm does let you request a signal a cer