Re: [slurm-users] Slurm cannot kill a job which time limit exhausted

2019-03-19 Thread Prentice Bisbal
Slurm is trying to kill the job that is exceeding its time limit, but the job doesn't die, so Slurm marks the node down because it sees this as a problem with the node. Increasing the value for GraceTime or KillWait might help: *GraceTime* Specifies, in units of seconds, the preemption
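
For illustration, raising KillWait in slurm.conf could look like the fragment below. This is only a sketch: the value 60 is an assumed example, not a recommendation from this thread, and the right number depends on how long the job's processes need to exit after SIGTERM.

```
# slurm.conf (fragment)
# KillWait: seconds between the SIGTERM and the SIGKILL that Slurm sends
# to a job's processes once the job reaches its time limit.
# 60 is an illustrative value only; pick one that suits your jobs.
KillWait=60
```

The value currently in effect can be checked with `scontrol show config | grep KillWait`.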

[slurm-users] Slurm cannot kill a job which time limit exhausted

2019-03-19 Thread Taras Shapovalov
Hey guys,

When a job's maximum run time is exceeded, Slurm tries to kill the job and fails:

[2019-03-15T09:44:03.589] sched: _slurm_rpc_allocate_resources JobId=1325 NodeList=rn003 usec=355
[2019-03-15T09:44:03.928] prolog_running_decr: Configuration for JobID=1325 is complete
[2019-03-15T09:45:12.739