Hi,

We have Slurm version 18.08.6. One of my nodes is in the drain state:

Reason=Kill task failed [root@2020-06-27T02:25:29]

On the node, I can see the following in slurmd.log:

[2020-06-27T01:24:26.242] task_p_slurmd_batch_request: 963771
[2020-06-27T01:24:26.242] task/affinity: job 963771 CPU input mask for node: 0x0FFFFFFFFF
[2020-06-27T01:24:26.242] task/affinity: job 963771 CPU final HW mask for node: 0x55FFFFFFFF
[2020-06-27T01:24:26.247] _run_prolog: run job script took usec=4537
[2020-06-27T01:24:26.247] _run_prolog: prolog with lock for job 963771 ran for 0 seconds
[2020-06-27T01:24:26.247] Launching batch job 963771 for UID 5200
[2020-06-27T01:24:26.276] [963771.batch] task/cgroup: /slurm/uid_5200/job_963771: alloc=147456MB mem.limit=147456MB memsw.limit=147456MB
[2020-06-27T01:24:26.284] [963771.batch] task/cgroup: /slurm/uid_5200/job_963771/step_batch: alloc=147456MB mem.limit=147456MB memsw.limit=147456MB
[2020-06-27T01:24:26.310] [963771.batch] task_p_pre_launch: Using sched_affinity for tasks
[2020-06-27T02:24:26.933] [963771.batch] error: *** JOB 963771 ON node0802 CANCELLED AT 2020-06-27T02:24:26 DUE TO TIME LIMIT ***
[2020-06-27T02:25:27.009] [963771.batch] error: *** JOB 963771 STEPD TERMINATED ON node0802 AT 2020-06-27T02:25:27 DUE TO JOB NOT ENDING WITH SIGNALS ***
[2020-06-27T02:25:27.009] [963771.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:4001 status 15
[2020-06-27T02:25:27.011] [963771.batch] done with job

If I try to get information about this job with sacct, I get nothing back:

sacct -j 963771
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------

Why don't I get any information about this job?

Thanks in advance,
Angelines

________________________________________________
Angelines Alberto Morillas
Unidad de Arquitectura Informática
Office: 22.1.32
Tel.: +34 91 346 6119
Fax: +34 91 346 6537
skype: angelines.alberto
CIEMAT
Avenida Complutense, 40
28040 MADRID
________________________________________________
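
A rough way to line up the job's timeline is to pull its lines out of slurmd.log and measure the gap between the time-limit cancellation and the stepd termination. This is only an illustrative sketch: it assumes every log line starts with the bracketed timestamp format seen in the excerpt, and `job_events` and `kill_delay_seconds` are hypothetical helper names, not part of Slurm. For job 963771 the gap comes out at roughly 60 seconds, which may be worth comparing with the UnkillableStepTimeout value in slurm.conf.

```python
import re
from datetime import datetime

# Assumption: each slurmd.log line begins with "[YYYY-MM-DDTHH:MM:SS.mmm]",
# as in the excerpt from node0802.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$")


def job_events(lines, job_id):
    """Return (timestamp, message) pairs for log lines whose message contains job_id.

    This is a plain substring check, so e.g. job 9637710 would also match 963771.
    """
    events = []
    for line in lines:
        m = TS_RE.match(line.strip())
        if not m or str(job_id) not in m.group(2):
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        events.append((ts, m.group(2)))
    return events


def kill_delay_seconds(events):
    """Seconds from the TIME LIMIT cancellation to the stepd termination, or None."""
    cancelled = next((t for t, msg in events if "DUE TO TIME LIMIT" in msg), None)
    terminated = next(
        (t for t, msg in events if "JOB NOT ENDING WITH SIGNALS" in msg), None
    )
    if cancelled is None or terminated is None:
        return None
    return (terminated - cancelled).total_seconds()


if __name__ == "__main__":
    sample = [
        "[2020-06-27T01:24:26.247] Launching batch job 963771 for UID 5200",
        "[2020-06-27T02:24:26.933] [963771.batch] error: *** JOB 963771 ON node0802 "
        "CANCELLED AT 2020-06-27T02:24:26 DUE TO TIME LIMIT ***",
        "[2020-06-27T02:25:27.009] [963771.batch] error: *** JOB 963771 STEPD TERMINATED "
        "ON node0802 AT 2020-06-27T02:25:27 DUE TO JOB NOT ENDING WITH SIGNALS ***",
    ]
    events = job_events(sample, 963771)
    print(len(events))                 # 3
    print(kill_delay_seconds(events))  # 60.076
```

On the real node the same functions could be fed the lines of the full slurmd.log file instead of the three sample lines.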