Re: [slurm-users] Source of SIGTERM

2019-03-08 Thread Marcus Wagner
Hi Doug, you could try using auditd to catch the source. Back when we used LSF, we had an issue with one of our prolog scripts, which killed jobs whenever another job of the same user was already running on the node. auditd helped us at that point to identify our own nodecleaner script as the culprit ;) Best, Marc
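
A minimal sketch of the kind of audit rule this suggests, assuming a 64-bit compute node with auditd running. The key name `sigterm_trace` and the file name are made up for illustration, and this is not necessarily the rule that was used at the time. It logs every `kill()` system call whose signal argument (`a1`) is 15, i.e. SIGTERM:

```
# /etc/audit/rules.d/sigterm.rules (hypothetical file name)
# Record each kill() syscall that sends signal 15 (SIGTERM).
# "sigterm_trace" is an arbitrary key used to find the records later.
-a always,exit -F arch=b64 -S kill -F a1=15 -k sigterm_trace
```

Once the rule is loaded, `ausearch -k sigterm_trace -i` should list the matching records, including the pid, uid and executable of the sender. Signals delivered through other system calls such as `tgkill()` would need their own rules.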

[slurm-users] Source of SIGTERM

2019-03-07 Thread Doug Meyer
Looking for advice on identifying the source of a job cancellation. Preemption is not configured on the partition. We sometimes receive a message "Job nnn on nodexxx CANCELLED at date/time Signal SIGTERM caught..." We do not see anything in the node logs or slurmctld logs suggesting the source of the