On Tuesday, 28 May 2019 9:03:16 AM PDT Matthew BETTINGER wrote:

> We use triggers for the obvious alerts, but is there a way to make a trigger
> for nodes stuck in CG (completing) state? Some user jobs, mostly Julia
> notebooks, can get hung in completing state if the user kills the running job
> or cancels it with cntrl. When this happens we can have many, many nodes
> stuck in CG. Slurm 17.02.6. Thanks!

Are you using cgroups to control/constrain jobs?

17.02 is very old; now that 19.05 is out, only it and 18.08 are getting updates.

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
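
As a rough illustration of the question above, here is one way nodes in the completing state might be listed from a script by polling `sinfo`. This is only a sketch, not a Slurm trigger and not something proposed in the thread: the function names and the sample node names are hypothetical, and it assumes `sinfo` is on the PATH. A node is briefly in CG during normal job cleanup, so a single poll cannot tell a stuck node from a healthy one; the caller would need to compare results across polls.

```python
import subprocess


def parse_completing_nodes(sinfo_output):
    """Return sorted, de-duplicated node names whose state mentions 'completing'.

    Expects one "<node> <state>" pair per line, as produced by
    `sinfo -h -N -o "%N %T"`. Lines that do not have exactly two fields
    are skipped.
    """
    nodes = set()
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, state = parts
        # The state may carry a suffix such as '*', so match on substring.
        if "completing" in state.lower():
            nodes.add(name)
    return sorted(nodes)


def completing_nodes():
    """Ask sinfo for nodes currently in the completing state (hypothetical helper)."""
    result = subprocess.run(
        ["sinfo", "-h", "-N", "-t", "completing", "-o", "%N %T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_completing_nodes(result.stdout)
```

Calling `completing_nodes()` from cron at a fixed interval and alerting only on nodes that appear in two or more consecutive polls would be one way to approximate "stuck in CG"; the interval and the threshold are site choices.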