Hello, how do you implement something like "drain host after 10 consecutive failed jobs"? Unlike a host check script, which checks only for known errors, I'd like something that stops a single faulty node from killing one job after another.
Gerhard