OK, thanks for the explanation.

On Fri, 27 Sept 2019 at 15:38, Steffen Grunewald <steffen.grunew...@aei.mpg.de> wrote:
> On Fri, 2019-09-27 at 14:58:40 +0200, Rafał Kędziorski wrote:
> > On Fri, 27 Sept 2019 at 13:50, Steffen Grunewald
> > <steffen.grunew...@aei.mpg.de> wrote:
> > > On Fri, 2019-09-27 at 11:19:16 +0200, Juergen Salk wrote:
> > > >
> > > > you may try setting `ReturnToService=2` in slurm.conf.
> > > >
> > > Caveat: A spontaneously rebooting machine may create a "black hole"
> > > this way.
> > >
> > What do you mean by this? Could ReturnToService=2 be a problem?
>
> For us it was - we had (and still have) nodes spontaneously rebooting.
> If they come up into idle, they will eat the next job, etc., ad infinitum -
> thus we've set ReturnToService=0.
>
> "Black hole" in a figurative way, still swallowing all it could get its
> hands on.
>
> You've got to decide what's worse: have full control over machines rebooted
> intentionally, or have full control over misbehaving ones. My own choice
> is clear.
>
> - S
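
For reference, a minimal sketch of the two settings discussed in this thread, as they might appear in slurm.conf. The comments paraphrase the slurm.conf man page as I understand it, so please check them against the man page for your Slurm version. Only one of the two lines would be active at a time.

```
# slurm.conf (fragment, illustrative only)

# A DOWN node stays DOWN until an administrator changes its state,
# even after slurmd registers again. A spontaneously rebooting node
# therefore cannot keep swallowing jobs.
ReturnToService=0

# Alternative: a DOWN node becomes available again as soon as slurmd
# registers with a valid configuration, whatever the reason it was
# set DOWN (including an unexpected reboot).
#ReturnToService=2
```

With ReturnToService=0, a node would have to be put back into service by hand, for example with `scontrol update NodeName=node01 State=RESUME`, where node01 is a hypothetical node name.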