On 18 Sep 2018, at 13.29, Simone Lazzaris <s.lazza...@interactive.eu> wrote:
>
> > Hi all, again;
> >
> > I've enabled core dumps and let it run for a few days, waiting for the issue
> > to reoccur.
> >
> > Meanwhile I've also upgraded the poolmon script, as Sami suggested.
> >
> > It seems that the upgrade has scared the issue away, because it hasn't
> > occurred since.
> >
> > Maybe the problem is related to the way the old poolmon talked to the
> > director daemon? I'm not very inclined to downgrade poolmon to catch a
> > traceback, but can do so if necessary.
>
> Well, maybe it's not necessary ;)
> I've performed some maintenance operations on the backends and that triggered
> the crash. It seems that something goes wrong when a backend comes back
> online.
It's weird how easily you can reproduce the crash. I've run all kinds of (stress) tests and I can't reproduce this crash. I was able to reproduce the original hang, though.

> Unfortunately, the core was not dumped... And I don't know what to do: the
> director service was not chrooted, and ulimit -c is unlimited.

Do you have:

sysctl -w fs.suid_dumpable=2
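A minimal sketch for making that setting persistent across reboots, assuming a distribution that reads drop-in files from /etc/sysctl.d/. The file name and the /var/core directory are hypothetical placeholders. With fs.suid_dumpable=2 the kernel only writes a core for privilege-dropped processes if kernel.core_pattern is an absolute path or a pipe, so the pattern below uses an absolute path; the directory has to exist beforehand.

```conf
# /etc/sysctl.d/50-coredump.conf  (hypothetical file name)

# Allow core dumps from processes that changed credentials (setuid),
# which is what Dovecot's director does after starting as root.
fs.suid_dumpable = 2

# Mode 2 requires an absolute path (or a pipe) here.
# %e = executable name, %p = PID. /var/core is a hypothetical directory.
kernel.core_pattern = /var/core/core.%e.%p
```

The file can then be loaded without rebooting with `sysctl --system`. If core_pattern is already a pipe to a handler such as systemd-coredump, that line can be left out.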