On Fri, Dec 06, 2019 at 11:45:47AM +0100, Pavlos Parissis wrote:
> On Friday, 6 December 2019 10:36:18 AM CET Sander Hoentjen wrote:
> >
> > On 12/6/19 10:20 AM, Pavlos Parissis wrote:
> > > On Friday, 6 December 2019 9:23:24 AM CET Sander Hoentjen wrote:
> > >> Hi list,
> > >>
> > >> After updating from 1.8.13 to 2.0.5 (also with 2.0.10) we are seeing
> > >> kernel panics on our production servers. I haven't been able to trigger
> > >> them on a test server, and we have rolled back haproxy to 1.8 for now.
> > >>
> > >> I am attaching a panic log; I hope something useful is in there.
> > >>
> > >> Does anybody have an idea what might be going on here?
> > >>
> > > Have you noticed any high CPU utilization prior to the panic?
> > >
> > Nothing out of the ordinary, but I only have per-minute data, so I don't
> > know for sure what happened in the seconds before the crash.
> >
>
> Then I suggest configuring the sar tool to collect and store metrics every
> second for some period, in order to see whether the panic is the result of
> CPU(s) spinning at 100%, either at user or system level. That will provide
> some hints to the haproxy developers.
>
> Another idea is to try haproxy version 2.1.x.

With this said, a kernel panic must never happen; it is either the result of
a hardware issue or a kernel issue. I'm seeing that something seems to be
blocked with a signal in epoll, which seems to freeze the whole system.

Sander, are you certain your kernel is up to date? I'm seeing a RHEL 3.10
kernel, though I don't know their version numbering. Unfortunately, hard
lockups can be caused by many different bugs, so it's not even reasonable to
look for them in a changelog. I've checked a few hard lockups there, but none
seem related. It could, however, also be caused by some backports specific to
that kernel.

In any case, if your kernel is up to date, you should file a bug with the
distro to figure out why it's happening. We can possibly help if their kernel
team has some questions about what differs between 1.8 and 2.0 (a lot...).

Cheers,
Willy

