On Fri, Dec 06, 2019 at 11:45:47AM +0100, Pavlos Parissis wrote:
> On Friday, 6 December 2019 10:36:18 AM CET Sander Hoentjen wrote:
> > 
> > On 12/6/19 10:20 AM, Pavlos Parissis wrote:
> > > On Friday, 6 December 2019 9:23:24 AM CET Sander Hoentjen wrote:
> > >> Hi list,
> > >>
> > >> After updating from 1.8.13 to 2.0.5 (and also with 2.0.10) we are seeing
> > >> kernel panics on our production servers. I haven't been able to trigger
> > >> them on a test server, and we have rolled back haproxy to 1.8 for now.
> > >>
> > >> I am attaching a panic log; I hope something useful is in there.
> > >>
> > >> Does anybody have an idea what might be going on here?
> > >>
> > > Have you noticed any high CPU utilization prior to the panic?
> > >
> > Nothing out of the ordinary, but I only have per-minute data, so I don't
> > know for sure what happened in the seconds before the crash.
> > 
> 
> Then I suggest configuring the sar tool to collect and store metrics every
> second for some period, in order to see whether the panic is the result of
> CPU(s) spinning at 100%, either at user or system level. That will provide
> some hints to the haproxy developers.
> 
> Another idea is to try haproxy version 2.1.x.
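
For reference, the one-second sampling suggested above could be
approximated without sar by reading /proc/stat directly. This is only a
rough sketch, not a replacement for sar: the function names
(parse_cpu_line, cpu_percentages, read_sample, monitor) are made up for
illustration, and it assumes the usual Linux /proc/stat column order
(user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time

# Assumed column order of the aggregate "cpu" line in Linux /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels expose fewer columns; pad the missing ones with zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(prev, curr):
    """Return the share of each CPU state between two samples, in percent."""
    delta = {k: curr[k] - prev[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * delta[k] / total for k in FIELDS}


def read_sample(path="/proc/stat"):
    """Read the first line of /proc/stat, which holds the all-CPU totals."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def monitor(interval=1.0, samples=60):
    """Print user/system/iowait/idle percentages once per interval."""
    prev = read_sample()
    for _ in range(samples):
        time.sleep(interval)
        curr = read_sample()
        pct = cpu_percentages(prev, curr)
        print("%s user=%.1f%% system=%.1f%% iowait=%.1f%% idle=%.1f%%" % (
            time.strftime("%H:%M:%S"),
            pct["user"], pct["system"], pct["iowait"], pct["idle"]))
        prev = curr
```

Calling monitor() with its output redirected to a file on disk would leave
a per-second trace to inspect after a crash, although the last few samples
may be lost if the machine locks up before they are flushed.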

With this said, a kernel panic must never happen, and is either the result
of a hardware issue or a kernel issue. I'm seeing that something
seems to be blocked with a signal in epoll, which seems to freeze
the whole system.

Sander, are you certain your kernel is up to date? I'm seeing an RHEL
3.10 kernel, though I don't know their version numbers. Hard lockups can
be caused by many different bugs unfortunately, so it's not even reasonable
to look for them in a changelog. I've checked a few hard lockups there but
none seem related. It could however also be caused by some backports
specific to that kernel. In any case, if your kernel is up to date, you
should file a bug with the distro to figure out why it's happening. We can
possibly help if their kernel team has some questions about what
differs between 1.8 and 2.0 (a lot...).

Cheers,
Willy
