On Mon, 2019-04-15 at 08:55 -0400, Laurence Oberman wrote:
> On Sun, 2019-04-14 at 23:25 -0400, TomK wrote:
> > Hey All,
> > 
> > I'm getting a kernel panic on a Gigabyte GA-890XA-UD3 motherboard that
> > I've got a QLE2464 card in as a target (FC).  The kernel has been
> > crashing / panicking in the last 1-2 months about once a week.  Before
> > that, it was rock solid for 4-5 years.  I've upgraded to kernel 4.18.19
> > but that hasn't made much of a difference.  Since the message includes
> > qla2x00_request_irqs I thought I would try here first.
> > 
> > Tried to get more info on this but:
> > 
> > 1) Keyboard doesn't work and locks up when the panic occurs.  No USB 
> > ports work.  Tried the PS/2 port but nothing.
> > 
> > 2) Unable to capture a kdump.  Can't get to the kdump vmcore due to 1).
> > 
> > The two screenshots are pretty much all I can capture.  Tried things like
> > clocksource=rtc in the kernel parms and disabling hpet1 but apparently I
> > haven't disabled it everywhere since it still shows up.
> > 
> > Wondering if anyone recognizes these messages or has any idea what could
> > be the issue here?  Even a hint would be appreciated.
> > 
> 
> Hello Tom
> I have had similar issues and reported them to Himanshu@Cavium.
> I have kept all my target servers at kernel 4.5 as it has been the only
> version that has always been stable.
> If your motherboard has an NMI (virtual or physical), set all of these
> in /etc/sysctl.conf.
> Run sysctl -p; dracut -f and reboot.
> 
> kernel.nmi_watchdog = 1
> kernel.panic_on_io_nmi = 1
> kernel.panic_on_unrecovered_nmi = 1
> kernel.unknown_nmi_panic = 1
> 
> When the issue shows up press the virtual/physical NMI
> 
> This is with the assumption that generic kdump is properly set up and
> dmesg | grep crash shows memory reserved by the crashkernel and that you
> have tested kdump manually.
> 
> Another option is to use a USB serial port to capture the full log if you
> cannot get kdump to work.
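
The kdump and NMI preconditions quoted above can be checked with a few
standard interfaces. This is only a rough sketch: the function names
has_crashkernel and kdump_report are hypothetical, and it assumes a system
that exposes /proc/cmdline and /sys/kernel/kexec_crash_loaded and has the
sysctl utility installed.

```shell
# Sketch only: function names are hypothetical, not part of any tool.

# Succeeds if the given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints a short readiness report for the running system.
kdump_report() {
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel= is on the kernel command line"
    else
        echo "crashkernel= is missing from the kernel command line"
    fi
    # 1 means a crash kernel is loaded and ready to take over on panic.
    echo "kexec_crash_loaded: $(cat /sys/kernel/kexec_crash_loaded)"
    # Current values of the NMI-related settings quoted above.
    sysctl kernel.nmi_watchdog kernel.panic_on_io_nmi \
           kernel.panic_on_unrecovered_nmi kernel.unknown_nmi_panic
}
```

Running kdump_report after the reboot shows whether the settings took
effect. It does not replace a manual kdump test, which still has to be done
by deliberately crashing a test machine.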

That approach may provide further evidence about kernel bugs, but it is not
guaranteed to lead to a solution. It would help if either or both of you
could do the following on a test system:
* Check out branch qla2xxx-for-next of my kernel repo on github
  (https://github.com/bvanassche/linux/tree/qla2xxx-for-next).
* Enable lockdep and KASAN in the kernel config (CONFIG_PROVE_LOCKING and
  CONFIG_KASAN).
* Build and install that kernel.
* Run your favorite workload.
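
The steps above could be scripted roughly as follows. This is a sketch, not
a tested recipe: the function names, the linux-qla2xxx directory name, and
the use of /boot/config-$(uname -r) as the starting config are assumptions
that depend on the distribution.

```shell
# Hypothetical helper: succeeds if a kernel config file enables an option.
# $1 = path to the config file, $2 = option name without the CONFIG_ prefix
config_enabled() {
    grep -q "^CONFIG_$2=y" "$1"
}

# Hypothetical wrapper around the steps listed above. Runs in a subshell so
# the caller's working directory is left alone.
build_test_kernel() (
    git clone --branch qla2xxx-for-next --single-branch \
        https://github.com/bvanassche/linux.git linux-qla2xxx &&
    cd linux-qla2xxx &&
    # Assumption: the distro ships its running config under /boot.
    cp "/boot/config-$(uname -r)" .config &&
    scripts/config --enable PROVE_LOCKING --enable KASAN &&
    # Resolve dependencies and take defaults for options new in this tree.
    make olddefconfig &&
    # olddefconfig silently drops options whose dependencies are unmet,
    # so confirm both survived before spending time on the build.
    config_enabled .config PROVE_LOCKING &&
    config_enabled .config KASAN &&
    make -j"$(nproc)" &&
    sudo make modules_install install
)
```

Calling build_test_kernel on a test machine performs the clone, configure,
build, and install in one go. It stops at the first failing step.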

Please note that the qla2xxx-for-next branch is based on the v5.1-rc1 kernel
and hence should not be installed on any production system.

Thanks,

Bart.
