On Mon, 2019-04-15 at 08:55 -0400, Laurence Oberman wrote:
> On Sun, 2019-04-14 at 23:25 -0400, TomK wrote:
> > Hey All,
> >
> > I'm getting a kernel panic on a Gigabyte GA-890XA-UD3 motherboard
> > that I've got a QLE2464 card in as a target (FC). The kernel has
> > been crashing / panicking in the last 1-2 months about once a
> > week. Before that, it was rock solid for 4-5 years. I've upgraded
> > to kernel 4.18.19 but that hasn't made much of a difference. Since
> > the message includes qla2x00_request_irqs I thought I would try
> > here first.
> >
> > Tried to get more info on this but:
> >
> > 1) Keyboard doesn't work and locks up when the panic occurs. No USB
> > ports work. Tried the PS/2 port but nothing.
> >
> > 2) Unable to capture a kdump. Can't get to the kdump vmcore due to
> > 1).
> >
> > The two screenshots are pretty much all I can capture. Tried things
> > like clocksource=rtc in the kernel parameters and disabling hpet1,
> > but apparently I haven't disabled it everywhere since it still
> > shows up.
> >
> > Wondering if anyone recognizes these messages or has any idea what
> > could be the issue here? Even a hint would be appreciated.
> >
>
> Hello Tom
> I have had similar issues and reported them to Himanshu@Cavium
> I have kept all my target servers at kernel 4.5 as it has been the
> only version that has always been stable.
> If your motherboard has an NMI (virtual or physical) set all of these
> in /etc/sysctl.conf
> Run sysctl -a;dracut -f and reboot
>
> kernel.nmi_watchdog = 1
> kernel.panic_on_io_nmi = 1
> kernel.panic_on_unrecovered_nmi =
> kernel.unknown_nmi_panic = 1
>
> When the issue shows up press the virtual/physical NMI
>
> This assumes that generic kdump is properly set up, that
> dmesg | grep crash shows memory reserved by the crashkernel, and that
> you have tested kdump manually.
>
> Another option is to use a USB serial port to capture the full log if
> you cannot get kdump to work.

That approach may provide further evidence about kernel bugs, but there
is no guarantee that it will lead to a solution. It would help if
either or both of you could do the following on a test system:
* Check out branch qla2xxx-for-next of my kernel repo on github
  (https://github.com/bvanassche/linux/tree/qla2xxx-for-next).
* Enable lockdep and KASAN in the kernel config (CONFIG_PROVE_LOCKING
  and CONFIG_KASAN).
* Build and install that kernel.
* Run your favorite workload.

Please note that the qla2xxx-for-next branch is based on the v5.1-rc1
kernel and hence should not be installed on any production system.

Thanks,

Bart.
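
P.S. In case it helps, the steps above could look roughly like the
sketch below. It is not a tested recipe. It assumes a disposable test
machine with the usual kernel build dependencies installed, and that the
distribution keeps the running kernel's config at /boot/config-$(uname -r).
The directory name linux-qla2xxx and the two function names are made up
for illustration. scripts/config is the helper that ships in the kernel
tree.

```shell
#!/bin/sh
# Sketch only: build a lockdep + KASAN kernel from the qla2xxx-for-next branch.

# Hypothetical helper: succeeds only if the given kernel config file
# has both CONFIG_PROVE_LOCKING and CONFIG_KASAN set to y.
has_debug_options() {
    grep -q '^CONFIG_PROVE_LOCKING=y' "$1" &&
    grep -q '^CONFIG_KASAN=y' "$1"
}

# Hypothetical helper: clone, configure, build and install the test kernel.
# Do not run this on a production system.
build_debug_kernel() {
    git clone --branch qla2xxx-for-next --depth 1 \
        https://github.com/bvanassche/linux.git linux-qla2xxx &&
    cd linux-qla2xxx &&
    # Assumption: the running kernel's config is a usable starting point.
    cp "/boot/config-$(uname -r)" .config &&
    scripts/config --enable PROVE_LOCKING --enable KASAN &&
    # Resolve dependencies; options with unmet dependencies get dropped here.
    make olddefconfig &&
    has_debug_options .config || {
        echo "lockdep and/or KASAN did not survive olddefconfig" >&2
        return 1
    }
    make -j"$(nproc)" &&
    sudo make modules_install install
}
```

Calling build_debug_kernel and rebooting into the new kernel would then
cover the first three steps. The check after make olddefconfig is there
because an option whose dependencies are not met is silently dropped,
and that is easy to miss.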