Collected more data points on this issue.

1. Tried offlining CPUs. We found that the crash was typically on CPU:40, so I offlined CPU40 and repeated the test. The test seemed to make progress but then panicked on a different CPU. I tried offlining several more CPUs, but the crash just moves on to other CPUs.
2. Changed the scheduler from CFQ to NOOP. This made no difference either: the crash was seen on CPU:44, and offlining CPU44 yielded the same results.

Panics seem to happen either in the scheduler or in ext4 code (note that we are running stress on SDA). According to Cavium eng, this could be due to a bad L2 cache or memory.

While the tests were running I tailed /var/log/syslog and /var/log/kern.log and did see messages like this:

Jun 12 14:57:55 seuss ipmievd: Voltage sensor CPU_VTT_DDR02 Upper Non-critical going high Asserted (Reading 0.77 > Threshold 0.77 Volts)
Jun 12 14:57:56 seuss ipmievd: Voltage sensor CPU_VTT_DDR02 Upper Non-critical going high Deasserted (Reading 0.76 > Threshold 0.77 Volts)
Jun 12 14:57:57 seuss ipmievd: Voltage sensor CPU_VTT_DDR13 Upper Non-critical going high Deasserted (Reading 0.76 > Threshold 0.77 Volts)
Jun 12 14:57:58 seuss ipmievd: Voltage sensor CPU_VTT_DDR13 Upper Non-critical going high Asserted (Reading 0.77 > Threshold 0.77 Volts)

We have other CRB1S systems that function as expected, and the stress-ng tests do not cause any panics on them. I am tempted to consider this a hardware issue with this particular CRB1S.

** Changed in: linux (Ubuntu Bionic)
       Status: Incomplete => Won't Fix

** Changed in: linux (Ubuntu Artful)
       Status: Confirmed => Won't Fix

** Changed in: linux (Ubuntu)
       Status: Incomplete => Won't Fix

https://bugs.launchpad.net/bugs/1754053

Title:
  oops in set_next_entity / ipmi_msghandler
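For anyone who wants to repeat steps 1 and 2 on another board, here is a rough Python sketch of the two sysfs writes involved. It is not the exact procedure used in the tests above. The helper names and the `sysfs_root` parameter are made up for illustration, and the scheduler path assumes the legacy (non-multiqueue) block layer, where the file lists entries like `noop deadline [cfq]`. Writing to the real `/sys` needs root.

```python
from pathlib import Path


def offline_cpu(cpu: int, sysfs_root: str = "/sys") -> Path:
    """Take one CPU offline by writing 0 to its sysfs 'online' file."""
    path = Path(sysfs_root) / "devices" / "system" / "cpu" / f"cpu{cpu}" / "online"
    path.write_text("0\n")
    return path


def set_io_scheduler(device: str, scheduler: str, sysfs_root: str = "/sys") -> Path:
    """Select the I/O scheduler for a block device such as 'sda'."""
    path = Path(sysfs_root) / "block" / device / "queue" / "scheduler"
    path.write_text(scheduler + "\n")
    return path


def parse_active_scheduler(listing: str) -> str:
    """Return the bracketed (active) entry from a scheduler listing.

    The kernel marks the active scheduler with square brackets,
    e.g. 'noop deadline [cfq]'. Falls back to the stripped text
    if no bracketed entry is present.
    """
    for name in listing.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return listing.strip()
```

Hypothetical usage as root: `offline_cpu(40)` followed by `set_io_scheduler("sda", "noop")`, then read `/sys/block/sda/queue/scheduler` back and pass it to `parse_active_scheduler` to confirm the change took effect.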