On Sat, 2016-12-10 at 01:52 +1000, Nicholas Piggin wrote:
> This does not solve the hardlockup problem completely however, because
> interrupts can often become hard disabled when soft disabled for long
> periods. And they can be hard disabled for other reasons.
>
> To make up for the lack of a periodic true NMI, this also has an SMP
> hard lockup detector where all CPUs can observe lockups on others.
>
> This still needs a bit more polishing, testing, comments, config
> options, and boot parameters, etc., so it's RFC quality only.

Paulus and I discussed a plan with Balbir to also limit the cases of
hard-disable. They typically happen as a result of an external interrupt.

On P8 and earlier, we could just fetch the interrupt from the XICS in the
"masked" path and stash it in the PACA. We already have a way to stash an
interrupt there for later processing, because KVM sometimes does it.

That would cause the XICS to elevate the priority, effectively masking
subsequent interrupts. We'd have to change the XICS code to use the same
priority for IPIs and externals too, though.

For XIVE (P9), we can just poke at the CPU priority register in the TM
area to mask at the PIC level in that case, and unmask later.

Cheers,
Ben.
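
For illustration only, the P8 idea above can be modelled in a few lines of standalone C. This is a toy sketch, not kernel code: every name in it (toy_xics, toy_paca, toy_external_irq, and so on) is hypothetical, the priority values are illustrative, and it assumes the controller presents an interrupt only when it is more favoured (numerically lower) than the current processor priority. Under that assumption, accepting an external interrupt in the masked path blocks later externals, but a more favoured IPI would still get through, which is presumably why IPIs and externals would need to share a priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative priorities (assumption): lower value = more favoured. */
enum { PRIO_IPI = 4, PRIO_EXTERNAL = 5, CPPR_OPEN = 0xff };

/* Toy model of the per-CPU state of the interrupt controller. */
struct toy_xics {
    uint8_t cppr;              /* current processor priority */
};

/* Toy stand-in for the per-CPU fields that would live in the PACA. */
struct toy_paca {
    bool soft_enabled;         /* false while interrupts are soft-disabled */
    bool has_stashed_irq;      /* an interrupt is waiting to be replayed */
    uint32_t stashed_irq;      /* the interrupt fetched in the masked path */
    uint8_t saved_cppr;        /* priority to restore at EOI */
};

/* An interrupt is presented only if it is more favoured than the CPPR. */
static bool toy_xics_would_present(const struct toy_xics *x, uint8_t prio)
{
    return prio < x->cppr;
}

/* Accepting raises the CPPR to the source priority; returns the old CPPR. */
static uint8_t toy_xics_accept(struct toy_xics *x, uint8_t prio)
{
    assert(toy_xics_would_present(x, prio));
    uint8_t old = x->cppr;
    x->cppr = prio;
    return old;
}

/* EOI restores the priority that was in force before the accept. */
static void toy_xics_eoi(struct toy_xics *x, uint8_t saved_cppr)
{
    x->cppr = saved_cppr;
}

/* External interrupt entry. Returns true if handled immediately. */
static bool toy_external_irq(struct toy_xics *x, struct toy_paca *p,
                             uint32_t irq, uint8_t prio)
{
    uint8_t old = toy_xics_accept(x, prio);

    if (!p->soft_enabled) {
        /* Masked path: stash the interrupt instead of hard-disabling. */
        p->has_stashed_irq = true;
        p->stashed_irq = irq;
        p->saved_cppr = old;
        return false;
    }

    /* Normal path: the handler would run here, followed by the EOI. */
    toy_xics_eoi(x, old);
    return true;
}

/* Soft-enable: replay a stashed interrupt, if any, and unmask via EOI.
 * Returns the replayed interrupt number, or 0 if nothing was stashed. */
static uint32_t toy_soft_enable(struct toy_xics *x, struct toy_paca *p)
{
    uint32_t irq = 0;

    p->soft_enabled = true;
    if (p->has_stashed_irq) {
        irq = p->stashed_irq;
        p->has_stashed_irq = false;
        toy_xics_eoi(x, p->saved_cppr);
    }
    return irq;
}
```

In this model, once an external has been stashed while soft-disabled, toy_xics_would_present() reports a second external as blocked but an IPI at PRIO_IPI as still deliverable, until toy_soft_enable() replays the stash and restores the priority.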