Sorry for another post. I ran a git bisect and found the first bad commit for me:
044897ef4a22af89aecb8df509477beba0a2e0ce is the first bad commit
commit 044897ef4a22af89aecb8df509477beba0a2e0ce
Author: Richard Purdie <richard.pur...@linuxfoundation.org>
Date:   Mon Dec 4 22:25:43 2017 +0000

    target/ppc: Fix system lockups caused by interrupt_request state corruption

    Occasionally in Linux guests on x86_64 we're seeing logs like:

    ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100req 00000004

    when they should read:

    ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100req 00000002

    The "00000004" is CPU_INTERRUPT_EXITTB yet the code calls
    cpu_interrupt(cs, CPU_INTERRUPT_HARD) ("00000002") in this function
    just before the log message. Something is causing the HARD bit setting
    to get lost.

    The knock on effect of losing that bit is the decrementer timer
    interrupts don't get delivered which causes the guest to sit idle in
    its idle handler and 'hang'.

    The issue occurs due to races from code which sets
    CPU_INTERRUPT_EXITTB.

    Rather than poking directly into cs->interrupt_request, that code
    needs to:

    a) hold BQL
    b) use the cpu_interrupt() helper

    This patch fixes the call sites to do this, fixing the hang.

    The calls are made from a variety of contexts so a helper function is
    added to handle the necessary locking. This can likely be improved and
    optimised in the future but it ensures the code is correct and doesn't
    lockup as it stands today.

    Signed-off-by: Richard Purdie <richard.pur...@linuxfoundation.org>
    Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>

:040000 040000 0d422b95967a873ad8c9c549322bbb441ce8358a 56ccf308d04930cf10745c7f65be1237dbb3554f M target
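
For anyone following along, this is how I read the race that commit message describes. It is only a single-threaded sketch that replays one bad interleaving by hand. It is not QEMU code: FakeCPU and all of the function names are made up for illustration, and only the two bit values (0x2 for HARD, 0x4 for EXITTB) come from the log lines quoted above. The "serialized" case just models the effect of taking a lock so the two updates cannot interleave. It does not call the real BQL or cpu_interrupt().

```c
#include <stdint.h>

/* Bit values as quoted in the commit message's log lines. */
#define CPU_INTERRUPT_HARD   0x0002u
#define CPU_INTERRUPT_EXITTB 0x0004u

/* Hypothetical stand-in for the CPU state; not the QEMU struct. */
typedef struct {
    uint32_t interrupt_request;
} FakeCPU;

/* The unlocked "poke" at interrupt_request is a read-modify-write.
 * It is split into a load step and a store step here so that another
 * update can be placed between them explicitly. */
static uint32_t exittb_load(const FakeCPU *cs)
{
    return cs->interrupt_request;
}

static void exittb_store(FakeCPU *cs, uint32_t snapshot)
{
    cs->interrupt_request = snapshot | CPU_INTERRUPT_EXITTB;
}

/* Models the interrupt path setting the HARD bit. */
static void set_hard(FakeCPU *cs)
{
    cs->interrupt_request |= CPU_INTERRUPT_HARD;
}

/* Bad interleaving: the EXITTB writer works from a stale snapshot,
 * so its store overwrites the HARD bit set in between. */
uint32_t racy_interleaving(void)
{
    FakeCPU cs = { 0 };
    uint32_t stale = exittb_load(&cs);  /* EXITTB path reads 0x0 */
    set_hard(&cs);                      /* IRQ path sets HARD: 0x2 */
    exittb_store(&cs, stale);           /* writes 0x0 | 0x4, HARD is lost */
    return cs.interrupt_request;
}

/* Serialized updates: each read-modify-write completes before the
 * next one starts, which is the effect a lock would have. */
uint32_t serialized(void)
{
    FakeCPU cs = { 0 };
    set_hard(&cs);
    exittb_store(&cs, exittb_load(&cs));
    return cs.interrupt_request;
}
```

In this sketch racy_interleaving() ends with only the EXITTB bit set, which matches the "req 00000004" seen in the bad log line, while serialized() keeps both bits.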