We're noticing a reproducible system boot hang on certain
post-Skylake platforms where the BIOS is configured in
legacy boot mode with x2APIC disabled. The system stalls
immediately after writing the first SMP initialization
sequence into APIC ICR.

The cause of the problem is watchdog NMI handler execution -
somewhere near the end of NMI handling (after it's already
rescheduled the next NMI) it tries to access IO port 0x61
to get the actual NMI reason on CPU0. Unfortunately, this
port is emulated by BIOS using SMIs and this emulation
apparently might take more than we expect under certain
conditions. As the result, the system is constantly moving
between NMI and SMI handler and not making any progress.

Just lower the initial frequency for now as we lower it later
even more anyway.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 xen/arch/x86/nmi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index d7fce28..1eb2a32 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -34,7 +34,8 @@
 #include <asm/apic.h>
 
 unsigned int nmi_watchdog = NMI_NONE;
-static unsigned int nmi_hz = HZ;
+/* initial watchdog frequency - shouldn't be too high to avoid boot hangs */
+static unsigned int nmi_hz = HZ / 10;
 static unsigned int nmi_perfctr_msr;   /* the MSR to reset in NMI handler */
 static unsigned int nmi_p4_cccr_val;
 static DEFINE_PER_CPU(struct timer, nmi_timer);
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to