On 02/11/18 at 09:08pm, Eric W. Biederman wrote:
> Baoquan He <b...@redhat.com> writes:
> 
> > This is a regression fix.
> >
> > Before, to fix erratum AVR31, commit 522e66464467 ("x86/apic: Disable
> > I/O APIC before shutdown of the local APIC") moved lapic_shutdown()
> > calling after disable_IO_APIC(). This introdued a regression. The
> > root cause is that disable_IO_APIC() not only clears IO_APIC, also
> > restore boot irq mode by setting LAPIC/APIC/IMCR, lapic_shutdown()
> > after disable_IO_APIC() will disable LAPIC and ruin the possible
> > virtual wire mode setting which the code has been trying to do all
> > along.
> >
> > The consequence is, in KVM guest kernel always prints warning as below
> > during kexec/kdump kernel boots up. That happened in setup_local_APIC()
> > since 'do { xxx } while (queued && max_loops > 0)' loop does not function
> > well any more if pending irq exists in APIC IRR after LAPIC is disabled.
> > And people even saw casual kdump kernel hang once in ~30 attempts during
> > stress testing of kdump on KVM machine.
> >
> > [    0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 
> > setup_local_APIC+0x228/0x330
> > [    0.001000] Modules linked in:
> > [    0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #3
> > [    0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > 1.10.2-1.fc26 04/01/2014
> > [    0.001000] RIP: 0010:setup_local_APIC+0x228/0x330
> > [    0.001000] RSP: 0000:ffffffffb6e03eb8 EFLAGS: 00010286
> > [    0.001000] RAX: 0000009edb4c4d84 RBX: 0000000000000000 RCX: 
> > 00000000b099d800
> > [    0.001000] RDX: 0000009e00000000 RSI: 0000000000000000 RDI: 
> > 0000000000000810
> > [    0.001000] RBP: 0000000000000000 R08: ffffffffffffffff R09: 
> > 0000000000000001
> > [    0.001000] R10: ffff98ce6a801c00 R11: 0761076d072f0776 R12: 
> > 0000000000000001
> > [    0.001000] R13: 00000000000000f0 R14: 0000000000004000 R15: 
> > ffffffffffffc6ff
> > [    0.001000] FS:  0000000000000000(0000) GS:ffff98ce6bc00000(0000) 
> > knlGS:0000000000000000
> > [    0.001000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.001000] CR2: 00000000ffffffff CR3: 0000000022209000 CR4: 
> > 00000000000406b0
> > [    0.001000] Call Trace:
> > [    0.001000]  apic_bsp_setup+0x56/0x74
> > [    0.001000]  x86_late_time_init+0x11/0x16
> > [    0.001000]  start_kernel+0x3c9/0x486
> > [    0.001000]  secondary_startup_64+0xa5/0xb0
> > [    0.001000] Code: 00 85 c9 74 2d 0f 31 c1 e1 0a 48 c1 e2 20 41 89 cf 4c 
> > 03 7c 24 08 48 09 d0 49 29 c7 4c 89 3c 24 48 83 3c 24 00 0f 8f 8f fe ff
> > ff <0f> ff e9 10 ff ff ff 48 83 2c 24 01 eb e7 48 83 c4 18 5b 5d 41
> > [    0.001000] ---[ end trace b88e71b9a6ebebdd ]---
> > [    0.001000] masked ExtINT on CPU#0
> >
> > To fix this, just break down disable_IO_APIC(), then call
> > clear_IO_APIC() to stop IO_APIC where disable_IO_APIC() was called,
> > and call restore_boot_irq_mode() to restore boot irq mode before
> > reboot or kexec/kdump jump.
> 
> Two things here.
> a) This is missing a fixes tag and a CC stable.
> b) What makes your change to the KEXEC_JUMP code path safe?
>    Have the lapic and ioapic already been shut down?
> 
> The KEXEC_JUMP changes to machine_kexec_32.c and machine_kexec_64.c
> either need to be documented in the change long why they are safe
> so that this change becomes obviously safe and correct.
> 
> Otherwise we risk and trivial and obvious looking change causing another
> regression like changing the order of lapic_shutdown and disable_IOAPIC
> did.

Makes sense. Will change as you suggested and repost. Thanks a lot.

Thanks
Baoquan

> > ---
> >  arch/x86/include/asm/io_apic.h     | 1 +
> >  arch/x86/kernel/apic/io_apic.c     | 2 +-
> >  arch/x86/kernel/crash.c            | 3 ++-
> >  arch/x86/kernel/machine_kexec_32.c | 2 +-
> >  arch/x86/kernel/machine_kexec_64.c | 2 +-
> >  arch/x86/kernel/reboot.c           | 3 ++-
> >  6 files changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> > index 558d1a6a13ad..0fa95bfacb39 100644
> > --- a/arch/x86/include/asm/io_apic.h
> > +++ b/arch/x86/include/asm/io_apic.h
> > @@ -193,6 +193,7 @@ static inline unsigned int io_apic_read(unsigned int 
> > apic, unsigned int reg)
> >  extern void setup_IO_APIC(void);
> >  extern void enable_IO_APIC(void);
> >  extern void disable_IO_APIC(void);
> > +extern void clear_IO_APIC(void);
> >  extern void restore_boot_irq_mode(void);
> >  extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
> >  extern void print_IO_APICs(void);
> > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> > index 7b73b6b9b4b6..2d7cd2db77f5 100644
> > --- a/arch/x86/kernel/apic/io_apic.c
> > +++ b/arch/x86/kernel/apic/io_apic.c
> > @@ -587,7 +587,7 @@ static void clear_IO_APIC_pin(unsigned int apic, 
> > unsigned int pin)
> >                    mpc_ioapic_id(apic), pin);
> >  }
> >  
> > -static void clear_IO_APIC (void)
> > +void clear_IO_APIC (void)
> >  {
> >     int apic, pin;
> >  
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 10e74d4778a1..1f6680427ff0 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -199,9 +199,10 @@ void native_machine_crash_shutdown(struct pt_regs 
> > *regs)
> >  #ifdef CONFIG_X86_IO_APIC
> >     /* Prevent crash_kexec() from deadlocking on ioapic_lock. */
> >     ioapic_zap_locks();
> > -   disable_IO_APIC();
> > +   clear_IO_APIC();
> >  #endif
> >     lapic_shutdown();
> > +   restore_boot_irq_mode();
> >  #ifdef CONFIG_HPET_TIMER
> >     hpet_disable();
> >  #endif
> > diff --git a/arch/x86/kernel/machine_kexec_32.c 
> > b/arch/x86/kernel/machine_kexec_32.c
> > index edfede768688..f78bb4432bfb 100644
> > --- a/arch/x86/kernel/machine_kexec_32.c
> > +++ b/arch/x86/kernel/machine_kexec_32.c
> > @@ -199,7 +199,7 @@ void machine_kexec(struct kimage *image)
> >              * one form or other. kexec jump path also need
> >              * one.
> >              */
> > -           disable_IO_APIC();
> > +           restore_boot_irq_mode();
> >  #endif
> >     }
> >  
> > diff --git a/arch/x86/kernel/machine_kexec_64.c 
> > b/arch/x86/kernel/machine_kexec_64.c
> > index 1f790cf9d38f..cb0c2d0a4c99 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -297,7 +297,7 @@ void machine_kexec(struct kimage *image)
> >              * one form or other. kexec jump path also need
> >              * one.
> >              */
> > -           disable_IO_APIC();
> > +           restore_boot_irq_mode();
> >  #endif
> >     }
> >  
> > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> > index 2126b9d27c34..725624b6c0c0 100644
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -666,7 +666,7 @@ void native_machine_shutdown(void)
> >      * Even without the erratum, it still makes sense to quiet IO APIC
> >      * before disabling Local APIC.
> >      */
> > -   disable_IO_APIC();
> > +   clear_IO_APIC();
> >  #endif
> >  
> >  #ifdef CONFIG_SMP
> > @@ -680,6 +680,7 @@ void native_machine_shutdown(void)
> >  #endif
> >  
> >     lapic_shutdown();
> > +   restore_boot_irq_mode();
> >  
> >  #ifdef CONFIG_HPET_TIMER
> >     hpet_disable();

Reply via email to