While running a CPU hotplug + reboot test, I can easily hit an IPANIC on
kernel 3.14.

[  106.107851] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[  106.116702] IP:
[  106.118490]  [<ffffffff810044f6>] check_irq_vectors_for_cpu_disable+0x76/0x180
[  106.126809] PGD 0
[  106.129110] Oops: 0000 [#1] PREEMPT SMP
[  106.133613] Modules linked in: atomisp_css2401a0_v21 lm3554 ov2722 hid_sensor_hub sens_col_core hid_heci_ish heci_ish heci vidt_driver rfkill_gpio bcmdhd_pcie(O) cfg80211 ov5693 videobuf_vmalloc pn544_nfc(C) videobuf_core bt_lpm 6lowpan_iphc ip6table_raw iptable_raw atmel_mxt_ts
[  106.161897] CPU: 2 PID: 18 Comm: migration/2 Tainted: G        WC O 3.14.37-x86_64-L1-R467-g68db82c #1
[  106.172323] Hardware name: Intel Corporation CHERRYVIEW C0 PLATFORM/Cherry Trail FFD, BIOS CH2TFFD.X64.0004.R83.1506171149 06/17/2015
[  106.185758] task: ffff880077e98510 ti: ffff880077e9a000 task.ti: ffff880077e9a000
[  106.194143] RIP: 0010:[<ffffffff810044f6>]
[  106.198646]  [<ffffffff810044f6>] check_irq_vectors_for_cpu_disable+0x76/0x180
[  106.206969] RSP: 0000:ffff880077e9bcf8  EFLAGS: 00010046
[  106.212926] RAX: 0000000000000000 RBX: 00000000000000d3 RCX: 0000000000000000
[  106.220921] RDX: 0000000000000000 RSI: 0000000000000088 RDI: 0000000000000001
[  106.228918] RBP: ffff880077e9bd28 R08: 0000000000000000 R09: ffff8800784008e0
[  106.236915] R10: 000000000000000a R11: 0000000000000000 R12: 0000000000000015
[  106.244911] R13: 0000000000000000 R14: 0000000000000088 R15: 0000000000000002
[  106.252898] FS:  0000000000000000(0000) GS:ffff88007a300000(0000) knlGS:0000000000000000
[  106.261961] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  106.268405] CR2: 0000000000000040 CR3: 000000006e03c000 CR4: 00000000001007e0
[  106.276400] Last Branch Records:
[  106.280052]    to: [<ffffffff819f3840>] page_fault+0x0/0x80
[  106.286335]  from: [<ffffffff810044f6>] check_irq_vectors_for_cpu_disable+0x76/0x180
[  106.295044]    to: [<ffffffff810044f3>] check_irq_vectors_for_cpu_disable+0x73/0x180
[  106.303749]  from: [<ffffffff810da4c8>] irq_to_desc+0x18/0x20
[  106.310227]    to: [<ffffffff810da4c7>] irq_to_desc+0x17/0x20
[  106.316701]  from: [<ffffffff8137347c>] radix_tree_lookup+0xc/0x10
[  106.323655]    to: [<ffffffff8137347b>] radix_tree_lookup+0xb/0x10
[  106.330616]  from: [<ffffffff81373425>] radix_tree_lookup_element+0x55/0x90
[  106.338450]    to: [<ffffffff81373400>] radix_tree_lookup_element+0x30/0x90
[  106.346274]  from: [<ffffffff81373420>] radix_tree_lookup_element+0x50/0x90
[  106.354108]    to: [<ffffffff8137340b>] radix_tree_lookup_element+0x3b/0x90
[  106.361934]  from: [<ffffffff813733fd>] radix_tree_lookup_element+0x2d/0x90
[  106.369750]    to: [<ffffffff813733d0>] radix_tree_lookup_element+0x0/0x90
[  106.377487]  from: [<ffffffff81373476>] radix_tree_lookup+0x6/0x10
[  106.384447]    to: [<ffffffff81373470>] radix_tree_lookup+0x0/0x10
[  106.391408]  from: [<ffffffff810da4c2>] irq_to_desc+0x12/0x20
[  106.397882] Stack:
[  106.400140]  00000002810bb4f2 ffff88006e07bde8 ffff88006e07bd88 0000000000000000
[  106.408551]  ffffffff8110a801 0000000000000202 ffff880077e9bd40 ffffffff81030f62
[  106.416967]  0000000000000282 ffff880077e9bd58 ffffffff819dd043 0000000000000003
[  106.425375] Call Trace:
[  106.428136]  [<ffffffff8110a801>] ? multi_cpu_stop+0x1/0x110
[  106.434475]  [<ffffffff81030f62>] native_cpu_disable+0x12/0x40
[  106.441018]  [<ffffffff819dd043>] take_cpu_down+0x13/0x40
[  106.447074]  [<ffffffff8110a8c1>] multi_cpu_stop+0xc1/0x110
[  106.453324]  [<ffffffff8110a800>] ? cpu_stop_should_run+0x50/0x50
[  106.460156]  [<ffffffff8110aad8>] cpu_stopper_thread+0x78/0x150
[  106.466795]  [<ffffffff819f2bde>] ? _raw_spin_unlock_irq+0x1e/0x40
[  106.473726]  [<ffffffff810b33d7>] ? finish_task_switch+0x57/0xd0
[  106.480464]  [<ffffffff819edffe>] ? __schedule+0x37e/0x7b0
[  106.486619]  [<ffffffff810b20fd>] smpboot_thread_fn+0x17d/0x2b0
[  106.493259]  [<ffffffff810b1f80>] ? SyS_setgroups+0x160/0x160
[  106.499704]  [<ffffffff810ab1a4>] kthread+0xe4/0x100

Upstream already has commit d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1, which
avoids this IPANIC. However, judging from the discussion at
http://lkml.kernel.org/r/20150204132754.ga10...@suse.de,
the root cause was not clear at that time.

Since the problem is easy to reproduce with this specific test case, we
investigated further and found that the IPANIC occurs in the scenario below.

      cpu N (N = 1, or 2, or 3)                 cpu 0
        native_cpu_up                           device_shutdown
        => do_boot_cpu
        => start_secondary
                => smp_callin
                 => setup_vector_irq
                 => __setup_vector_irq
                                                => free_msi_irqs
                                                => arch_teardown_msi_irqs
                                                => default_teardown_msi_irqs
                                                => arch_teardown_msi_irq
                                                => native_teardown_msi_irq
                                                => destroy_irq
                                                => __clear_irq_vector
                => set_cpu_online

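CPU N is not yet online when __clear_irq_vector() runs, so the vector entry
on CPU N is not cleared and the IRQ number remains in its IRQ vector table
after free_msi_irqs(). The next native_cpu_disable() on that CPU then hits a
NULL pointer dereference when check_irq_vectors_for_cpu_disable() checks the
stale entry.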

This patch moves setup_vector_irq() after set_cpu_online().
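
For illustration only, the interleaving above can be replayed in a small
userspace model. This is a rough sketch, not kernel code: struct model, the
model_* helpers, run_scenario() and the IRQ/vector numbers are all
hypothetical, and the model assumes that the teardown path clears the vector
only on CPUs that are currently online and that the vector setup path skips
IRQs that have already been destroyed. Locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS    4
#define NR_VECTORS 8
#define NO_IRQ     (-1)

/* Hypothetical, heavily simplified model state; not kernel structures. */
struct model {
    bool cpu_online[NR_CPUS];
    int  vector_irq[NR_CPUS][NR_VECTORS]; /* per-CPU vector -> irq table */
    int  irq;                 /* the one MSI irq tracked by the model */
    int  vector;              /* its vector, 0 means "none assigned" */
    bool domain[NR_CPUS];     /* CPUs the irq is routed to */
    bool desc_exists;         /* false once the irq has been destroyed */
};

static void model_init(struct model *m)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        m->cpu_online[cpu] = (cpu == 0);   /* only the boot CPU is up */
        m->domain[cpu] = true;
        for (int v = 0; v < NR_VECTORS; v++)
            m->vector_irq[cpu][v] = NO_IRQ;
    }
    m->irq = 21;                           /* arbitrary example values */
    m->vector = 3;
    m->desc_exists = true;
    m->vector_irq[0][m->vector] = m->irq;
}

/* Models setup_vector_irq(): copy in-use vectors into the new CPU's table. */
static void model_setup_vector_irq(struct model *m, int cpu)
{
    if (m->desc_exists && m->vector != 0 && m->domain[cpu])
        m->vector_irq[cpu][m->vector] = m->irq;
}

/* Models destroy_irq() => __clear_irq_vector(): only online CPUs are cleaned. */
static void model_destroy_irq(struct model *m)
{
    assert(m->vector != 0);                /* sanity check of the model */
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (m->domain[cpu] && m->cpu_online[cpu])
            m->vector_irq[cpu][m->vector] = NO_IRQ;
        m->domain[cpu] = false;
    }
    m->vector = 0;
    m->desc_exists = false;
}

/* True if the CPU's table still names an irq whose descriptor is gone. */
static bool model_has_stale_vector(const struct model *m, int cpu)
{
    for (int v = 0; v < NR_VECTORS; v++)
        if (m->vector_irq[cpu][v] >= 0 && !m->desc_exists)
            return true;
    return false;
}

/* Replays the interleaving on CPU 2 with the old or the patched ordering. */
static bool run_scenario(bool patched)
{
    struct model m;
    const int cpu = 2;

    model_init(&m);
    if (patched) {
        m.cpu_online[cpu] = true;          /* set_cpu_online */
        model_destroy_irq(&m);             /* cpu 0: device_shutdown */
        model_setup_vector_irq(&m, cpu);
    } else {
        model_setup_vector_irq(&m, cpu);
        model_destroy_irq(&m);             /* cpu 0: device_shutdown */
        m.cpu_online[cpu] = true;          /* set_cpu_online */
    }
    return model_has_stale_vector(&m, cpu);
}
```

In this model, run_scenario(false) leaves a stale entry in CPU 2's table,
which corresponds to the entry that check_irq_vectors_for_cpu_disable() later
trips over, while run_scenario(true) does not. The real code paths take locks
and handle many IRQs, so the model only shows why the ordering matters.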

Signed-off-by: xiao jin <jin.x...@intel.com>
---
 arch/x86/kernel/smpboot.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 50e547e..f7d5d79 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -172,11 +172,6 @@ static void smp_callin(void)
        apic_ap_setup();
 
        /*
-        * Need to setup vector mappings before we enable interrupts.
-        */
-       setup_vector_irq(smp_processor_id());
-
-       /*
         * Save our processor parameters. Note: this information
         * is needed for clock calibration.
         */
@@ -257,6 +252,11 @@ static void notrace start_secondary(void *unused)
        cpu_set_state_online(smp_processor_id());
        x86_platform.nmi_init();
 
+       /*
+        * Need to setup vector mappings before we enable interrupts.
+        */
+       setup_vector_irq(smp_processor_id());
+
        /* enable local interrupts */
        local_irq_enable();
 
-- 
1.7.9.5
