Hi Thomas, I've been chasing down a use-after-free on an irq_desc structure during repeated device removal testing that crashes 4.3 in smp_irq_move_cleanup_interrupt. So far I have a bunch of crashes and some data gleaned from instrumenting the kernel with trace logging. Details to follow, but in short, I think I'm hitting the use-after-free because of a782a7e46bb5 "x86/irq: Store irq descriptor in vector array".
During the tests, I usually see RCU stalls (sometimes a crash later) -- when looking at a vmcore, the CPU stall is here: crash> bt ffff881038ee4200 PID: 0 TASK: ffff881038ee4200 CPU: 1 COMMAND: "swapper/1" [exception RIP: _raw_spin_lock+0x10] RIP: ffffffff81672a10 RSP: ffff88103f843f70 RFLAGS: 00000046 RAX: 0000000000000000 RBX: 0000000000000091 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81d36a88 RBP: ffff88103f843fa8 R8: 0000000000000101 R9: 0000000000000f78 R10: 0000000000100000 R11: ffff8810384ccf8c R12: 0000000000000091 R13: ffff88102a819794 R14: 000000000000b5e8 R15: ffff88102a8196f8 CS: 0010 SS: 0018 #0 [ffff88103f843f70] smp_irq_move_cleanup_interrupt at ffffffff8105319b #1 [ffff88103f843fb0] irq_move_cleanup_interrupt at ffffffff81673a04 --- <IRQ stack> --- #2 [ffff881038f0fd90] irq_move_cleanup_interrupt at ffffffff81673a04 [exception RIP: unknown or invalid address] RIP: 000000000000001d RSP: 0000000000000000 RFLAGS: ffff881038f10000 RAX: 0000000000000001 RBX: ffff881038f0c000 RCX: 0000000000000005 RDX: 00000000000a087a RSI: 0000000000051ca2 RDI: 0000000000000018 RBP: ffff88103f84f100 R8: ffff881038f0fea0 R9: ffffe8f000042378 R10: 0000000000000002 R11: 000000ee954efd02 R12: ffffffff810e5060 R13: ffff881038f0fdc8 R14: ffff88103f84f500 R15: ffff88103f84f100 ORIG_RAX: ffff88103f856c40 CS: 8000a044 SS: ffffffffffffffdf bt: WARNING: possibly bogus exception frame #3 [ffff881038f0fe38] cpuidle_enter_state at ffffffff8151e278 #4 [ffff881038f0fea8] cpuidle_enter at ffffffff8151e417 #5 [ffff881038f0feb8] call_cpuidle at ffffffff810bef42 #6 [ffff881038f0fed0] cpu_startup_entry at ffffffff810bf1dc #7 [ffff881038f0ff28] start_secondary at ffffffff8104e9e0 The irq_desc is in R15: ffff88102a8196f8 crash> kmem ffff88102a8196f8 CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff88103f408c80 kmalloc-512 512 5292 5460 140 32k SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffea0040aa0600 ffff88102a818000 0 39 17 22 FREE / [ALLOCATED] ffff88102a8196f8 PAGE PHYSICAL MAPPING INDEX CNT FLAGS ffffea0040aa0640 102a819000 0 0 0 2fffff80008000 tail This irq_desc is no longer allocated, it's been filled with the slub debug poison pattern (hence the spinlock is stuck): crash> rd ffff88102a8196f8 0x28 ffff88102a8196f8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819708: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819718: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819728: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819738: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819748: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819758: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819768: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819778: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819788: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819798: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a8197a8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a8197b8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a8197c8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a8197d8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a8197e8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a8197f8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819808: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819818: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk ffff88102a819828: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk The irq vector is in RBX: 0000000000000091 crash> p/d 0x91 $1 = 145 The irq_desc_tree entry for vector 0x91 appears to have been updated to something new (the device re-add): crash> tree -t radix irq_desc_tree -s irq_desc | grep 'irq = 0x91' -B11 | grep -e '^ffff' -e 'irq =' ffff88102fb144e8 irq = 0x91, But from a custom crash gdb script, the freed irq_desc can still be found in CPU 1's vector_irq[145]: cpu[1] vector_irq[145] irq_desc @ 0xffff88102a8196f8 Sifting through a heavily grepped and abbreviated trace log: - The irq moved from CPU 1, vector 145 to CPU 44, vector 134 - Both CPUs skip cleaning up their vector_irq[] entries for this irq because data->move_in_progress is set (is this normal?) - The irq moves again to CPU 34, vector 123 - The underlying device is removed @ 593.106514 jiffies - The irq_desc is freed @ 593.239879 jiffies - CPU 1 is last heard from @ 1022.830083 jiffies [001] 22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 0xffff88102a8196f8 [044] 35.209945: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8 [001] 35.212370: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8 [044] 35.674114: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8 [001] 35.675395: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8 [044] 36.265716: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8 [044] 36.517785: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8 [001] 36.520115: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8 [001] 54.636651: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8 [001] 54.636722: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8 [044] 61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 0xffff88102a8196f8 [001] 61.670979: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8 [044] 61.696120: smp_irq_move_cleanup_interrupt : cpu (this) : vector=134 0xffff88102a8196f8 -> (nil) [034] 69.086107: smp_irq_move_cleanup_interrupt : vector == data->cfg.vector && ... : vector=123 0xffff88102a8196f8 [000] 593.239811: clear_irq_vector : 1 : cpu 34 : vector=123 0xffff88102a8196f8 -> (nil) [000] 593.239879: free_desc : desc @ 0xffff88102a8196f8 [001] 1022.830083: smp_irq_move_cleanup_interrupt : IS_ERR_OR_NULL : vector=144 (nil) crash> log | grep 'mpt3sas[0-9]*: _scsih_remove$' [ 291.658410] mpt3sas0: _scsih_remove [ 593.106514] mpt3sas1: _scsih_remove << free_desc [ 874.345239] mpt3sas3: _scsih_remove Prior to a782a7e46bb5 "x86/irq: Store irq descriptor in vector array", the vector_irq array held irq values, those interested in the irq_desc would have to perform their own irq_to_desc() (ie, a radix_tree_lookup of the irq_desc_tree). The radix tree is updated in free_desc(), so any subsequent lookups would fail. After the change though, I'm not so sure that direct access to the irq_desc is safe -- I couldn't figure out anything preventing use-after-free (the sparse_irq_lock and the vector_lock seem to protect their respective data-structures, but not the dual-use of the irq_desc's by both the irq_desc_tree and the per-cpu vector_irq[].) Hopefully this is on the right track -- I can modify the trace logging or run other tests if there is additional information that would be helpful. Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/