Hi Ben,

On 08/29/2017 06:55 PM, Benjamin Herrenschmidt wrote:
On Tue, 2017-08-29 at 17:43 -0300, Daniel Henrique Barboza wrote:
Hi,

This is a scenario I've been facing when working on early device
hotplug in QEMU. When a device is added, an IRQ pulse is fired to warn
the guest of the event, then the kernel fetches it by calling
'check_exception' and handles it. If the hotplug is done too early
(before SLOF, for example), the pulse is ignored and the hotplug event
is left unchecked in the events queue.

One solution would be to pulse the hotplug queue interrupt after CAS,
when we are sure that the hotplug queue is negotiated. However, this
panics the kernel with sig 11 kernel access of bad area, which suggests
that the kernel wasn't quite ready to handle it.
That's not right. This is a bug that needs fixing. The interrupt should
be masked anyway but still.

Tell us more about the crash (backtrace etc...); this definitely needs
fixing.

This is the backtrace using a 4.13.0-rc3 guest:

---------
[    0.008913] Unable to handle kernel paging request for data at address 0x00000100
[    0.008989] Faulting instruction address: 0xc00000000012c318
[    0.009046] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.009092] SMP NR_CPUS=1024
[    0.009092] NUMA
[    0.009128] pSeries
[    0.009173] Modules linked in:
[    0.009210] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3+ #1
[    0.009268] task: c0000000feb02580 task.stack: c0000000fe108000
[    0.009325] NIP: c00000000012c318 LR: c00000000012c9c4 CTR: 0000000000000000
[    0.009394] REGS: c0000000fffef910 TRAP: 0380   Not tainted (4.13.0-rc3+)
[    0.009450] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>
[    0.009454]   CR: 28000822  XER: 20000000
[    0.009554] CFAR: c00000000012c9c0 SOFTE: 0
[    0.009554] GPR00: c00000000012c9c4 c0000000fffefb90 c00000000141f100 0000000000000400
[    0.009554] GPR04: 0000000000000000 c0000000fe1851c0 0000000000000000 00000000fee60000
[    0.009554] GPR08: 0000000fffffffe1 0000000000000000 0000000000000001 0000000002001001
[    0.009554] GPR12: 0000000000000040 c00000000fd80000 c00000000000db58 0000000000000000
[    0.009554] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.009554] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
[    0.009554] GPR24: 0000000000000002 0000000000000013 c0000000fe14bc00 0000000000000400
[    0.009554] GPR28: 0000000000000400 0000000000000000 c0000000fe1851c0 0000000000000001
[    0.010121] NIP [c00000000012c318] __queue_work+0x48/0x640
[    0.010168] LR [c00000000012c9c4] queue_work_on+0xb4/0xf0
[    0.010213] Call Trace:
[    0.010239] [c0000000fffefb90] [c00000000000db58] kernel_init+0x8/0x160 (unreliable)
[    0.010308] [c0000000fffefc70] [c00000000012c9c4] queue_work_on+0xb4/0xf0
[    0.010368] [c0000000fffefcb0] [c0000000000c4608] queue_hotplug_event+0xd8/0x150
[    0.010435] [c0000000fffefd00] [c0000000000c30d0] ras_hotplug_interrupt+0x140/0x190
[    0.010505] [c0000000fffefd90] [c00000000018c8b0] __handle_irq_event_percpu+0x90/0x310
[    0.010573] [c0000000fffefe50] [c00000000018cb6c] handle_irq_event_percpu+0x3c/0x90
[    0.010642] [c0000000fffefe90] [c00000000018cc24] handle_irq_event+0x64/0xc0
[    0.010710] [c0000000fffefec0] [c0000000001928b0] handle_fasteoi_irq+0xc0/0x230
[    0.010779] [c0000000fffefef0] [c00000000018ae14] generic_handle_irq+0x54/0x80
[    0.010847] [c0000000fffeff20] [c0000000000189f0] __do_irq+0x90/0x210
[    0.010904] [c0000000fffeff90] [c00000000002e730] call_do_irq+0x14/0x24
[    0.010961] [c0000000fe10b640] [c000000000018c10] do_IRQ+0xa0/0x130
[    0.011021] [c0000000fe10b6a0] [c000000000008c58] hardware_interrupt_common+0x158/0x160
[    0.011090] --- interrupt: 501 at __replay_interrupt+0x38/0x3c
[    0.011090]     LR = arch_local_irq_restore+0x74/0x90
[    0.011179] [c0000000fe10b990] [c0000000fe10b9e0] 0xc0000000fe10b9e0 (unreliable)
[    0.011249] [c0000000fe10b9b0] [c000000000b967fc] _raw_spin_unlock_irqrestore+0x4c/0xb0
[    0.011316] [c0000000fe10b9e0] [c00000000018ff50] __setup_irq+0x630/0x9e0
[    0.011374] [c0000000fe10ba90] [c00000000019054c] request_threaded_irq+0x13c/0x250
[    0.011441] [c0000000fe10baf0] [c0000000000c2cd0] request_event_sources_irqs+0x100/0x180
[    0.011511] [c0000000fe10bc10] [c000000000eceda8] __machine_initcall_pseries_init_ras_IRQ+0xc4/0x12c
[    0.011591] [c0000000fe10bc40] [c00000000000d8c8] do_one_initcall+0x68/0x1e0
[    0.011659] [c0000000fe10bd00] [c000000000eb4484] kernel_init_freeable+0x284/0x370
[    0.011725] [c0000000fe10bdc0] [c00000000000db7c] kernel_init+0x2c/0x160
[    0.011782] [c0000000fe10be30] [c00000000000bc9c] ret_from_kernel_thread+0x5c/0xc0
[    0.011848] Instruction dump:
[    0.011885] fbc1fff0 f8010010 f821ff21 7c7c1b78 7c9d2378 7cbe2b78 787b0020 60000000
[    0.011955] 60000000 892d028a 2fa90000 409e04bc <813d0100> 75290001 408204c0 3d2061c8
[    0.012026] ---[ end trace e0b4d36daf3f8b2a ]---
[    0.013850]
[    2.013962] Kernel panic - not syncing: Fatal exception in interrupt
-------------

To reproduce it, what I did was to fire a pulse in the hotplug queue right after CAS by
hacking QEMU code.

However, this can also be reproduced without changing QEMU by simply hotplugging a
CPU/LMB after CAS using device_add.


[adding dgibson in CC in case he wants to comment]


Thanks,


Daniel


In my experiments using upstream 4.13 I saw that there is a 'safe time'
to pulse the queue, sometime after CAS and before mounting the root fs,
but I wasn't able to pinpoint it. From QEMU's perspective, the last hcall
done (an h_set_mode) is still too early to pulse it and the kernel
panics. Looking at the kernel source I saw that the IRQ handling is
initiated quite early in the init process.

So my question (ok, actually 2 questions):

- Is my analysis correct? Is there an unsafe time to fire an IRQ pulse
before CAS that can break the kernel or am I overlooking/doing something
wrong?
- Is there a reliable way to know when the kernel can safely handle the
hotplug interrupt?
So I don't think that's the right approach. Virtual interrupts are edge
sensitive and we will potentially lose them if they occur early. I
think what needs to happen is:

  - Fix whatever's causing the above crash

and

  - The hotplug code should check for pending events (check_exception ?)
at boot time to enqueue whatever's there. It needs to do that after
unmasking the interrupt and in a way that is protected from races with
said interrupt.
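
To make the second point concrete, here is a rough userspace sketch of the
pattern, not kernel code. Everything in it is a hypothetical stand-in:
hv_queue models the hypervisor-side event queue, fetch_event() models a
check_exception call, and the atomic_flag spinlock stands in for whatever
locking the real hotplug code would use against the interrupt handler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Hypothetical model of the hypervisor-side hotplug event queue. */
struct event_queue {
    int events[QUEUE_CAP];
    size_t head;            /* next event to fetch */
    size_t tail;            /* next free slot */
};

static struct event_queue hv_queue;
static bool irq_unmasked;   /* guest has requested and unmasked the IRQ */
static int handled_count;   /* events the guest has processed so far */
static atomic_flag drain_lock = ATOMIC_FLAG_INIT;

/* Stand-in for check_exception: fetch one pending event, if any. */
static bool fetch_event(int *out)
{
    if (hv_queue.head == hv_queue.tail)
        return false;
    *out = hv_queue.events[hv_queue.head++ % QUEUE_CAP];
    return true;
}

/* Shared by the interrupt path and the boot-time check; the lock keeps
 * the two from processing the same queue concurrently. */
static void drain_events(void)
{
    int ev;

    while (atomic_flag_test_and_set(&drain_lock))
        ;                   /* spin until the other path is done */
    while (fetch_event(&ev))
        handled_count++;    /* a real handler would queue work here */
    atomic_flag_clear(&drain_lock);
}

/* Hypervisor side: enqueue the event, then pulse. The pulse is an edge,
 * so it is simply lost if the guest has not unmasked the interrupt. */
static void hotplug_device(int id)
{
    assert(hv_queue.tail - hv_queue.head < QUEUE_CAP);
    hv_queue.events[hv_queue.tail++ % QUEUE_CAP] = id;
    if (irq_unmasked)
        drain_events();     /* models the interrupt handler running */
}

/* Guest side at boot: unmask first, then pick up anything left over
 * from pulses that fired before the guest was listening. */
static void guest_init_hotplug(void)
{
    irq_unmasked = true;
    drain_events();
}
```

With this model, a hotplug_device() call made before guest_init_hotplug()
leaves its event sitting in the queue, and the boot-time drain picks it up;
later calls are handled through the interrupt path as usual.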

Cheers,
Ben.


Thanks,


Daniel
