On Thursday 21 September 2017 10:25 PM, Naveen N . Rao wrote:
On 2017/09/21 09:00PM, Balbir Singh wrote:
On Thu, Sep 21, 2017 at 8:02 PM, Michael Ellerman <m...@ellerman.id.au> wrote:
Kamalesh Babulal <kamal...@linux.vnet.ibm.com> writes:
While running a stress test with a livepatch module loaded, a kernel
bug was triggered.
cpu 0x5: Vector: 400 (Instruction Access) at [c0000000eb9d3b60]
pc: c0000000eb9d3e30
lr: c0000000eb9d3e30
sp: c0000000eb9d3de0
msr: 800000001280b033
current = 0xc0000000dbd38700
paca = 0xc00000000fe01400 softe: 0 irq_happened: 0x01
pid = 8618, comm = make
Linux version 4.13.0+ (root@ubuntu) (gcc version 6.3.0 20170406 (Ubuntu
6.3.0-12ubuntu2)) #1 SMP Wed Sep 13 03:49:27 EDT 2017
5:mon> t
[c0000000eb9d3de0] c0000000eb9d3e30 (unreliable)
[c0000000eb9d3e30] c000000000008ab4 hardware_interrupt_common+0x114/0x120
--- Exception: 501 (Hardware Interrupt) at c000000000053040
livepatch_handler+0x4c/0x74
[c0000000eb9d4120] 0000000057ac6e9d (unreliable)
[d0000000089d9f78] 2e0965747962382e
SP (965747962342e09) is in userspace
When an interrupt is serviced in the middle of livepatch_handler
execution, the livepatch stack or the task stack can get corrupted.
Ouch. That's pretty broken by me.
I was worried more about preemption, as I said in the earlier review
comment; this is new. It looks like we restored the wrong r1 on
returning from the interrupt context? It would be nice to see any
pt_regs changes due to the interrupt. Did the interrupt handling code
call something that needed live-patching?
The problem is just that the livepatch stack grows up, rather than down.
So, when this stack is used during interrupt handling, we'll clobber
part of this stack.
There aren't any pt_regs changes due to this afaics. Just the livepatch
stack being over-written.
Fix the corruption by using the r11 register for livepatch stack
manipulation, instead of shuffling the task stack and livepatch stack
pointers through the r1 register. Using r11 also avoids
disabling/enabling IRQs while setting up the livepatch stack.
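
The failure mode and the proposed fix discussed above can be modelled
with a toy simulation. This is only a sketch under stated assumptions,
not kernel code: the word-addressed memory, the struct cpu type, the
constants and the helper names (take_interrupt, simulate_r1_switch,
simulate_r11) are all hypothetical. It assumes the handler stores three
entries just below the livepatch stack pointer, and that interrupt entry
builds its frame just below whatever r1 holds at that moment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All sizes and addresses below are made up for illustration. */
#define MEM_WORDS        64  /* toy memory, in 8-byte words */
#define LP_BASE           8  /* livepatch stack base, grows up */
#define TASK_SP          48  /* task stack pointer, grows down */
#define IRQ_FRAME_WORDS   4  /* hypothetical interrupt frame size */

#define SAVED_TOC   0x1111111111111111ULL
#define SAVED_LR    0x2222222222222222ULL
#define END_MARKER  0x3333333333333333ULL
#define IRQ_FILL    0xdeadbeefdeadbeefULL

struct cpu {
    uint64_t mem[MEM_WORDS];
    size_t r1;            /* stack pointer seen by interrupt entry */
    size_t r11;           /* scratch register */
    size_t livepatch_sp;  /* current top of the livepatch stack */
};

static void cpu_init(struct cpu *c)
{
    memset(c, 0, sizeof(*c));
    c->r1 = TASK_SP;
    c->livepatch_sp = LP_BASE;
}

/* Interrupt entry: build a frame just below the current r1. */
static void take_interrupt(struct cpu *c)
{
    assert(c->r1 >= IRQ_FRAME_WORDS && c->r1 <= MEM_WORDS);
    for (size_t i = 1; i <= IRQ_FRAME_WORDS; i++)
        c->mem[c->r1 - i] = IRQ_FILL;
}

/* Returns 1 if the three saved livepatch entries are still intact. */
static int entries_intact(const struct cpu *c)
{
    return c->mem[LP_BASE] == SAVED_TOC &&
           c->mem[LP_BASE + 1] == SAVED_LR &&
           c->mem[LP_BASE + 2] == END_MARKER;
}

/* Old scheme: r1 temporarily points at the livepatch stack. */
int simulate_r1_switch(void)
{
    struct cpu c;
    cpu_init(&c);
    size_t saved_r1 = c.r1;

    c.r1 = c.livepatch_sp + 3;   /* allocate 3 slots, growing up */
    c.livepatch_sp = c.r1;
    c.mem[c.r1 - 3] = SAVED_TOC;
    c.mem[c.r1 - 2] = SAVED_LR;
    c.mem[c.r1 - 1] = END_MARKER;

    take_interrupt(&c);          /* arrives before r1 is restored */
    c.r1 = saved_r1;
    return entries_intact(&c);
}

/* Proposed scheme: r1 stays on the task stack, r11 does the work. */
int simulate_r11(void)
{
    struct cpu c;
    cpu_init(&c);

    c.r11 = c.livepatch_sp + 3;  /* allocate 3 slots, growing up */
    c.livepatch_sp = c.r11;
    c.mem[c.r11 - 3] = SAVED_TOC;
    c.mem[c.r11 - 2] = SAVED_LR;
    c.mem[c.r11 - 1] = END_MARKER;

    take_interrupt(&c);          /* frame lands on the task stack */
    return entries_intact(&c);
}
```

In this model simulate_r1_switch() reports the saved entries as
clobbered, because the interrupt frame is written below an r1 that
points into the upward-growing livepatch stack, while simulate_r11()
leaves them intact, because r1 never leaves the task stack.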
I'm trying to think if there's some reason I didn't use r11. But I can't
remember anything specific. I suspect I just didn't check the ABI.
We can't use r11: this is ftrace with regs, and we've restored
registers before calling livepatch_handler. I don't think we can
clobber r11, but I might be sleep deprived and missing something.
r11 is volatile and meant for use in function linkage. With
-mprofile-kernel, we've just entered the new function and haven't
touched r11 yet. So, it appears to me that we can definitely use/clobber
it, along with r0 and r12.
I misread the question from Balbir. livepatch_handler is called in
mcount context, which is very early in function execution. r11 might
hold live values used by the calling function; those are restored by
the ftrace code, but, as Naveen pointed out, neither the called nor the
calling function can rely on r11 after a function call.
In non-ftrace context, r11 is also used/clobbered between function calls
to load the stub address.
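
The register argument above can be summarised as a small lookup. This is
a sketch based on my reading of the 64-bit PowerPC ELF ABI register
conventions and should be checked against the ABI document; the enum and
the function names (classify_gpr, usable_as_entry_scratch) are
hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

enum gpr_class {
    GPR_VOLATILE,     /* caller cannot rely on it across a call */
    GPR_NONVOLATILE,  /* callee must preserve it */
    GPR_DEDICATED     /* fixed role: stack, TOC, thread pointer */
};

/* Classify general purpose register rN, for N in 0..31. */
enum gpr_class classify_gpr(int n)
{
    assert(n >= 0 && n <= 31);
    if (n == 1 || n == 2 || n == 13)
        return GPR_DEDICATED;    /* r1 = sp, r2 = TOC, r13 = thread */
    if (n <= 12)
        return GPR_VOLATILE;     /* r0 and r3-r12 */
    return GPR_NONVOLATILE;      /* r14-r31 */
}

/*
 * At function entry (the mcount call site) the argument registers
 * r3-r10 still carry live parameters, so only the remaining volatile
 * registers are free to clobber: r0, r11 and r12.
 */
bool usable_as_entry_scratch(int n)
{
    bool is_argument = (n >= 3 && n <= 10);
    return classify_gpr(n) == GPR_VOLATILE && !is_argument;
}
```

Under these assumptions usable_as_entry_scratch() is true only for r0,
r11 and r12, which matches the set of registers named in the discussion.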
--
cheers,
Kamalesh.