Hi Mikey,

On 09/24/2018 04:27 AM, Michael Neuling wrote:
> When we treclaim we store the userspace checkpointed r13 to a scratch
> SPR and then later save the scratch SPR to the user thread struct.
> 
> Unfortunately, this doesn't work as accessing the user thread struct
> can take an SLB fault and the SLB fault handler will write the same
> scratch SPRG that now contains the userspace r13.
> 
> To fix this, we store r13 to the kernel stack (which can't fault)
> before we access the user thread struct.
> 
> Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen
> as a random userspace segfault with r13 looking like a kernel address.
> 
> Signed-off-by: Michael Neuling <mi...@neuling.org>

Reviewed-by: Breno Leitao <lei...@debian.org>

I am wondering if the same problem could not happen with r1 as well, since r1
is kept in the paca->tm_scratch after MSR[RI] is enabled, in a code similar
to this:

        std     r1, PACATMSCRATCH(r13)
        ..
        li      r11, MSR_RI
        mtmsrd  r11, 1
        ....
        ld      r3, PACATMSCRATCH(r13)          /* user r1 */
        std     r3, GPR1(r7)

There might be a corruption if the exception that is raised just after
MSR[RI] is enabled somehow exits through 'fast_exception_return' (being
called by most of the exception handlers as I understand) which will
overwrite paca->tm_scratch with the upcoming MSR (aka SRR1) just before the
SRR1, as:

        ld      r3,_MSR(r1)

        ...
        std     r3, PACATMSCRATCH(r13) /* Stash returned-to MSR */


It seems that slb_miss_common and instruction_access_slb does not go through
this path, but handle_page_fault calls
ret_from_except_lite->fast_exception_return, which might cause this
'possible' issue.

Anyway, I would like to investigate if this problem might happen or not for
real, but I do not fully understand what are the exceptions that might be
raised when just MSR[RI] is enabled.

Reply via email to