hi Mikey,

On 11/19/18 9:30 PM, Michael Neuling wrote:
> On Mon, 2018-11-19 at 10:44 -0200, Breno Leitao wrote:
>> On a signal handler return, the user could set a context with MSR[TS] bits
>> set, and these bits would be copied to task regs->msr.
>>
>> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
>> several __get_user() are called and then a recheckpoint is executed.
> 
> Do we have the same problem on the entry side?  There are a bunch of
> copy_from_user() before we do a tm_reclaim() (ie when still transactional). Is
> there some chance of the same problem there?

Do you mean in this part of code?

  SYSCALL_DEFINE0(rt_sigreturn)
  {
        ....
        if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
                goto badframe;

        ...
        if (MSR_TM_SUSPENDED(mfmsr()))
                tm_reclaim_current(0);

I suspect it is not a problem, since a page fault here will cause the process
to be reschedule, and tm-reclaim and tm_recheckpoint will happen, and save
and restore the checkpointed/live registers before this tm_reclaim_current(0).

I also forced a process reschedule (calling schedule()) just after
copy_from_user() and I didn't see it breaking anything. I might spent more
timing thinking about this whole get_user() at signal handlers and I will
double check it.

>> This is a problem since a page fault (in kernel space) could happen when
>> calling __get_user(). If it happens, the process MSR[TS] bits were
>> already set, but recheckpoint was not executed, and SPRs are still invalid.
>>
>> The page fault can cause the current process to be de-scheduled, with
>> MSR[TS] active and without tm_recheckpoint() being called.  More
>> importantly, without TEXAR[FS] bit set also.
> 
> This patch is great and should go in ASAP 
> 
>> Since TEXASR might not have the FS bit set, and when the process is
>> scheduled back, it will try to reclaim, which will be aborted because of
>> the CPU is not in the suspended state, and, then, recheckpoint. This
>> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
>> zero, hitting a BUG_ON().
>>
>>      [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
> 
> Can you put more of the backtrace here?  Can be useful for people searching 
> for
> a similar problem.

Ack!

> 
>> This patch simply delays the MSR[TS] set, so, if there is any page fault in
>> the __get_user() section, it does not have regs->msr[TS] set, since the TM
>> structures are still invalid, thus avoiding doing TM operations for
>> in-kernel exceptions and possible process reschedule.
> 
> Can you put a comment in the code what says after the MSR[TS] setting, there 
> can
> be no get/put_user before the recheckpoint?

Sure. I will circle this code with a preempt_disable/enable() and put a
comment there as well.

> Also, it looks like the 32bit signals case is safe, but please add a comment 
> in
> there too.

Yes, 32-bits seems to be safe, but, I will do the same as above, adding
preempt disable/enable and warn about possible interrupts in-between.

Thank you!
Breno

Reply via email to