hi Mikey, On 11/19/18 9:30 PM, Michael Neuling wrote: > On Mon, 2018-11-19 at 10:44 -0200, Breno Leitao wrote: >> On a signal handler return, the user could set a context with MSR[TS] bits >> set, and these bits would be copied to task regs->msr. >> >> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, >> several __get_user() are called and then a recheckpoint is executed. > > Do we have the same problem on the entry side? There are a bunch of > copy_from_user() before we do a tm_reclaim() (ie when still transactional). Is > there some chance of the same problem there?
Do you mean in this part of code? SYSCALL_DEFINE0(rt_sigreturn) { .... if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set))) goto badframe; ... if (MSR_TM_SUSPENDED(mfmsr())) tm_reclaim_current(0); I suspect it is not a problem, since a page fault here will cause the process to be reschedule, and tm-reclaim and tm_recheckpoint will happen, and save and restore the checkpointed/live registers before this tm_reclaim_current(0). I also forced a process reschedule (calling schedule()) just after copy_from_user() and I didn't see it breaking anything. I might spent more timing thinking about this whole get_user() at signal handlers and I will double check it. >> This is a problem since a page fault (in kernel space) could happen when >> calling __get_user(). If it happens, the process MSR[TS] bits were >> already set, but recheckpoint was not executed, and SPRs are still invalid. >> >> The page fault can cause the current process to be de-scheduled, with >> MSR[TS] active and without tm_recheckpoint() being called. More >> importantly, without TEXAR[FS] bit set also. > > This patch is great and should go in ASAP > >> Since TEXASR might not have the FS bit set, and when the process is >> scheduled back, it will try to reclaim, which will be aborted because of >> the CPU is not in the suspended state, and, then, recheckpoint. This >> recheckpoint will restore thread->texasr into TEXASR SPR, which might be >> zero, hitting a BUG_ON(). >> >> [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446! > > Can you put more of the backtrace here? Can be useful for people searching > for > a similar problem. Ack! > >> This patch simply delays the MSR[TS] set, so, if there is any page fault in >> the __get_user() section, it does not have regs->msr[TS] set, since the TM >> structures are still invalid, thus avoiding doing TM operations for >> in-kernel exceptions and possible process reschedule. > > Can you put a comment in the code what says after the MSR[TS] setting, there > can > be no get/put_user before the recheckpoint? Sure. I will circle this code with a preempt_disable/enable() and put a comment there as well. > Also, it looks like the 32bit signals case is safe, but please add a comment > in > there too. Yes, 32-bits seems to be safe, but, I will do the same as above, adding preempt disable/enable and warn about possible interrupts in-between. Thank you! Breno