Le 17/09/2022 à 20:38, Samuel Holland a écrit : > On 9/17/22 03:16, Christophe Leroy wrote: >> Le 16/09/2022 à 07:05, Samuel Holland a écrit : >>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible >>> to switch away from a task inside copy_{from,to}_user. This left the CPU >>> with userspace access enabled until after the next IRQ or privilege >>> level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when >>> switching back to the original task, the userspace access would fault: >> >> This is not supposed to happen. You never switch away from a task >> magically. Task switch will always happen in an interrupt, that means >> copy_{from,to}_user() get interrupted. > > That makes sense, the interrupt handler is responsible for saving the > KUAP status. It looks like neither DEFINE_INTERRUPT_HANDLER_RAW nor any > of its users (performance_monitor_exception(), do_slb_fault()) do that.
As far as I can see, that's done earlier, in exceptions-64s.S. Look for kuap_save_amr_and_lock. Now, it may be possible that one of the exceptions pathes misses it. > Yet they still call one of the interrupt_return variants, which restores > the status from the stack. > >> Whenever an interrupt is taken, kuap_save_amr_and_lock() macro is used >> to save KUAP status into the stack then lock KUAP access. At interrupt >> exit, kuap_kernel_restore() macro or function is used to restore KUAP >> access from the stack. At the time the task switch happens, KUAP access >> is expected to be locked. During task switch, the stack is switched so >> the KUAP status is taken back from the new task's stack. > > What if another task calls schedule() from kernel process context, and > the scheduler switches to a task that had been preempted inside > copy_{from,to}_user()? Then there is no interrupt involved, and I don't > see where kuap_kernel_restore() would get called. Yes there is interrupt involved. That task, if it has been preempted inside copy_from_user(), it must have been through an interrupt, likely a timer interrupt. So switching back to that task means doing an interrupt return in the context of that task. That's when KUAP status should be restored. > >> Your fix suggests that there is some path where the KUAP status is not >> properly saved and/or restored. Did you try running with >> CONFIG_PPC_KUAP_DEBUG ? It should warn whenever a KUAP access is left >> unlocked. >> >>> >>> Kernel attempted to write user page (3fff7ab68190) - exploit attempt? >>> (uid: 65536) >>> ------------[ cut here ]------------ >>> Bug: Write fault blocked by KUAP! >>> WARNING: CPU: 56 PID: 4939 at arch/powerpc/mm/fault.c:228 >>> ___do_page_fault+0x7b4/0xaa0 >>> CPU: 56 PID: 4939 Comm: git Tainted: G W >>> 5.19.8-00005-gba424747260d #1 >>> NIP: c0000000000555e4 LR: c0000000000555e0 CTR: c00000000079d9d0 >>> REGS: c00000008f507370 TRAP: 0700 Tainted: G W >>> (5.19.8-00005-gba424747260d) >>> MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 28042222 XER: >>> 20040000 >>> CFAR: c000000000123780 IRQMASK: 3 >>> NIP [c0000000000555e4] ___do_page_fault+0x7b4/0xaa0 >>> LR [c0000000000555e0] ___do_page_fault+0x7b0/0xaa0 >>> Call Trace: >>> [c00000008f507610] [c0000000000555e0] ___do_page_fault+0x7b0/0xaa0 >>> (unreliable) >>> [c00000008f5076c0] [c000000000055938] do_page_fault+0x68/0x130 >>> [c00000008f5076f0] [c000000000008914] >>> data_access_common_virt+0x194/0x1f0 >>> --- interrupt: 300 at __copy_tofrom_user_base+0x9c/0x5a4 >> >> ... >> >>> >>> Fix this by saving and restoring the kernel-side AMR/IAMR values when >>> switching tasks. >> >> As explained above, KUAP access should be locked at that time, so saving >> and restoring it should not have any effect. If it does, it means >> something goes wrong somewhere else. >> >>> >>> Fixes: 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU") >>> Signed-off-by: Samuel Holland <sam...@sholland.org> >>> --- >>> I have no idea if this is the right change to make, and it could be >>> optimized, but my system has been stable with this patch for 5 days now. >>> >>> Without the patch, I hit the bug every few minutes when my load average >>> is <1, and I hit it immediately if I try to do a parallel kernel build. >> >> Great, then can you make a try with CONFIG_PPC_KUAP_DEBUG ? > > Yes, I will try this out in the next few days. > Thanks Christophe