On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> > address to be patched is temporarily mapped as writeable. Currently, a
> > per-cpu vmalloc patch area is used for this purpose. While the patch
> > area is per-cpu, the temporary page mapping is inserted into the kernel
> > page tables for the duration of patching. The mapping is exposed to CPUs
> > other than the patching CPU - this is undesirable from a hardening
> > perspective. Use a temporary mm instead which keeps the mapping local to
> > the CPU doing the patching.
> >
> > Use the `poking_init` init hook to prepare a temporary mm and patching
> > address. Initialize the temporary mm by copying the init mm. Choose a
> > randomized patching address inside the temporary mm userspace address
> > space. The patching address is randomized between PAGE_SIZE and
> > DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
> > the Book3s64 Hash MMU operates - by default the space above
> > DEFAULT_MAP_WINDOW is not available. For now, the patching address for
> > all platforms/MMUs is randomized inside this range. The number of
> > possible random addresses is dependent on PAGE_SIZE and limited by
> > DEFAULT_MAP_WINDOW.
> >
> > Bits of entropy with 64K page size on BOOK3S_64:
> >
> > 	bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> >
> > 	PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> > 	bits of entropy = log2(128TB / 64K)
> > 	bits of entropy = 31
> >
> > Randomization occurs only once during initialization at boot.
> >
> > Introduce two new functions, map_patch() and unmap_patch(), to
> > respectively create and remove the temporary mapping with write
> > permissions at patching_addr.
> > The Hash MMU on Book3s64 requires mapping
> > the page for patching with PAGE_SHARED since the kernel cannot access
> > userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
> >
> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
> > for the patching_addr when using the Hash MMU on Book3s64 to avoid
> > taking an SLB and Hash fault during patching.
>
> What prevents the SLBE or HPTE from being removed before the last
> access?
This code runs with local IRQs disabled - we also don't access anything
else in userspace so I'm not sure what else could cause the entries to
be removed TBH.

> >
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >  {
> > -	struct vm_struct *area;
> > +	int err;
> >
> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > -	if (!area) {
> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > -			cpu);
> > -		return -1;
> > -	}
> > -	this_cpu_write(text_poke_area, area);
> > +	if (radix_enabled())
> > +		return 0;
> >
> > -	return 0;
> > -}
> > +	err = slb_allocate_user(patching_mm, patching_addr);
> > +	if (err)
> > +		pr_warn("map patch: failed to allocate slb entry\n");
> >
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > -	free_vm_area(this_cpu_read(text_poke_area));
> > -	return 0;
> > +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> > +			   HPTE_USE_KERNEL_KEY);
> > +	if (err)
> > +		pr_warn("map patch: failed to insert hashed page\n");
> > +
> > +	/* See comment in switch_slb() in mm/book3s64/slb.c */
> > +	isync();
>
> I'm not sure if this is enough. Could we context switch here? You've
> got the PTL so no with a normal kernel, but maybe yes with an RT kernel.
> How about taking a machine check that clears the SLB? Could the HPTE
> get removed by something else here?

All of this happens after a local_irq_save() which should at least
prevent context switches IIUC. I am not sure what else could cause the
HPTE to get removed here.

> You want to prevent faults because you might be patching a fault
> handler?

In a more general sense: I don't think we want to take page faults every
time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page
fault handler codepath also checks `current->mm` in some places which
won't match the temporary mm. Also `current->mm` can be NULL which
caused problems in my earlier revisions of this series.

> Thanks,
> Nick