On Wed, Mar 31, 2021 at 10:16 PM Michael Ellerman <m...@ellerman.id.au> wrote:
>
> Hi Jordan,
>
> A few nits below ...
>
> Jordan Niethe <jniet...@gmail.com> writes:
> > From: Russell Currey <rus...@russell.cc>
> >
> > The set_memory_{ro/rw/nx/x}() functions are required for STRICT_MODULE_RWX,
> > and are generally useful primitives to have. This implementation is
> > designed to be completely generic across powerpc's many MMUs.
> >
> > It's possible that this could be optimised to be faster for specific
> > MMUs, but the focus is on having a generic and safe implementation for
> > now.
> >
> > This implementation does not handle cases where the caller is attempting
> > to change the mapping of the page it is executing from, or if another
> > CPU is concurrently using the page being altered. These cases likely
> > shouldn't happen, but a more complex implementation with MMU-specific code
> > could safely handle them, so that is left as a TODO for now.
> >
> > On hash the linear mapping is not kept in the linux pagetable, so this
> > will not change the protection if used on that range. Currently these
> > functions are not used on the linear map so just WARN for now.
> >
> > These functions do nothing if STRICT_KERNEL_RWX is not enabled.
> >
> > Reviewed-by: Daniel Axtens <d...@axtens.net>
> > Signed-off-by: Russell Currey <rus...@russell.cc>
> > Signed-off-by: Christophe Leroy <christophe.le...@csgroup.eu>
> > [jpn: - rebase on next plus "powerpc/mm/64s: Allow STRICT_KERNEL_RWX again"
> >       - WARN on hash linear map]
> > Signed-off-by: Jordan Niethe <jniet...@gmail.com>
> > ---
> > v10: WARN if trying to change the hash linear map
> > ---
> >  arch/powerpc/Kconfig                  |  1 +
> >  arch/powerpc/include/asm/set_memory.h | 32 ++++++++++
> >  arch/powerpc/mm/Makefile              |  2 +-
> >  arch/powerpc/mm/pageattr.c            | 88 +++++++++++++++++++++++++++
> >  4 files changed, 122 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/powerpc/include/asm/set_memory.h
> >  create mode 100644 arch/powerpc/mm/pageattr.c
> >
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index fc7f5c5933e6..4498a27ac9db 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -135,6 +135,7 @@ config PPC
> >      select ARCH_HAS_MEMBARRIER_CALLBACKS
> >      select ARCH_HAS_MEMBARRIER_SYNC_CORE
> >      select ARCH_HAS_SCALED_CPUTIME          if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
> > +    select ARCH_HAS_SET_MEMORY
>
> Below you do:
>
>         if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
>                 return 0;
>
> Which suggests we should instead just only select ARCH_HAS_SET_MEMORY if
> STRICT_KERNEL_RWX ?

Yeah, I'm just going to do that.

> > diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> > index 3b4e9e4e25ea..d8a08abde1ae 100644
> > --- a/arch/powerpc/mm/Makefile
> > +++ b/arch/powerpc/mm/Makefile
> > @@ -5,7 +5,7 @@
> >
> >  ccflags-$(CONFIG_PPC64)              := $(NO_MINIMAL_TOC)
> >
> > -obj-y                                := fault.o mem.o pgtable.o mmap.o maccess.o \
> > +obj-y                                := fault.o mem.o pgtable.o mmap.o maccess.o pageattr.o \
>
> .. and then the file should only be built if ARCH_HAS_SET_MEMORY = y.
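Ok, will do. Something like this, maybe (untested sketch):

    # arch/powerpc/Kconfig, under config PPC:
    select ARCH_HAS_SET_MEMORY          if STRICT_KERNEL_RWX

    # arch/powerpc/mm/Makefile, with pageattr.o dropped from obj-y:
    obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pageattr.o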
>
> >                                       init_$(BITS).o pgtable_$(BITS).o \
> >                                       pgtable-frag.o ioremap.o ioremap_$(BITS).o \
> >                                       init-common.o mmu_context.o drmem.o
> > diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
> > new file mode 100644
> > index 000000000000..9efcb01088da
> > --- /dev/null
> > +++ b/arch/powerpc/mm/pageattr.c
> > @@ -0,0 +1,88 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * MMU-generic set_memory implementation for powerpc
> > + *
> > + * Copyright 2019, IBM Corporation.
>
> Should be 2019-2021.

Right.

> > + */
> > +
> > +#include <linux/mm.h>
> > +#include <linux/set_memory.h>
> > +
> > +#include <asm/mmu.h>
> > +#include <asm/page.h>
> > +#include <asm/pgtable.h>
> > +
> > +
> > +/*
> > + * Updates the attributes of a page in three steps:
> > + *
> > + * 1. invalidate the page table entry
> > + * 2. flush the TLB
> > + * 3. install the new entry with the updated attributes
> > + *
> > + * This is unsafe if the caller is attempting to change the mapping of the
> > + * page it is executing from, or if another CPU is concurrently using the
> > + * page being altered.
>
> Is the 2nd part of that statement true?
>
> Or, I guess maybe it is true depending on what "unsafe" means.
>
> AIUI it's unsafe to use this on the page you're executing from, and by
> unsafe we mean the kernel will potentially crash because it will lose
> the mapping for the currently executing text.
>
> Using this on a page that another CPU is accessing could be safe, if eg.
> the other CPU is reading from the page and we are just changing it from
> RW->RO.
>
> So I'm not sure they're the same type of "unsafe".

I think the comment was prompted by your message here:
https://lore.kernel.org/linuxppc-dev/87pnio5fva....@mpe.ellerman.id.au/
So I'll rewrite the comment to separate the two cases and indicate the
2nd case only might be an issue.
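Maybe something along these lines (just draft wording for now):

  * This is unsafe if the caller is attempting to change the mapping of
  * the page it is executing from - the kernel can lose the mapping of
  * its currently executing text and crash.
  *
  * Changing a page that another CPU is concurrently using might also be
  * unsafe, depending on what the other CPU is doing with the page at
  * the time, e.g. another CPU just reading a page as it goes RW->RO
  * should be fine.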
> > + * TODO make the implementation resistant to this.
> > + *
> > + * NOTE: can be dangerous to call without STRICT_KERNEL_RWX
>
> I don't think we need that anymore?

No we don't, change_memory_attr() won't call it without STRICT_KERNEL_RWX.

> > + */
> > +static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
> > +{
> > +        long action = (long)data;
> > +        pte_t pte;
> > +
> > +        spin_lock(&init_mm.page_table_lock);
> > +
> > +        /* invalidate the PTE so it's safe to modify */
> > +        pte = ptep_get_and_clear(&init_mm, addr, ptep);
> > +        flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > +
> > +        /* modify the PTE bits as desired, then apply */
> > +        switch (action) {
> > +        case SET_MEMORY_RO:
> > +                pte = pte_wrprotect(pte);
> > +                break;
>
> So set_memory_ro() removes write, but doesn't remove execute.
>
> That doesn't match my mental model of what "set to ro" means, but I
> guess I'm wrong because the other implementations seem to do something
> similar.

Hm, looking at arm and riscv it does seem to make it just RO.

> > +        case SET_MEMORY_RW:
> > +                pte = pte_mkwrite(pte);
>
> I think we want to add pte_mkdirty() here also to avoid a fault when the
> mapping is written to, eg. pte_mkwrite(pte_mkdirty(pte));

Right.

> > +                break;
> > +        case SET_MEMORY_NX:
> > +                pte = pte_exprotect(pte);
> > +                break;
> > +        case SET_MEMORY_X:
> > +                pte = pte_mkexec(pte);
> > +                break;
> > +        default:
> > +                WARN_ON_ONCE(1);
> > +                break;
> > +        }
> > +
> > +        set_pte_at(&init_mm, addr, ptep, pte);
> > +        spin_unlock(&init_mm.page_table_lock);
> > +
> > +        return 0;
> > +}
> > +
> > +int change_memory_attr(unsigned long addr, int numpages, long action)
> > +{
> > +        unsigned long start = ALIGN_DOWN(addr, PAGE_SIZE);
> > +        unsigned long sz = numpages * PAGE_SIZE;
> > +
> > +        if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
> > +                return 0;
> > +
> > +        if (numpages <= 0)
> > +                return 0;
> > +
>
> This ↓ should have a comment explaining what it's doing:

Sure.

> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +        if (WARN_ON_ONCE(!radix_enabled() &&
> > +                         get_region_id(addr) == LINEAR_MAP_REGION_ID)) {
> > +                return -1;
> > +        }
> > +#endif
>
> Maybe:
>
>         if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
>             WARN_ON_ONCE(!radix_enabled() &&
>                          get_region_id(addr) == LINEAR_MAP_REGION_ID)) {
>                 return -1;
>         }

As Christophe says, we can't do that because those symbols aren't
defined for !CONFIG_PPC_BOOK3S_64.

> But then Aneesh pointed out that we should also block VMEMMAP_REGION_ID.
>
> It might be better to just check for the permitted regions:
>
>         if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled()) {
>                 int region = get_region_id(addr);
>
>                 if (WARN_ON_ONCE(region != VMALLOC_REGION_ID &&
>                                  region != IO_REGION_ID))
>                         return -1;
>         }

That would probably work better.

> > +
> > +        return apply_to_existing_page_range(&init_mm, start, sz,
> > +                                            change_page_attr, (void *)action);
> > +}
>
> cheers
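Thanks, I'll fix these up in the next revision. The RW case will become
something like this (untested):

        case SET_MEMORY_RW:
                pte = pte_mkwrite(pte_mkdirty(pte));
                break;

and I'll go with the permitted-regions check above, with a comment noting
that on hash the linear map is not kept in the linux pagetable, so we
can't change its protection from here.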