Re: [PATCH 1/8] mm: use vm_unmapped_area() on parisc architecture
On 01/08/2013 08:28 PM, Michel Lespinasse wrote:
> Update the parisc arch_get_unmapped_area function to make use of
> vm_unmapped_area() instead of implementing a brute force search.
>
> Signed-off-by: Michel Lespinasse

Acked-by: Rik van Riel
Re: [PATCH 2/8] mm: use vm_unmapped_area() on alpha architecture
On 01/08/2013 08:28 PM, Michel Lespinasse wrote:
> Update the alpha arch_get_unmapped_area function to make use of
> vm_unmapped_area() instead of implementing a brute force search.
>
> Signed-off-by: Michel Lespinasse

Acked-by: Rik van Riel
Re: [PATCH 3/8] mm: use vm_unmapped_area() on frv architecture
On 01/08/2013 08:28 PM, Michel Lespinasse wrote:
> Update the frv arch_get_unmapped_area function to make use of
> vm_unmapped_area() instead of implementing a brute force search.
>
> Signed-off-by: Michel Lespinasse

Acked-by: Rik van Riel
Re: [PATCH 4/8] mm: use vm_unmapped_area() on ia64 architecture
On 01/08/2013 08:28 PM, Michel Lespinasse wrote:
> Update the ia64 arch_get_unmapped_area function to make use of
> vm_unmapped_area() instead of implementing a brute force search.
>
> Signed-off-by: Michel Lespinasse

Acked-by: Rik van Riel
Re: [PATCH 5/8] mm: use vm_unmapped_area() in hugetlbfs on ia64 architecture
On 01/08/2013 08:28 PM, Michel Lespinasse wrote:
> Update the ia64 hugetlb_get_unmapped_area function to make use of
> vm_unmapped_area() instead of implementing a brute force search.
>
> Signed-off-by: Michel Lespinasse

Acked-by: Rik van Riel
Re: [PATCH 6/8] mm: remove free_area_cache use in powerpc architecture
On 01/08/2013 08:28 PM, Michel Lespinasse wrote:
> As all other architectures have been converted to use vm_unmapped_area(),
> we are about to retire the free_area_cache. This change simply removes
> the use of that cache in slice_get_unmapped_area(), which will most
> certainly have a performance cost. Next one will convert that function
> to use the vm_unmapped_area() infrastructure and regain the performance.
>
> Signed-off-by: Michel Lespinasse

Acked-by: Rik van Riel
Re: [PATCH 7/8] mm: use vm_unmapped_area() on powerpc architecture
On 01/08/2013 08:28 PM, Michel Lespinasse wrote:
> Update the powerpc slice_get_unmapped_area function to make use of
> vm_unmapped_area() instead of implementing a brute force search.
>
> Signed-off-by: Michel Lespinasse

Acked-by: Rik van Riel
Re: [PATCH 8/8] mm: remove free_area_cache
On 01/08/2013 08:28 PM, Michel Lespinasse wrote:
> Since all architectures have been converted to use vm_unmapped_area(),
> there is no remaining use for the free_area_cache.
>
> Signed-off-by: Michel Lespinasse

Yay

Acked-by: Rik van Riel
Re: [PATCH 7/8] mm: use vm_unmapped_area() on powerpc architecture
On 01/09/2013 06:23 AM, Michel Lespinasse wrote:
> On Wed, Jan 09, 2013 at 02:32:56PM +1100, Benjamin Herrenschmidt wrote:
>> Ok. I think at least you can move that construct:
>>
>> +		if (addr < SLICE_LOW_TOP) {
>> +			slice = GET_LOW_SLICE_INDEX(addr);
>> +			addr = (slice + 1) << SLICE_LOW_SHIFT;
>> +			if (!(available.low_slices & (1u << slice)))
>> +				continue;
>> +		} else {
>> +			slice = GET_HIGH_SLICE_INDEX(addr);
>> +			addr = (slice + 1) << SLICE_HIGH_SHIFT;
>> +			if (!(available.high_slices & (1u << slice)))
>> +				continue;
>> +		}
>>
>> Into some kind of helper. It will probably compile to the same thing
>> but at least it's more readable and it will avoid a fuckup in the
>> future if somebody changes the algorithm and forgets to update one of
>> the copies :-)
>
> All right, does the following look more palatable then ?
> (didn't re-test it, though)

Looks equivalent. I have also not tested :)

> Signed-off-by: Michel Lespinasse

Acked-by: Rik van Riel
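For illustration, a sketch of the kind of helper Ben is asking for. The slice
macros and the available.low_slices/high_slices bitmaps come from the quoted
hunk; the helper's name and exact signature are invented here and need not
match what was eventually merged:

	/*
	 * Report whether the slice containing @addr is available, and hand
	 * back the address of the next slice boundary so the caller can
	 * skip ahead.
	 */
	static bool slice_available(unsigned long addr, struct slice_mask available,
				    unsigned long *next_boundary)
	{
		unsigned long slice;

		if (addr < SLICE_LOW_TOP) {
			slice = GET_LOW_SLICE_INDEX(addr);
			*next_boundary = (slice + 1) << SLICE_LOW_SHIFT;
			return available.low_slices & (1u << slice);
		}

		slice = GET_HIGH_SLICE_INDEX(addr);
		*next_boundary = (slice + 1) << SLICE_HIGH_SHIFT;
		return available.high_slices & (1u << slice);
	}

With something along those lines, both copies of the duplicated construct in
the search loop collapse to a single
"if (!slice_available(addr, available, &addr)) continue;".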
Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
On 12/03/2013 10:13 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V"
>>
>> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE. On
>> archs like ppc64 that don't use _PAGE_PROTNONE and also have a separate
>> page table outside linux pagetable, we just need to make sure that when
>> calling change_prot_numa we flush the hardware page table entry so that
>> next page access result in a numa fault.
>
> That patch doesn't look right...

At first glance, indeed...

> You are essentially making change_prot_numa() do whatever it does (which
> I don't completely understand) *for all architectures* now, whether they
> have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that
> behaviour on powerpc book3s64, you change everybody.

However, it appears that since the code was #ifdefed like that, the called
code was made generic enough, that change_prot_numa should actually work
for everything.

In other words:

Reviewed-by: Rik van Riel
Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
On 12/05/2013 12:20 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-12-05 at 10:48 +0530, Aneesh Kumar K.V wrote:
>> Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ?
>> We call change_prot_numa from task_numa_work and queue_pages_range().
>> The later may be an issue. So doing the below will help ?
>>
>> -#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
>> +#ifdef CONFIG_NUMA_BALANCING
>
> I will defer to Mel and Rik (should we also CC Andrea ?)

It looks like manual numa binding can also use lazy page migration,
but I am not sure if that can happen without CONFIG_NUMA_BALANCING,
or if mbind always uses MPOL_MF_STRICT...
Re: [PATCH -V3] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
On 12/05/2013 01:38 PM, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE. On
> archs like ppc64 that don't use _PAGE_PROTNONE and also have a separate
> page table outside linux pagetable, we just need to make sure that when
> calling change_prot_numa we flush the hardware page table entry so that
> next page access result in a numa fault.
>
> We still need to make sure we use the numa faulting logic only when
> CONFIG_NUMA_BALANCING is set. This implies the migrate-on-fault (Lazy
> migration) via mbind will only work if CONFIG_NUMA_BALANCING is set.
>
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Rik van Riel
Re: [PATCH 2/3] mm: dirty accountable change only apply to non prot numa case
On 02/11/2014 05:34 AM, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> So move it within the if loop
>
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Rik van Riel

-- 
All rights reversed
Re: [PATCH 3/3] mm: Use ptep/pmdp_set_numa for updating _PAGE_NUMA bit
On 02/11/2014 05:34 AM, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions. ppc64
> also doesn't implement flush_tlb_range. ppc64 require the tlb flushing
> to be batched within ptl locks. The reason to do that is to ensure that
> the hash page table is in sync with linux page table. We track the hpte
> index in linux pte and if we clear them without flushing hash and drop
> the ptl lock, we can have another cpu update the pte and can end up
> with double hash. We also want to keep set_pte_at simpler by not
> requiring them to do hash flush for performance reason. Hence cannot
> use them while updating _PAGE_NUMA bit. Add new functions for marking
> pte/pmd numa
>
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Rik van Riel

-- 
All rights reversed
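As a rough sketch of the shape such a helper takes on ppc64 (illustrative
only; it builds on the pte_update() set-flag argument added in patch 1/3,
and the real patch may differ in detail):

	static inline void ptep_set_numa(struct mm_struct *mm, unsigned long addr,
					 pte_t *ptep)
	{
		VM_BUG_ON(!(pte_val(*ptep) & _PAGE_PRESENT));

		/*
		 * Clear _PAGE_PRESENT and set _PAGE_NUMA in one pte_update()
		 * call, without the hash flush that set_pte_at() would have
		 * to do; the flush stays batched under the page table lock.
		 */
		pte_update(mm, addr, ptep, _PAGE_PRESENT, _PAGE_NUMA, 0);
	}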
Re: [PATCH 1/3] powerpc: mm: Add new set flag argument to pte/pmd update function
On 02/11/2014 05:34 AM, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> We will use this later to set the _PAGE_NUMA bit.
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: Rik van Riel

-- 
All rights reversed
Re: [PATCH] powerpc/mm: Make sure a local_irq_disable prevent a parallel THP split
On 03/15/2014 06:47 AM, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> We have generic code like the one in get_futex_key that assume that
> a local_irq_disable prevents a parallel THP split. Support that by
> adding a dummy smp call function after setting _PAGE_SPLITTING. Code
> paths like get_user_pages_fast still need to check for _PAGE_SPLITTING
> after disabling IRQ which indicate that a parallel THP splitting is
> ongoing. Now if they don't find _PAGE_SPLITTING set, then we can be
> sure that parallel split will now block in pmdp_splitting flush
> until we enables IRQ
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: Rik van Riel

-- 
All rights reversed
Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier
On Fri, 2018-07-20 at 10:30 +0200, Peter Zijlstra wrote:
> On Thu, Jul 19, 2018 at 10:04:09AM -0700, Andy Lutomirski wrote:
>> I added some more arch maintainers. The idea here is that, on x86 at
>> least, task->active_mm and all its refcounting is pure overhead. When
>> a process exits, __mmput() gets called, but the core kernel has a
>> longstanding "optimization" in which other tasks (kernel threads and
>> idle tasks) may have ->active_mm pointing at this mm. This is nasty,
>> complicated, and hurts performance on large systems, since it requires
>> extra atomic operations whenever a CPU switches between real users
>> threads and idle/kernel threads.
>>
>> It's also almost completely worthless on x86 at least, since __mmput()
>> frees pagetables, and that operation *already* forces a remote TLB
>> flush, so we might as well zap all the active_mm references at the
>> same time.
>
> So I disagree that active_mm is complicated (the code is less than
> ideal but that is actually fixable). And aside from the process exit
> case, it does avoid CR3 writes when switching between user and kernel
> threads (which can be far more often than exit if you have longer
> running tasks).
>
> Now agreed, recent x86 work has made that less important.
>
> And I of course also agree that not doing those refcount atomics is
> better.

It might be cleaner to keep the ->active_mm pointer in place for now
(at least in the first patch), even on architectures where we end up
dropping the refcounting. That way the code is more similar everywhere,
and we just get rid of the expensive instructions.

Let me try coding this up...

-- 
All Rights Reversed.
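For context, a simplified sketch of where those refcount atomics live in the
context switch path (paraphrased from the scheduler code of that era, with
details omitted; this is not the patch being discussed):

	if (!next->mm) {
		/*
		 * Kernel threads and the idle task have no mm of their own:
		 * borrow the previous task's mm as ->active_mm and pin it
		 * with an atomic reference -- the expensive instruction.
		 */
		next->active_mm = prev->active_mm;
		mmgrab(prev->active_mm);
		enter_lazy_tlb(prev->active_mm, next);
	} else {
		/* A real user mm: switch to it (a CR3 write on x86). */
		switch_mm_irqs_off(prev->active_mm, next->mm, next);
	}

	if (!prev->mm) {
		/* The borrowed reference gets mmdrop()ed after the switch. */
		rq->prev_mm = prev->active_mm;
		prev->active_mm = NULL;
	}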
Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting
On Thu, 2020-12-03 at 12:31 +, Matthew Wilcox wrote:
> And this just makes me think RCU freeing of mm_struct. I'm sure it's
> more complicated than that (then, or now), but if an anonymous process
> is borrowing a freed mm, and the mm is freed by RCU then it will not go
> away until the task context switches. When we context switch back to
> the anon task, it'll borrow some other task's MM and won't even notice
> that the MM it was using has gone away.

One major complication here is that most of the active_mm borrowing is
done by the idle task, but RCU does not wait for idle tasks to context
switch. That means RCU, as it is today, is not a mechanism that
mm_struct freeing could just piggyback off.

-- 
All Rights Reversed.
Re: [PATCH 1/9] mm: Hardened usercopy
On Wed, 2016-07-06 at 15:25 -0700, Kees Cook wrote: > This is the start of porting PAX_USERCOPY into the mainline kernel. > This > is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. > The > work is based on code by PaX Team and Brad Spengler, and an earlier > port > from Casey Schaufler. Additional non-slab page tests are from Rik van > Riel. Feel free to add my S-O-B for the code I wrote. The rest looks good, too. There may be some room for optimization later on, by putting the most likely branches first, annotating with likely/unlikely, etc, but I suspect the less likely checks are already towards the ends of the functions. Signed-off-by: Rik van Riel > This patch contains the logic for validating several conditions when > performing copy_to_user() and copy_from_user() on the kernel object > being copied to/from: > - address range doesn't wrap around > - address range isn't NULL or zero-allocated (with a non-zero copy > size) > - if on the slab allocator: > - object size must be less than or equal to copy size (when check > is > implemented in the allocator, which appear in subsequent patches) > - otherwise, object must not span page allocations > - if on the stack > - object must not extend before/after the current process task > - object must be contained by the current stack frame (when there > is > arch/build support for identifying stack frames) > - object must not overlap with kernel text > > Signed-off-by: Kees Cook > --- > arch/Kconfig| 7 ++ > include/linux/slab.h| 12 +++ > include/linux/thread_info.h | 15 +++ > mm/Makefile | 4 + > mm/usercopy.c | 239 > > security/Kconfig| 27 + > 6 files changed, 304 insertions(+) > create mode 100644 mm/usercopy.c > > diff --git a/arch/Kconfig b/arch/Kconfig > index d794384a0404..3ea04d8dcf62 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -424,6 +424,13 @@ config CC_STACKPROTECTOR_STRONG > > endchoice > > +config HAVE_ARCH_LINEAR_KERNEL_MAPPING > + bool > + help > + An architecture should select this if it has a secondary > linear > + mapping of the kernel text. This is used to verify that > kernel > + text exposures are not visible under > CONFIG_HARDENED_USERCOPY. > + > config HAVE_CONTEXT_TRACKING > bool > help > diff --git a/include/linux/slab.h b/include/linux/slab.h > index aeb3e6d00a66..96a16a3fb7cb 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -155,6 +155,18 @@ void kfree(const void *); > void kzfree(const void *); > size_t ksize(const void *); > > +#ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR > +const char *__check_heap_object(const void *ptr, unsigned long n, > + struct page *page); > +#else > +static inline const char *__check_heap_object(const void *ptr, > + unsigned long n, > + struct page *page) > +{ > + return NULL; > +} > +#endif > + > /* > * Some archs want to perform DMA into kmalloc caches and need a > guaranteed > * alignment larger than the alignment of a 64-bit integer. 
> diff --git a/include/linux/thread_info.h > b/include/linux/thread_info.h > index b4c2a485b28a..a02200db9c33 100644 > --- a/include/linux/thread_info.h > +++ b/include/linux/thread_info.h > @@ -146,6 +146,21 @@ static inline bool > test_and_clear_restore_sigmask(void) > #error "no set_restore_sigmask() provided and default one won't > work" > #endif > > +#ifdef CONFIG_HARDENED_USERCOPY > +extern void __check_object_size(const void *ptr, unsigned long n, > + bool to_user); > + > +static inline void check_object_size(const void *ptr, unsigned long > n, > + bool to_user) > +{ > + __check_object_size(ptr, n, to_user); > +} > +#else > +static inline void check_object_size(const void *ptr, unsigned long > n, > + bool to_user) > +{ } > +#endif /* CONFIG_HARDENED_USERCOPY */ > + > #endif /* __KERNEL__ */ > > #endif /* _LINUX_THREAD_INFO_H */ > diff --git a/mm/Makefile b/mm/Makefile > index 78c6f7dedb83..32d37247c7e5 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -21,6 +21,9 @@ KCOV_INSTRUMENT_memcontrol.o := n > KCOV_INSTRUMENT_mmzone.o := n > KCOV_INSTRUMENT_vmstat.o := n > > +# Since __builtin_frame_address does work as used, disable the > warning. > +CFLAGS_usercopy.o += $(c
Re: [PATCH 1/9] mm: Hardened usercopy
On Wed, 2016-07-06 at 15:25 -0700, Kees Cook wrote:

> +	/* Allow kernel rodata region (if not marked as Reserved). */
> +	if (ptr >= (const void *)__start_rodata &&
> +	    end <= (const void *)__end_rodata)
> +		return NULL;

One comment here. __check_object_size gets "to_user" as an argument.
It may make sense to pass that to check_heap_object, and only allow
copy_to_user from rodata, never copy_from_user, since that section
should be read only.

> +void __check_object_size(const void *ptr, unsigned long n, bool to_user)
> +{

-- 
All Rights Reversed.
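A minimal sketch of that suggestion (illustrative only; it assumes the
to_user flag gets threaded down into this check, which the quoted patch
does not do):

	/*
	 * rodata is a legitimate *source* for copy_to_user(), but writing
	 * into rodata from userspace should never be legitimate, so only
	 * allow it when copying toward userspace.
	 */
	if (to_user &&
	    ptr >= (const void *)__start_rodata &&
	    end <= (const void *)__end_rodata)
		return NULL;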
Re: [PATCH 0/9] mm: Hardened usercopy
On Fri, 2016-07-08 at 19:22 -0700, Laura Abbott wrote:
>
> Even with the SLUB fixup I'm still seeing this blow up on my arm64
> system. This is a Fedora rawhide kernel + the patches
>
> [0.666700] usercopy: kernel memory exposure attempt detected from fc0008b4dd58 () (8 bytes)
> [0.666720] CPU: 2 PID: 79 Comm: modprobe Tainted: G W 4.7.0-0.rc6.git1.1.hardenedusercopy.fc25.aarch64 #1
> [0.666733] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Nov 24 2015
> [0.666744] Call trace:
> [0.666756] [] dump_backtrace+0x0/0x1e8
> [0.666765] [] show_stack+0x24/0x30
> [0.666775] [] dump_stack+0xa4/0xe0
> [0.666785] [] __check_object_size+0x6c/0x230
> [0.666795] [] create_elf_tables+0x74/0x420
> [0.666805] [] load_elf_binary+0x828/0xb70
> [0.666814] [] search_binary_handler+0xb4/0x240
> [0.666823] [] do_execveat_common+0x63c/0x950
> [0.666832] [] do_execve+0x3c/0x50
> [0.666841] [] call_usermodehelper_exec_async+0xe8/0x148
> [0.666850] [] ret_from_fork+0x10/0x50
>
> This happens on every call to execve. This seems to be the first
> copy_to_user in create_elf_tables. I didn't get a chance to debug and
> I'm going out of town all of next week so all I have is the report
> unfortunately. config attached.

That's odd, this should be copying a piece of kernel data (not text)
to userspace.

from fs/binfmt_elf.c

	const char *k_platform = ELF_PLATFORM;
	...
		size_t len = strlen(k_platform) + 1;

		u_platform = (elf_addr_t __user *)STACK_ALLOC(p, len);
		if (__copy_to_user(u_platform, k_platform, len))
			return -EFAULT;

from arch/arm/include/asm/elf.h:

	#define ELF_PLATFORM_SIZE 8
	#define ELF_PLATFORM	(elf_platform)

	extern char elf_platform[];

from arch/arm/kernel/setup.c:

	char elf_platform[ELF_PLATFORM_SIZE];
	EXPORT_SYMBOL(elf_platform);
	...
	snprintf(elf_platform, ELF_PLATFORM_SIZE, "%s%c",
		 list->elf_name, ENDIANNESS);

How does that end up in the .text section of the image, instead of in
one of the various data sections?

What kind of linker oddity is going on with ARM?

-- 
All Rights Reversed.
Re: [PATCH v2 02/11] mm: Hardened usercopy
On Fri, 2016-07-15 at 09:20 +1000, Balbir Singh wrote:

>> ==
>> +		   ((unsigned long)end & (unsigned long)PAGE_MASK)))
>> +		return NULL;
>> +
>> +	/* Allow if start and end are inside the same compound page. */
>> +	endpage = virt_to_head_page(end);
>> +	if (likely(endpage == page))
>> +		return NULL;
>> +
>> +	/* Allow special areas, device memory, and sometimes kernel data. */
>> +	if (PageReserved(page) && PageReserved(endpage))
>> +		return NULL;
>
> If we came here, it's likely that endpage > page, do we need to check
> that only the first and last pages are reserved? What about the ones
> in the middle?

I think this will be so rare, we can get away with just checking the
beginning and the end.

-- 
All Rights Reversed.
Re: [PATCH v3 00/11] mm: Hardened usercopy
On Wed, 2016-07-20 at 16:02 +, David Laight wrote:
> From: Kees Cook
>> Sent: 20 July 2016 16:32
> ...
>> Yup: that's exactly what it's doing: walking up the stack. :)
>
> Remind me to make sure all our customers run kernels with it disabled.

You want a single copy_from_user to write to data in multiple stack
frames?

-- 
All Rights Reversed.
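For what it's worth, the kind of bug the stack-frame walk is meant to catch
looks roughly like this (function name and sizes are invented for the
example):

	static long read_config(const char __user *ubuf, size_t len)
	{
		char buf[16];

		/*
		 * Missing "len > sizeof(buf)" check: a large len lets
		 * copy_from_user() spill past buf into the caller's stack
		 * frame.  The frame walk rejects the copy before that
		 * overwrite can happen.
		 */
		if (copy_from_user(buf, ubuf, len))
			return -EFAULT;

		return 0;
	}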
Re: [PATCH v4 12/12] mm: SLUB hardened usercopy support
On Mon, 2016-07-25 at 12:16 -0700, Laura Abbott wrote: > On 07/20/2016 01:27 PM, Kees Cook wrote: > > Under CONFIG_HARDENED_USERCOPY, this adds object size checking to > > the > > SLUB allocator to catch any copies that may span objects. Includes > > a > > redzone handling fix discovered by Michael Ellerman. > > > > Based on code from PaX and grsecurity. > > > > Signed-off-by: Kees Cook > > Tested-by: Michael Ellerman > > --- > > init/Kconfig | 1 + > > mm/slub.c| 36 > > 2 files changed, 37 insertions(+) > > > > diff --git a/init/Kconfig b/init/Kconfig > > index 798c2020ee7c..1c4711819dfd 100644 > > --- a/init/Kconfig > > +++ b/init/Kconfig > > @@ -1765,6 +1765,7 @@ config SLAB > > > > config SLUB > > bool "SLUB (Unqueued Allocator)" > > + select HAVE_HARDENED_USERCOPY_ALLOCATOR > > help > > SLUB is a slab allocator that minimizes cache line > > usage > > instead of managing queues of cached objects (SLAB > > approach). > > diff --git a/mm/slub.c b/mm/slub.c > > index 825ff4505336..7dee3d9a5843 100644 > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -3614,6 +3614,42 @@ void *__kmalloc_node(size_t size, gfp_t > > flags, int node) > > EXPORT_SYMBOL(__kmalloc_node); > > #endif > > > > +#ifdef CONFIG_HARDENED_USERCOPY > > +/* > > + * Rejects objects that are incorrectly sized. > > + * > > + * Returns NULL if check passes, otherwise const char * to name of > > cache > > + * to indicate an error. > > + */ > > +const char *__check_heap_object(const void *ptr, unsigned long n, > > + struct page *page) > > +{ > > + struct kmem_cache *s; > > + unsigned long offset; > > + size_t object_size; > > + > > + /* Find object and usable object size. */ > > + s = page->slab_cache; > > + object_size = slab_ksize(s); > > + > > + /* Find offset within object. */ > > + offset = (ptr - page_address(page)) % s->size; > > + > > + /* Adjust for redzone and reject if within the redzone. */ > > + if (kmem_cache_debug(s) && s->flags & SLAB_RED_ZONE) { > > + if (offset < s->red_left_pad) > > + return s->name; > > + offset -= s->red_left_pad; > > + } > > + > > + /* Allow address range falling entirely within object > > size. */ > > + if (offset <= object_size && n <= object_size - offset) > > + return NULL; > > + > > + return s->name; > > +} > > +#endif /* CONFIG_HARDENED_USERCOPY */ > > + > > I compared this against what check_valid_pointer does for SLUB_DEBUG > checking. I was hoping we could utilize that function to avoid > duplication but a) __check_heap_object needs to allow accesses > anywhere > in the object, not just the beginning b) accessing page->objects > is racy without the addition of locking in SLUB_DEBUG. > > Still, the ptr < page_address(page) check from __check_heap_object > would > be good to add to avoid generating garbage large offsets and trying > to > infer C math. > > diff --git a/mm/slub.c b/mm/slub.c > index 7dee3d9..5370e4f 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -3632,6 +3632,9 @@ const char *__check_heap_object(const void > *ptr, unsigned long n, > s = page->slab_cache; > object_size = slab_ksize(s); > > + if (ptr < page_address(page)) > + return s->name; > + > /* Find offset within object. */ > offset = (ptr - page_address(page)) % s->size; > I don't get it, isn't that already guaranteed because we look for the page that ptr is in, before __check_heap_object is called? Specifically, in patch 3/12: + page = virt_to_head_page(ptr); + + /* Check slab allocator for flags and size. 
*/
+	if (PageSlab(page))
+		return __check_heap_object(ptr, n, page);

How can that generate a ptr that is not inside the page?
What am I overlooking?

And, should it be in the changelog or a comment? :)

-- 
All Rights Reversed.
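As a worked example of the offset math in the quoted __check_heap_object()
(the numbers are invented, not taken from any real cache):

	/*
	 * Suppose s->size = 256 (per-object stride including metadata),
	 * object_size = slab_ksize(s) = 200, and, with SLAB_RED_ZONE
	 * debugging on, red_left_pad = 16.  For a ptr 528 bytes into the
	 * slab page:
	 *
	 *   offset = 528 % 256 = 16   (16 bytes into this object's stride)
	 *   offset -= red_left_pad    (offset = 0, start of the real object)
	 *
	 * A copy with n <= 200 passes "offset <= object_size &&
	 * n <= object_size - offset" and is allowed; n = 201, or a ptr that
	 * lands inside the left redzone, makes the function return s->name
	 * and triggers the usercopy report.
	 */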
Re: [PATCH v4 12/12] mm: SLUB hardened usercopy support
On Mon, 2016-07-25 at 16:29 -0700, Laura Abbott wrote: > On 07/25/2016 02:42 PM, Rik van Riel wrote: > > On Mon, 2016-07-25 at 12:16 -0700, Laura Abbott wrote: > > > On 07/20/2016 01:27 PM, Kees Cook wrote: > > > > Under CONFIG_HARDENED_USERCOPY, this adds object size checking > > > > to > > > > the > > > > SLUB allocator to catch any copies that may span objects. > > > > Includes > > > > a > > > > redzone handling fix discovered by Michael Ellerman. > > > > > > > > Based on code from PaX and grsecurity. > > > > > > > > Signed-off-by: Kees Cook > > > > Tested-by: Michael Ellerman > > > > --- > > > > init/Kconfig | 1 + > > > > mm/slub.c| 36 > > > > 2 files changed, 37 insertions(+) > > > > > > > > diff --git a/init/Kconfig b/init/Kconfig > > > > index 798c2020ee7c..1c4711819dfd 100644 > > > > --- a/init/Kconfig > > > > +++ b/init/Kconfig > > > > @@ -1765,6 +1765,7 @@ config SLAB > > > > > > > > config SLUB > > > > bool "SLUB (Unqueued Allocator)" > > > > + select HAVE_HARDENED_USERCOPY_ALLOCATOR > > > > help > > > > SLUB is a slab allocator that minimizes cache line > > > > usage > > > > instead of managing queues of cached objects (SLAB > > > > approach). > > > > diff --git a/mm/slub.c b/mm/slub.c > > > > index 825ff4505336..7dee3d9a5843 100644 > > > > --- a/mm/slub.c > > > > +++ b/mm/slub.c > > > > @@ -3614,6 +3614,42 @@ void *__kmalloc_node(size_t size, gfp_t > > > > flags, int node) > > > > EXPORT_SYMBOL(__kmalloc_node); > > > > #endif > > > > > > > > +#ifdef CONFIG_HARDENED_USERCOPY > > > > +/* > > > > + * Rejects objects that are incorrectly sized. > > > > + * > > > > + * Returns NULL if check passes, otherwise const char * to > > > > name of > > > > cache > > > > + * to indicate an error. > > > > + */ > > > > +const char *__check_heap_object(const void *ptr, unsigned long > > > > n, > > > > + struct page *page) > > > > +{ > > > > + struct kmem_cache *s; > > > > + unsigned long offset; > > > > + size_t object_size; > > > > + > > > > + /* Find object and usable object size. */ > > > > + s = page->slab_cache; > > > > + object_size = slab_ksize(s); > > > > + > > > > + /* Find offset within object. */ > > > > + offset = (ptr - page_address(page)) % s->size; > > > > + > > > > + /* Adjust for redzone and reject if within the > > > > redzone. */ > > > > + if (kmem_cache_debug(s) && s->flags & SLAB_RED_ZONE) { > > > > + if (offset < s->red_left_pad) > > > > + return s->name; > > > > + offset -= s->red_left_pad; > > > > + } > > > > + > > > > + /* Allow address range falling entirely within object > > > > size. */ > > > > + if (offset <= object_size && n <= object_size - > > > > offset) > > > > + return NULL; > > > > + > > > > + return s->name; > > > > +} > > > > +#endif /* CONFIG_HARDENED_USERCOPY */ > > > > + > > > > > > I compared this against what check_valid_pointer does for > > > SLUB_DEBUG > > > checking. I was hoping we could utilize that function to avoid > > > duplication but a) __check_heap_object needs to allow accesses > > > anywhere > > > in the object, not just the beginning b) accessing page->objects > > > is racy without the addition of locking in SLUB_DEBUG. > > > > > > Still, the ptr < page_address(page) check from > > > __check_heap_object > > > would > > > be good to add to avoid generating garbage large offsets and > > > trying > > > to > > > infer C math. > > > > > > diff --git a/mm/slub.c b/mm/slub.c > > > index 7dee3d9..5370e4f 100644 > > > --- a/mm/slub.c > > > +++ b/mm/slu