[PATCH] hugetlb: follow_hugetlb_page for write access
When calling get_user_pages(), a write flag is passed in by the caller to
indicate if write access is required on the faulted-in pages.  Currently,
follow_hugetlb_page() ignores this flag and always faults pages for
read-only access.  This can cause data corruption because a device driver
that calls get_user_pages() with write set will not expect COW faults to
occur on the returned pages.

This patch passes the write flag down to follow_hugetlb_page() and makes
sure hugetlb_fault() is called with the right write_access parameter.

Signed-off-by: Adam Litke <[EMAIL PROTECTED]>
---
 include/linux/hugetlb.h |    2 +-
 mm/hugetlb.c            |    5 +++--
 mm/memory.c             |    2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3a19b03..31fa0a0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -19,7 +19,7 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
 int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
-int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int);
+int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, int);
 void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long);
 void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long);
 int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index eab8c42..b645985 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -621,7 +621,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			struct page **pages, struct vm_area_struct **vmas,
-			unsigned long *position, int *length, int i)
+			unsigned long *position, int *length, int i,
+			int write)
 {
 	unsigned long pfn_offset;
 	unsigned long vaddr = *position;
@@ -643,7 +644,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			int ret;
 
 			spin_unlock(&mm->page_table_lock);
-			ret = hugetlb_fault(mm, vma, vaddr, 0);
+			ret = hugetlb_fault(mm, vma, vaddr, write);
 			spin_lock(&mm->page_table_lock);
 			if (!(ret & VM_FAULT_ERROR))
 				continue;
diff --git a/mm/memory.c b/mm/memory.c
index f82b359..1bcd444 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1039,7 +1039,7 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 
 		if (is_vm_hugetlb_page(vma)) {
 			i = follow_hugetlb_page(mm, vma, pages, vmas,
-						&start, &len, i);
+						&start, &len, i, write);
 			continue;
 		}
 
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
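To illustrate why the ignored flag matters, here is a small userspace sketch in C. It is a toy model and not kernel code: `toy_page`, `toy_mapping`, `toy_fault()`, `toy_follow_page()` and `toy_cpu_write()` are hypothetical names invented for this example. It assumes a read fault maps a shared page with copy-on-write pending, while a write fault resolves the copy immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not kernel types. */
struct toy_page {
	int id;
};

struct toy_mapping {
	struct toy_page *page;	/* NULL until faulted in */
	bool writable;		/* false while COW is still pending */
};

static struct toy_page toy_shared_page = { .id = 1 };
static struct toy_page toy_private_page = { .id = 2 };

/* A write fault resolves COW up front; a read fault maps the shared page. */
static void toy_fault(struct toy_mapping *m, int write_access)
{
	if (write_access) {
		m->page = &toy_private_page;
		m->writable = true;
	} else {
		m->page = &toy_shared_page;
		m->writable = false;
	}
}

/* Models follow_hugetlb_page() with the flag forwarded to the fault. */
static struct toy_page *toy_follow_page(struct toy_mapping *m, int write)
{
	if (!m->page)
		toy_fault(m, write);
	return m->page;
}

/* Models a later store by the process: breaks COW if it is still pending. */
static void toy_cpu_write(struct toy_mapping *m)
{
	if (!m->writable) {
		m->page = &toy_private_page;
		m->writable = true;
	}
}

/* Returns 0 after checking both behaviours. */
int toy_write_flag_demo(void)
{
	struct toy_mapping before = { 0 };
	struct toy_mapping after = { 0 };
	struct toy_page *pinned;

	/* Flag dropped (write forced to 0): the pinned page goes stale. */
	pinned = toy_follow_page(&before, 0);
	toy_cpu_write(&before);
	assert(pinned != before.page);

	/* Flag forwarded: the pinned page stays the mapped page. */
	pinned = toy_follow_page(&after, 1);
	toy_cpu_write(&after);
	assert(pinned == after.page);
	return 0;
}
```

In the first case the caller is left holding a page that the process no longer maps, which is the kind of mismatch the patch description warns about.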
[Documentation] Page Table Layout diagrams
Hello all.  In an effort to understand how the page tables are laid out
across various architectures I put together some diagrams.  I have posted
them on the linux-mm wiki: http://linux-mm.org/PageTableStructure and I
hope they will be useful to others.

Just to make sure I am not spreading misinformation, could a few of you
experts take a quick look at the three diagrams I've got finished so far
and point out any errors I have made?

Thanks.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
[RFC PATCH 0/2] Merge HUGETLB_PAGE and HUGETLBFS Kconfig options
There are currently two global Kconfig options that enable/disable the
hugetlb code: CONFIG_HUGETLB_PAGE and CONFIG_HUGETLBFS.  This may have
made sense before hugetlbfs became ubiquitous but now the pair of options
are redundant.  Merging these two options into one will simplify the code
slightly and will, more importantly, avoid confusion and questions like:
Which hugetlbfs CONFIG option should my code depend on?

CONFIG_HUGETLB_PAGE is aliased to the value of CONFIG_HUGETLBFS, so one
option can be removed without any effect.  The first patch merges the two
options into one option: CONFIG_HUGETLB.  The second patch updates the
defconfigs to set the one new option appropriately.

I have cross-compiled this on i386, x86_64, ia64, powerpc, sparc64 and sh
with the option enabled and disabled.  This is completely mechanical but,
due to the large number of files affected (especially defconfigs), could
do well with a review from several sets of eyeballs.

Thanks.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
[RFC PATCH 1/2] Merge options into CONFIG_HUGETLB
Merge CONFIG_HUGETLB_PAGE and CONFIG_HUGETLBFS into one new config option:
CONFIG_HUGETLB.  CONFIG_HUGETLB_PAGE is aliased to the value of
CONFIG_HUGETLBFS, so one option can be removed without any effect.

This change is pretty mechanical, but a little extra verification from
arch maintainers would be very helpful.  Thanks.

Signed-off-by: Adam Litke <[EMAIL PROTECTED]>
---
 Documentation/vm/hugetlbpage.txt       |    6 ++
 arch/arm/mm/consistent.c               |    2 +-
 arch/avr32/mm/dma-coherent.c           |    2 +-
 arch/ia64/Kconfig                      |    8
 arch/ia64/kernel/ivt.S                 |    6 +++---
 arch/ia64/kernel/sys_ia64.c            |    2 +-
 arch/ia64/mm/Makefile                  |    2 +-
 arch/ia64/mm/init.c                    |    2 +-
 arch/powerpc/Kconfig                   |    2 +-
 arch/powerpc/mm/Makefile               |    2 +-
 arch/powerpc/mm/hash_utils_64.c        |   10 +-
 arch/powerpc/mm/init_64.c              |    2 +-
 arch/powerpc/mm/tlb_64.c               |    2 +-
 arch/powerpc/platforms/Kconfig.cputype |    2 +-
 arch/s390/mm/Makefile                  |    2 +-
 arch/sh/mm/Kconfig                     |    2 +-
 arch/sh/mm/Makefile_32                 |    2 +-
 arch/sh/mm/Makefile_64                 |    2 +-
 arch/sparc64/Kconfig                   |    2 +-
 arch/sparc64/kernel/sun4v_tlb_miss.S   |    2 +-
 arch/sparc64/kernel/tsb.S              |    4 ++--
 arch/sparc64/mm/Makefile               |    2 +-
 arch/sparc64/mm/fault.c                |    4 ++--
 arch/sparc64/mm/init.c                 |    2 +-
 arch/sparc64/mm/tsb.c                  |   14 +++---
 arch/x86/mm/Makefile                   |    2 +-
 fs/Kconfig                             |    5 +
 fs/Makefile                            |    2 +-
 fs/hugetlbfs/Makefile                  |    2 +-
 include/asm-ia64/mmu_context.h         |    2 +-
 include/asm-ia64/page.h                |    6 +++---
 include/asm-ia64/pgtable.h             |    2 +-
 include/asm-mn10300/page.h             |    2 +-
 include/asm-parisc/page.h              |    2 +-
 include/asm-powerpc/mmu-hash64.h       |    4 ++--
 include/asm-powerpc/page_64.h          |    6 +++---
 include/asm-powerpc/pgtable-ppc64.h    |    2 +-
 include/asm-sh/page.h                  |    2 +-
 include/asm-sparc64/mmu.h              |    2 +-
 include/asm-sparc64/mmu_context.h      |    2 +-
 include/asm-sparc64/page.h             |    2 +-
 include/asm-sparc64/pgtable.h          |    2 +-
 include/asm-x86/page_32.h              |    2 +-
 include/linux/hugetlb.h                |   12 ++--
 include/linux/pageblock-flags.h        |    6 +++---
 include/linux/vmstat.h                 |    2 +-
 kernel/sysctl.c                        |    2 +-
 mm/Makefile                            |    2 +-
 mm/mempolicy.c                         |    4 ++--
 mm/vmstat.c                            |    2 +-
 50 files changed, 81 insertions(+), 86 deletions(-)

diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index 3102b81..53604e9 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -13,10 +13,8 @@ This optimization is more critical now as bigger and bigger physical memories
 Users can use the huge page support in Linux kernel by either using the mmap
 system call or standard SYSv shared memory system calls (shmget, shmat).
 
-First the Linux kernel needs to be built with the CONFIG_HUGETLBFS
-(present under "File systems") and CONFIG_HUGETLB_PAGE (selected
-automatically when CONFIG_HUGETLBFS is selected) configuration
-options.
+First the Linux kernel needs to be built with the CONFIG_HUGETLB
+(present under "File systems") configuration option.
 
 The kernel built with hugepage support should show the number of configured
 hugepages in the system by running the "cat /proc/meminfo" command.
diff --git a/arch/arm/mm/consistent.c b/arch/arm/mm/consistent.c
index 333a82a..5931192 100644
--- a/arch/arm/mm/consistent.c
+++ b/arch/arm/mm/consistent.c
@@ -140,7 +140,7 @@ static struct vm_region *vm_region_find(struct vm_region *head, unsigned long ad
 	return c;
 }
 
-#ifdef CONFIG_HUGETLB_PAGE
+#ifdef CONFIG_HUGETLB
 #error ARM Coherent DMA allocator does not (yet) support huge TLB
 #endif
 
diff --git a/arch/avr32/mm/dma-coherent.c b/arch/avr32/mm/dma-coherent.c
index 6d8c794..0cf57c6 100644
--- a/arch/avr32/mm/dma-coherent.c
+++ b/arch/avr32/mm/dma-coherent.c
@@ -45,7 +45,7 @@ static struct page *__dma_alloc(struct device *dev, size_t size,
 	 * with __GFP_COMP being passed to split_page() which cannot
 	 * handle them.  The real problem is that this flag probably
 	 * should be 0 on AVR32 as it is not supported on this
-	 * platform--see CONFIG_HUGETLB_PAGE. */
+	 * platform--see CONFIG_HUGETLB. */
 	gfp &= ~(__GFP_COMP);
 
 	size = PAGE_ALIGN(size);
diff --git
Re: [RFC PATCH 0/2] Merge HUGETLB_PAGE and HUGETLBFS Kconfig options
On Fri, 2008-06-13 at 14:46 +0100, Ralf Baechle wrote:
> MIPS doesn't do HUGETLB (at least not in-tree atm) so I'm not sure why
> [EMAIL PROTECTED] was cc'ed at all.  So feel free to add my
> Couldnt-care-less: ack line ;-)

Sorry :)  My patches touched your defconfigs so I felt it prudent to
include the mips list as an FYI.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
Re: [RFC PATCH 2/2] Update defconfigs for CONFIG_HUGETLB
On Thu, 2008-06-12 at 22:36 +0300, Adrian Bunk wrote:
> On Thu, Jun 12, 2008 at 02:55:45PM -0400, Adam Litke wrote:
> > Update all defconfigs that specify a default configuration for hugetlbfs.
> > There is now only one option: CONFIG_HUGETLB.  Replace the old
> > CONFIG_HUGETLB_PAGE and CONFIG_HUGETLBFS options with the new one.  I found no
> > cases where CONFIG_HUGETLBFS and CONFIG_HUGETLB_PAGE had different values so
> > this patch is large but completely mechanical:
> >...
> >  335 files changed, 335 insertions(+), 385 deletions(-)
> >...
>
> Please don't do this kind of patches - it doesn't bring any advantage
> but can create tons of patch conflicts.
>
> The next time a defconfig gets updated it will anyway automatically be
> fixed, and for defconfigs that aren't updated it doesn't create any
> problems to keep them as they are today until they might one day get
> updated.

Thanks for taking a look.  I am not sure if I have ever seen a defconfig
patch hit the mailing list before and I was wondering how those changes
happen.  In any case I am perfectly happy to drop this huge patch and
stick with just the first one.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
Re: [Libhugetlbfs-devel] libhugetlbfs: Test case for powerpc huge_ptep_set_wrprotect() bug
On Mon, 2008-07-07 at 17:19 +1000, David Gibson wrote:
> Until very recently (in fact, even now in mainline) powerpc kernels
> had a bug in huge_ptep_set_wrprotect() which meant the 'huge' flag was
> not passed down to pte_update() and hpte_need_flush().  This meant the
> hash ptes for hugepages would not be correctly flushed on fork(),
> allowing the parent to pollute the child's mapping after the fork().
>
> This patch adds a testcase to libhugetlbfs for this behaviour, also
> doing some other checking of the COW semantics over a fork().
>
> Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Good test David, thanks...

Acked-by: Adam Litke <[EMAIL PROTECTED]>

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
Re: [BUG] 2.6.25-rc3-mm1 kernel bug while running libhugetlbfs
On Tue, 2008-03-04 at 11:51 -0800, Andrew Morton wrote:
> hugetlb-correct-page-count-for-surplus-huge-pages.patch adds:
>
> 	if (page) {
> 		/*
> 		 * This page is now managed by the hugetlb allocator and has
> 		 * no users -- drop the buddy allocator's reference.
> 		 */
> 		int page_count = put_page_testzero(page);
> 		BUG_ON(page_count != 0);
>

Ugh.  I got bitten by put_page_testzero().  It returns 1 when the page
count has dropped to zero; it does not return the page count itself.  My
initial version had a BUG_ON() with side-effects.  When a reviewer pointed
it out, I thought I could fix the patch up on its way out the door.  I
have self-administered my punishment.

This patch will fix it:

Signed-off-by: Adam Litke <[EMAIL PROTECTED]>

--- mm/hugetlb.c.orig	2008-03-04 13:36:30.0 -0800
+++ mm/hugetlb.c	2008-03-04 13:39:30.0 -0800
@@ -291,8 +291,8 @@ static struct page *alloc_buddy_huge_pag
 		 * This page is now managed by the hugetlb allocator and has
 		 * no users -- drop the buddy allocator's reference.
 		 */
-		int page_count = put_page_testzero(page);
-		BUG_ON(page_count != 0);
+		put_page_testzero(page);
+		VM_BUG_ON(page_count(page));
 		nid = page_to_nid(page);
 		set_compound_page_dtor(page, free_huge_page);
 		/*

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
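The return-value confusion described above can be shown with a small userspace sketch. This is a toy model only: `toy_page`, `toy_put_page_testzero()` and `toy_page_count()` are hypothetical stand-ins that mimic the "decrement and test for zero" convention with a plain int, with none of the kernel's atomics.

```c
#include <assert.h>

/* Toy stand-in for a reference-counted page, not the kernel's struct page. */
struct toy_page {
	int count;
};

/* Drop one reference; return 1 if the count reached zero, otherwise 0. */
static int toy_put_page_testzero(struct toy_page *page)
{
	return --page->count == 0;
}

/* Read the current reference count. */
static int toy_page_count(const struct toy_page *page)
{
	return page->count;
}

/* Returns the value that the mistaken check treated as a page count. */
int toy_refcount_demo(void)
{
	struct toy_page page = { .count = 1 };
	int ret = toy_put_page_testzero(&page);

	/* The result is a boolean "hit zero", not the remaining count... */
	assert(ret == 1);
	/* ...so the count has to be read separately to check it. */
	assert(toy_page_count(&page) == 0);
	return ret;
}
```

Under this model a check of the form `ret != 0` fires exactly when the last reference was dropped as intended, which matches the failure reported in the thread.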
Re: [PATCH] properly reserve in bootmem the lmb reserved regions that cross numa nodes
This seems like the right approach to me.  I have pointed out a few
stylistic issues below.

On Tue, 2008-09-30 at 09:53 -0500, Jon Tollefson wrote:
> +	/* Mark reserved regions */
> +	for (i = 0; i < lmb.reserved.cnt; i++) {
> +		unsigned long physbase = lmb.reserved.region[i].base;
> +		unsigned long size = lmb.reserved.region[i].size;
> +		unsigned long start_pfn = physbase >> PAGE_SHIFT;
> +		unsigned long end_pfn = ((physbase+size-1) >> PAGE_SHIFT);

CodingStyle dictates that this should be:

	unsigned long end_pfn = ((physbase + size - 1) >> PAGE_SHIFT);

> +/**
> + * get_node_active_region - Return active region containing start_pfn
> + * @start_pfn The page to return the region for.
> + *
> + * It will return NULL if active region is not found.
> + */
> +struct node_active_region *get_node_active_region(
> +						unsigned long start_pfn)

Bad style.  I think the convention would be to write it like this:

struct node_active_region *
get_node_active_region(unsigned long start_pfn)

> +{
> +	int i;
> +	for (i = 0; i < nr_nodemap_entries; i++) {
> +		unsigned long node_start_pfn = early_node_map[i].start_pfn;
> +		unsigned long node_end_pfn = early_node_map[i].end_pfn;
> +
> +		if (node_start_pfn <= start_pfn && node_end_pfn > start_pfn)
> +			return &early_node_map[i];
> +	}
> +	return NULL;
> +}

Since this is using the early_node_map[], should we mark the function
__mminit?

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
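For readers following the pfn arithmetic in the quoted hunks, here is a userspace sketch of the same two steps: converting a reserved byte range to its first and last page frame numbers, and looking up which region holds a given pfn. Everything in it is an assumption made for illustration: the `toy_` names, the 4 KiB page size, and the two-node map are invented and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Toy stand-in for a node's active pfn range: [start_pfn, end_pfn). */
struct toy_region {
	unsigned long start_pfn;
	unsigned long end_pfn;
	int nid;
};

/* First page frame touched by a byte range starting at physbase. */
static unsigned long toy_first_pfn(unsigned long physbase)
{
	return physbase >> TOY_PAGE_SHIFT;
}

/* Last page frame touched by the range (inclusive), hence the "- 1". */
static unsigned long toy_last_pfn(unsigned long physbase, unsigned long size)
{
	return (physbase + size - 1) >> TOY_PAGE_SHIFT;
}

/* Return the region containing pfn, or NULL if no region contains it. */
static const struct toy_region *
toy_find_region(const struct toy_region *map, size_t nr, unsigned long pfn)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (map[i].start_pfn <= pfn && pfn < map[i].end_pfn)
			return &map[i];
	}
	return NULL;
}

/* Returns 1 if the sample reserved range spans two nodes, else 0. */
int toy_reserve_demo(void)
{
	static const struct toy_region map[] = {
		{ .start_pfn = 0,   .end_pfn = 256, .nid = 0 },
		{ .start_pfn = 256, .end_pfn = 512, .nid = 1 },
	};
	unsigned long physbase = 0xFF000UL;
	unsigned long size = 0x2000UL;
	unsigned long first = toy_first_pfn(physbase);
	unsigned long last = toy_last_pfn(physbase, size);
	const struct toy_region *a = toy_find_region(map, 2, first);
	const struct toy_region *b = toy_find_region(map, 2, last);

	assert(first == 255 && last == 256);
	assert(a && b && a->nid == 0 && b->nid == 1);
	assert(toy_find_region(map, 2, 512) == NULL);
	return a->nid != b->nid;
}
```

The sample range starts in the last page of node 0 and ends in the first page of node 1, which is the cross-node case named in the subject line.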