On Tue, 2011-07-19 at 13:17 +0800, Shan Hai wrote: > The patch works, but I have certain confusions, > - Do we want to handle_mm_fault on each futex_lock_pi > even though in most cases there is no write permission > fixup's needed?
Don't we only ever call this when futex_atomic_op_inuser() failed ? Which means a fixup -is- needed .... The fast path is still there. > - How about let the archs do their own write permission > fixup as what I did in my original Why ? This is generic and will fix all archs at once with generic code which is a significant improvement in my book and a lot more maintainable :-) > "[PATCH 1/1] Fixup write permission of TLB on powerpc e500 core"? > (I will fix the stupid errors in my original patch if the concept > is acceptable) > in this way we could decrease the overhead of handle_mm_fault > in the path which does not need write permission fixup. Which overhead ? gup does handle_mm_fault() as well if needed. What I do is I replace what is arguably an abuse of gup() in the case where a fixup -is- needed with a dedicated function designed to perform the said fixup ... and do it properly which gup() didn't :-) Cheers, Ben. > Thanks > Shan Hai > > Cheers, > > Ben. > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 9670f71..1036614 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -985,6 +985,8 @@ int get_user_pages(struct task_struct *tsk, struct > > mm_struct *mm, > > int get_user_pages_fast(unsigned long start, int nr_pages, int write, > > struct page **pages); > > struct page *get_dump_page(unsigned long addr); > > +extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, > > + unsigned long address, unsigned int fault_flags); > > > > extern int try_to_release_page(struct page * page, gfp_t gfp_mask); > > extern void do_invalidatepage(struct page *page, unsigned long offset); > > diff --git a/kernel/futex.c b/kernel/futex.c > > index fe28dc2..7a0a4ed 100644 > > --- a/kernel/futex.c > > +++ b/kernel/futex.c > > @@ -355,8 +355,8 @@ static int fault_in_user_writeable(u32 __user *uaddr) > > int ret; > > > > down_read(&mm->mmap_sem); > > - ret = get_user_pages(current, mm, (unsigned long)uaddr, > > - 1, 1, 0, NULL, NULL); > > + ret = fixup_user_fault(current, mm, (unsigned long)uaddr, > > + FAULT_FLAG_WRITE); > > up_read(&mm->mmap_sem); > > > > return ret< 0 ? ret : 0; > > diff --git a/mm/memory.c b/mm/memory.c > > index 40b7531..b967fb0 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -1815,7 +1815,64 @@ next_page: > > } > > EXPORT_SYMBOL(__get_user_pages); > > > > -/** > > +/* > > + * fixup_user_fault() - manually resolve a user page fault > > + * @tsk: the task_struct to use for page fault accounting, or > > + * NULL if faults are not to be recorded. > > + * @mm: mm_struct of target mm > > + * @address: user address > > + * @fault_flags:flags to pass down to handle_mm_fault() > > + * > > + * This is meant to be called in the specific scenario where for > > + * locking reasons we try to access user memory in atomic context > > + * (within a pagefault_disable() section), this returns -EFAULT, > > + * and we want to resolve the user fault before trying again. > > + * > > + * Typically this is meant to be used by the futex code. > > + * > > + * The main difference with get_user_pages() is that this function > > + * will unconditionally call handle_mm_fault() which will in turn > > + * perform all the necessary SW fixup of the dirty and young bits > > + * in the PTE, while handle_mm_fault() only guarantees to update > > + * these in the struct page. > > + * > > + * This is important for some architectures where those bits also > > + * gate the access permission to the page because their are > > + * maintained in software. On such architecture, gup() will not > > + * be enough to make a subsequent access succeed. > > + * > > + * This should be called with the mm_sem held for read. > > + */ > > +int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, > > + unsigned long address, unsigned int fault_flags) > > +{ > > + struct vm_area_struct *vma; > > + int ret; > > + > > + vma = find_extend_vma(mm, address); > > + if (!vma || address< vma->vm_start) > > + return -EFAULT; > > + > > + ret = handle_mm_fault(mm, vma, address, fault_flags); > > + if (ret& VM_FAULT_ERROR) { > > + if (ret& VM_FAULT_OOM) > > + return -ENOMEM; > > + if (ret& (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) > > + return -EHWPOISON; > > + if (ret& VM_FAULT_SIGBUS) > > + return -EFAULT; > > + BUG(); > > + } > > + if (tsk) { > > + if (ret& VM_FAULT_MAJOR) > > + tsk->maj_flt++; > > + else > > + tsk->min_flt++; > > + } > > + return 0; > > +} > > + > > +/* > > * get_user_pages() - pin user pages in memory > > * @tsk: the task_struct to use for page fault accounting, or > > * NULL if faults are not to be recorded. > > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev