On Tue, Aug 31, 2021 at 01:34:04PM +0100, Joao Martins wrote:
> On 8/30/21 2:07 PM, Jason Gunthorpe wrote:
> > On Fri, Aug 27, 2021 at 07:34:54PM +0100, Joao Martins wrote:
> >> On 8/27/21 5:25 PM, Jason Gunthorpe wrote:
> >>> On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote:
> >>>
> >>>>  #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && 
> >>>> defined(CONFIG_TRANSPARENT_HUGEPAGE)
> >>>>  static int __gup_device_huge(unsigned long pfn, unsigned long addr,
> >>>>                               unsigned long end, unsigned int flags,
> >>>>                               struct page **pages, int *nr)
> >>>>  {
> >>>> -        int nr_start = *nr;
> >>>> +        int refs, nr_start = *nr;
> >>>>          struct dev_pagemap *pgmap = NULL;
> >>>>          int ret = 1;
> >>>>  
> >>>>          do {
> >>>> -                struct page *page = pfn_to_page(pfn);
> >>>> +                struct page *head, *page = pfn_to_page(pfn);
> >>>> +                unsigned long next = addr + PAGE_SIZE;
> >>>>  
> >>>>                  pgmap = get_dev_pagemap(pfn, pgmap);
> >>>>                  if (unlikely(!pgmap)) {
> >>>> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn, 
> >>>> unsigned long addr,
> >>>>                          ret = 0;
> >>>>                          break;
> >>>>                  }
> >>>> -                SetPageReferenced(page);
> >>>> -                pages[*nr] = page;
> >>>> -                if (unlikely(!try_grab_page(page, flags))) {
> >>>> -                        undo_dev_pagemap(nr, nr_start, flags, pages);
> >>>> +
> >>>> +                head = compound_head(page);
> >>>> +                /* @end is assumed to be limited at most one compound 
> >>>> page */
> >>>> +                if (PageHead(head))
> >>>> +                        next = end;
> >>>> +                refs = record_subpages(page, addr, next, pages + *nr);
> >>>> +
> >>>> +                SetPageReferenced(head);
> >>>> +                if (unlikely(!try_grab_compound_head(head, refs, 
> >>>> flags))) {
> >>>> +                        if (PageHead(head))
> >>>> +                                ClearPageReferenced(head);
> >>>> +                        else
> >>>> +                                undo_dev_pagemap(nr, nr_start, flags, 
> >>>> pages);
> >>>>                          ret = 0;
> >>>>                          break;
> >>>
> >>> Why is this special cased for devmap?
> >>>
> >>> Shouldn't everything processing pud/pmds/etc use the same basic loop
> >>> that is similar in idea to the 'for_each_compound_head' scheme in
> >>> unpin_user_pages_dirty_lock()?
> >>>
> >>> Doesn't that work for all the special page type cases here?
> >>
> >> We are iterating over PFNs to create an array of base pages (regardless of 
> >> page table
> >> type), rather than iterating over an array of pages to work on. 
> > 
> > That is part of it, yes, but the slow bit here is to minimally find
> > the head pages and do the atomics on them, much like the
> > unpin_user_pages_dirty_lock()
> > 
> > I would think this should be designed similar to how things work on
> > the unpin side.
> > 
> I don't think it's the same thing. The bit you say 'minimally find the
> head pages' carries a visible overhead in unpin_user_pages() as we are
> checking each of the pages belongs to the same head page -- because you
> can pass an arbritary set of pages. This does have a cost which is not
> in gup-fast right now AIUI. Whereas in our gup-fast 'handler' you
> already know that you are processing a contiguous chunk of pages.
> If anything, we are closer to unpin_user_page_range*()
> than unpin_user_pages().

Yes, that is what I mean, it is very similar to the range case as we
don't even know that a single compound spans a pud/pmd. So you end up
doing the same loop to find the compound boundaries.

Under GUP slow we can also aggregate multiple page table entires, eg a
split huge page could be procesed as a single 2M range operation even
if it is broken to 4K PTEs.
> 
> Switching to similar iteration logic to unpin would look something like
> this (still untested):
> 
>         for_each_compound_range(index, &page, npages, head, refs) {
>                 pgmap = get_dev_pagemap(pfn + *nr, pgmap);

I recall talking to DanW about this and we agreed it was unnecessary
here to hold the pgmap and should be deleted.

Jason

Reply via email to