On Wed, 2018-07-04 at 16:01 +0200, Thomas Gleixner wrote:
> On Wed, 4 Jul 2018, Yang, Bin wrote:
> > e820 table:
> > =================
> > 
> > [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > [    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> > [    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
> > 
> > call chain:
> > ======================
> > 
> > ...
> > => free_init_pages(what="initrd" or "unused kernel", begin=ffff9b26b....000, end=ffff9b26c....000);
> >    the begin and end addresses are random; the values above are just for reference
> > => set_memory_rw()
> > => change_page_attr_set()
> > => change_page_attr_set_clr()
> > => __change_page_attr_set_clr(); cpa->numpages is 512 on my board if what=="unused kernel"
> > => __change_page_attr()
> > => try_preserve_large_page(); address=ffff9b26bfacf000, pfn=80000, level=3;
> >    the check loop count is 262144, exiting the loop after 861 usecs on my board
> 
> You are talking about the static_protections() check, right?
> 
> > the actual problem
> > ==================
> > Sometimes free_init_pages() takes hundreds of seconds to return. The
> > major impact is on kernel boot time.
> 
> That's the symptom you are observing. The problem is in the
> try_preserve_large_page() logic.
> 
> The address range from the example above is:
> 
>    0xffff9b26b0000000 - 0xffff9b26c0000000
> 
> i.e. 256 MB.
> 
> So the first call with address 0xffff9b26b0000000 will try to preserve
> the 1GB page, but it's obvious that if the pgprot changes, this has to
> split the 1GB page.
> 
> The current code is stupid in that regard simply because it does the
> static_protections() check loop _before_ checking:
> 
>   1) Whether the new and the old pgprot are the same
> 
>   2) Whether the address range which needs to be changed is aligned to
>      and covers the full large mapping
> 
> So it does the static_protections() loop before #1 and #2 and checks
> the full 1GB page in 4K steps, which makes it loop 262144 times.
> 
> With your magic 'cache' logic this will still loop exactly 262144
> times, at least on the first invocation, because there is no valid
> information in that 'cache'. So I really have no idea how your patch
> makes any difference unless it is breaking stuff left and right in
> very subtle ways.
> 
> If there is a second invocation with the same pgprot on that very same
> range, then I can see it somehow having that effect by chance, but not
> by design.
> 
> But this is all voodoo programming and there is a very obvious and
> simple solution for this:
> 
>   The check for pgprot_val(new_prot) == pgprot_val(old_prot) can
>   definitely be done _before_ the check loop. The check loop is
>   pointless in that case, really. If there is a mismatch anywhere then
>   it existed already, and instead of yelling loudly the checking would
>   just discover it, enforce the split, and in the worst case preserve
>   the old wrong mapping for those pages.
> 
>   What we want there is a debug mechanism which catches such cases,
>   but is not effective on production kernels and runs once or twice
>   during boot.
> 
>   The range check whether the address is aligned to the large page and
>   covers the full large page (1G or 2M) is also obvious to do _before_
>   that check loop, because if the requested range does not fit and has
>   a different pgprot_val() then it will decide to split after the
>   check anyway.
> 
>   The check loop itself is stupid as well. Instead of looping in 4K
>   steps, the thing can be rewritten to check for overlapping ranges
>   and then check those explicitly. If there is no overlap in the first
>   place the whole loop can be avoided, but that's a pure optimization
>   and needs more thought than the straightforward and obvious solution
>   to the problem at hand.
> 
> Untested patch just moving the quick checks to the obvious right
> place below.
> 
> Thanks,
> 
>       tglx
> 
> 8<-----------------
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -628,47 +628,61 @@ try_preserve_large_page(pte_t *kpte, uns
>       new_prot = static_protections(req_prot, address, pfn);
> 
>       /*
> -      * We need to check the full range, whether
> -      * static_protection() requires a different pgprot for one of
> -      * the pages in the range we try to preserve:
> +      * If there are no changes, return. cpa->numpages has been updated
> +      * above.
> +      *
> +      * There is no need to do any of the checks below because the
> +      * existing pgprot of the large mapping has to be correct. If it's
> +      * not then this is a bug in some other code and needs to be fixed
> +      * there and not silently papered over by the static_protections()
> +      * magic and then 'fixed' up in the wrong way.
>        */
> -     addr = address & pmask;
> -     pfn = old_pfn;
> -     for (i = 0; i < (psize >> PAGE_SHIFT); i++, addr += PAGE_SIZE, pfn++) {
> -             pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
> -
> -             if (pgprot_val(chk_prot) != pgprot_val(new_prot))
> -                     goto out_unlock;
> +     if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
> +             do_split = 0;
> +             goto out_unlock;
>       }
> 
>       /*
> -      * If there are no changes, return. maxpages has been updated
> -      * above:
> +      * If the requested address range is not aligned to the start of
> +      * the large page or does not cover the full range, split it up.
> +      * No matter what the static_protections() check below does, it
> +      * would anyway result in a split after doing all the check work
> +      * for nothing.
>        */
> -     if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
> -             do_split = 0;
> +     addr = address & pmask;
> +     numpages = psize >> PAGE_SHIFT;
> +     if (address != addr || cpa->numpages != numpages)
>               goto out_unlock;
> -     }
> 
>       /*
> -      * We need to change the attributes. Check, whether we can
> -      * change the large page in one go. We request a split, when
> -      * the address is not aligned and the number of pages is
> -      * smaller than the number of pages in the large page. Note
> -      * that we limited the number of possible pages already to
> -      * the number of pages in the large page.
> +      * Check the full range, whether static_protection() requires a
> +      * different pgprot for one of the pages in the existing large
> +      * mapping.
> +      *
> +      * FIXME:
> +      * 1) Make this yell loudly if something tries to set a full
> +      *    large page to something which is not allowed.
I am trying to work out a patch based on your sample code and comments.
I do not understand why it needs to yell loudly here when something sets
a full large page to something which is not allowed. It can still split
the large page into 4K pages in the end, and static_protections() will
also be called for the 4K-page changes. Why not just warn when a 4K-page
change is not allowed?

> +      * 2) Add a check for catching a case where the existing mapping
> +      *    is wrong already.
> +      * 3) Instead of looping over the whole thing, find the overlapping
> +      *    ranges and check them explicitly instead of looping over a
> +      *    full 1G mapping in 4K steps (256k iterations) for figuring
> +      *    that out.
>        */
> -     if (address == (address & pmask) && cpa->numpages == (psize >> PAGE_SHIFT)) {
> -             /*
> -              * The address is aligned and the number of pages
> -              * covers the full page.
> -              */
> -             new_pte = pfn_pte(old_pfn, new_prot);
> -             __set_pmd_pte(kpte, address, new_pte);
> -             cpa->flags |= CPA_FLUSHTLB;
> -             do_split = 0;
> +     pfn = old_pfn;
> +     for (i = 0; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
> +             pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
> +
> +             if (pgprot_val(chk_prot) != pgprot_val(new_prot))
> +                     goto out_unlock;
>       }
> 
> +     /* All checks passed. Just change the large mapping entry */
> +     new_pte = pfn_pte(old_pfn, new_prot);
> +     __set_pmd_pte(kpte, address, new_pte);
> +     cpa->flags |= CPA_FLUSHTLB;
> +     do_split = 0;
> +
>  out_unlock:
>       spin_unlock(&pgd_lock);
> 