On Wed, 2007-10-03 at 08:47 -0700, Adam Litke wrote:
> When shrinking the size of the hugetlb pool via the nr_hugepages sysctl, we
> are careful to keep enough pages around to satisfy reservations.  But the
> calculation is flawed for the following scenario:
> 
> Action                          Pool Counters (Total, Free, Resv)
> ======                          =============
> Set pool to 1 page              1 1 0
> Map 1 page MAP_PRIVATE          1 1 0
> Touch the page to fault it in   1 0 0
> Set pool to 3 pages             3 2 0
> Map 2 pages MAP_SHARED          3 2 2
> Set pool to 2 pages             2 1 2 <-- Mistake, should be 3 2 2
> Touch the 2 shared pages        2 0 1 <-- Program crashes here
> 
> The last touch above will terminate the process due to lack of huge pages.
> 
> This patch corrects the calculation so that it factors in pages being used
> for private mappings.  Andrew, this is a standalone fix suitable for
> mainline.  It is also now corrected in my latest dynamic pool resizing
> patchset which I will send out soon.
> 
> Signed-off-by: Adam Litke <[EMAIL PROTECTED]>
> ---
> 
>  mm/hugetlb.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 84c795e..7af3908 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -224,14 +224,14 @@ static void try_to_free_low(unsigned long count)
>       for (i = 0; i < MAX_NUMNODES; ++i) {
>               struct page *page, *next;
>               list_for_each_entry_safe(page, next, &hugepage_freelists[i], 
> lru) {
> +                     if (count >= nr_huge_pages)
> +                             return;
>                       if (PageHighMem(page))
>                               continue;
>                       list_del(&page->lru);
>                       update_and_free_page(page);
>                       free_huge_pages--;
>                       free_huge_pages_node[page_to_nid(page)]--;
> -                     if (count >= nr_huge_pages)
> -                             return;
>               }
>       }
>  }

That's an excellent problem description.  I'm just a bit hazy on how the
patch fixes it. :)

What is the actual error in this loop?  The fact that we can go trying
to free pages when the count is actually OK?

BTW, try_to_free_low(count) kinda sucks for a function name.  Is that
count the number of pages we're trying to end up with, or the total
number of low pages that we're trying to free?

Also, as I look at try_to_free_low(), why do we need to #ifdef it out in
the case of !HIGHMEM?  If we have CONFIG_HIGHMEM=yes, we still might not
have any _actual_ high memory.  So, they loop obviously doesn't *hurt*
when there is no high memory.  

> @@ -251,7 +251,7 @@ static unsigned long set_max_huge_pages(unsigned long 
> count)
>               return nr_huge_pages;
> 
>       spin_lock(&hugetlb_lock);
> -     count = max(count, resv_huge_pages);
> +     count = max(count, resv_huge_pages + nr_huge_pages - free_huge_pages);
>       try_to_free_low(count);
>       while (count < nr_huge_pages) {
>               struct page *page = dequeue_huge_page(NULL, 0);

The real problem with this line is that "count" is too ambiguous. :)

We could rewrite the original max() line this way:

        if (resv_huge_pages > nr_of_pages_to_end_up_with)
                nr_of_pages_to_end_up_with = resv_huge_pages;
        try_to_make_the_total_nr_of_hpages(nr_of_pages_to_end_up_with);

Which makes it more clear that you're setting the number of total pages
to the number of reserved pages, which is obviously screwy.

OK, so this is actually saying: "count can never go below
resv_huge_pages+nr_huge_pages"?

Could we change try_to_free_low() to free a distinct number of pages?

        if (count > free_huge_pages)
                count = free_huge_pages;
        try_to_free_nr_huge_pages(count);

I feel a bit sketchy about the "resv_huge_pages + nr_huge_pages -
free_huge_pages" logic.  Could you elaborate a bit there on what the
rules are?

-- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to