On Wed, Aug 19, 2015 at 12:21:45PM +0300, Kirill A. Shutemov wrote:
> Hugh has pointed that compound_head() call can be unsafe in some
> context. There's one example:
> 
>       CPU0                                    CPU1
> 
> isolate_migratepages_block()
>   page_count()
>     compound_head()
>       !!PageTail() == true
>                                       put_page()
>                                         tail->first_page = NULL
>       head = tail->first_page
>                                       alloc_pages(__GFP_COMP)
>                                          prep_compound_page()
>                                            tail->first_page = head
>                                            __SetPageTail(p);
>       !!PageTail() == true
>     <head == NULL dereferencing>
> 
> The race is pure theoretical. I don't it's possible to trigger it in
> practice. But who knows.
> 
> We can fix the race by changing how encode PageTail() and compound_head()
> within struct page to be able to update them in one shot.
> 
> The patch introduces page->compound_head into third double word block in
> front of compound_dtor and compound_order. That means it shares storage
> space with:
> 
>  - page->lru.next;
>  - page->next;
>  - page->rcu_head.next;
>  - page->pmd_huge_pte;
> 
> That's too long list to be absolutely sure, but looks like nobody uses
> bit 0 of the word. It can be used to encode PageTail(). And if the bit
> set, rest of the word is pointer to head page.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
> Acked-by: Michal Hocko <mho...@suse.com>
> Cc: Hugh Dickins <hu...@google.com>
> Cc: David Rientjes <rient...@google.com>
> Cc: Vlastimil Babka <vba...@suse.cz>

If DEFERRED_STRUCT_PAGE_INIT=n, combining this patchset with my page-flags
patches causes oops in SetPageReserved() called from
reserve_bootmem_region().

It happens because we haven't yet initilized the word in struct page and
PageTail() inside SetPageReserved() can give false-positive, which leads
to bogus compound_head() result.

IIUC, we initialize the word only on first allocation of the page. It can
be too late: pfn scanner can see false-positive PageTail() from not yet
allocated pages too.

Here's fixlet for patch to address the issue.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 347724850665..d0e3fca830f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -892,6 +892,8 @@ static void init_reserved_page(unsigned long pfn)
 #else
 static inline void init_reserved_page(unsigned long pfn)
 {
+       /* Avoid false-positive PageTail() */
+       INIT_LIST_HEAD(&pfn_to_page(pfn)->lru);
 }
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to