Paul Mackerras <pau...@samba.org> writes:

> On Thu, Feb 21, 2013 at 10:17:12PM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.ku...@linux.vnet.ibm.com>
>> 
>> We now have PTE page consuming only 2K of the 64K page.This is in order to
>
> In fact the PTE page together with the hash table indexes occupies 4k,
> doesn't it?  The comments in the code are similarly confusing since
> they talk about 2k but actually allocate 4k.
>
>> facilitate transparent huge page support, which works much better if our PMDs
>> cover 16MB instead of 256MB.
>> 
>> Inorder to reduce the wastage, we now have multiple PTE page fragment
>   ^ In order (two words)
>
>> from the same PTE page.
>
> A patch like this needs a more complete description and explanation
> than you have given.  For instance, you could mention that the code
> that you're adding for the 32-bit and non-64k cases are just copies of
> the previously generic code from pgalloc.h (actually, this movement
> might be something that could be split out as a separate patch).
> Also, you should describe in outline how you keep a list of pages that
> aren't fully allocated and have a bitmap of which 4k sections are in
> use, and also how your scheme interacts with RCU.

will do

>
> [snip]
>
>> +#ifdef CONFIG_PPC_64K_PAGES
>> +/*
>> + * we support 15 fragments per PTE page. This is limited by how many
>
> Why only 15?  Don't we get 16 fragments per page?
>

That was one of the details I wanted to come back and closely look at
before posting. But missed that in the excitement of getting this
all working :). ._mapcount is a signed value and hence I was not sure
whether setting the top bit have any impact on how we deal with the page
in other part of VM. 


>> + * bits we can pack in page->_mapcount. We use the first half for
>> + * tracking the usage for rcu page table free.
>
> What does "first" mean?  The high half or the low half?
>

high half.

>> +unsigned long *page_table_alloc(struct mm_struct *mm, unsigned long vmaddr)
>> +{
>> +    struct page *page;
>> +    unsigned int mask, bit;
>> +    unsigned long *table;
>> +
>> +    /* Allocate fragments of a 4K page as 1K/2K page table */
>
> A 4k page?  Do you mean a 64k page?  And what is 1K to do with
> anything?
>

That should be completely dropped, Cut-paste from s390 code :)

>> +#ifdef CONFIG_SMP
>> +static void __page_table_free_rcu(void *table)
>> +{
>> +    unsigned int bit;
>> +    struct page *page;
>> +    /*
>> +     * this is a PTE page free 2K page table
>> +     * fragment of a 64K page.
>> +     */
>> +    page = virt_to_page(table);
>> +    bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
>> +    bit <<= FRAG_MASK_BITS;
>> +    /*
>> +     * clear the higher half and if nobody used the page in
>> +     * between, even lower half would be zero.
>> +     */
>> +    if (atomic_xor_bits(&page->_mapcount, bit) == 0) {
>> +            pgtable_page_dtor(page);
>> +            atomic_set(&page->_mapcount, -1);
>> +            __free_page(page);
>> +    }
>> +}
>> +
>> +static void page_table_free_rcu(struct mmu_gather *tlb, unsigned long 
>> *table)
>> +{
>> +    struct page *page;
>> +    struct mm_struct *mm;
>> +    unsigned int bit, mask;
>> +
>> +    mm = tlb->mm;
>> +    /* Free 2K page table fragment of a 64K page */
>> +    page = virt_to_page(table);
>> +    bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
>> +    spin_lock(&mm->page_table_lock);
>> +    /*
>> +     * stash the actual mask in higher half, and clear the lower half
>> +     * and selectively, add remove from pgtable list
>> +     */
>> +    mask = atomic_xor_bits(&page->_mapcount, bit | (bit << FRAG_MASK_BITS));
>> +    if (!(mask & FRAG_MASK))
>> +            list_del(&page->lru);
>> +    else {
>> +            /*
>> +             * Add the page table page to pgtable_list so that
>> +             * the free fragment can be used by the next alloc
>> +             */
>> +            list_del_init(&page->lru);
>> +            list_add_tail(&page->lru, &mm->context.pgtable_list);
>> +    }
>> +    spin_unlock(&mm->page_table_lock);
>> +    tlb_remove_table(tlb, table);
>> +}

-aneesh

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to