Paul Mackerras <pau...@samba.org> writes:

I will reply to the other parts in a seperate email, but the below

>> +static void page_table_free_rcu(struct mmu_gather *tlb, unsigned long 
>> *table)
>> +{
>> +    struct page *page;
>> +    struct mm_struct *mm;
>> +    unsigned int bit, mask;
>> +
>> +    mm = tlb->mm;
>> +    /* Free 2K page table fragment of a 64K page */
>> +    page = virt_to_page(table);
>> +    bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
>> +    spin_lock(&mm->page_table_lock);
>> +    /*
>> +     * stash the actual mask in higher half, and clear the lower half
>> +     * and selectively, add remove from pgtable list
>> +     */
>> +    mask = atomic_xor_bits(&page->_mapcount, bit | (bit << FRAG_MASK_BITS));
>> +    if (!(mask & FRAG_MASK))
>> +            list_del(&page->lru);
>> +    else {
>> +            /*
>> +             * Add the page table page to pgtable_list so that
>> +             * the free fragment can be used by the next alloc
>> +             */
>> +            list_del_init(&page->lru);
>> +            list_add_tail(&page->lru, &mm->context.pgtable_list);
>> +    }
>> +    spin_unlock(&mm->page_table_lock);
>> +    tlb_remove_table(tlb, table);
>> +}
>
> This looks like you're allowing a fragment that is being freed to be
> reallocated and used again during the grace period when we are waiting
> for any references to the fragment to disappear.  Doesn't that allow a
> race where one CPU traversing the page table and using the fragment in
> its old location in the tree could see a PTE created after the
> fragment was reallocated?  In other words, why is it safe to allow the
> fragment to be used during the grace period?  If it is safe, it at
> least needs a comment explaining why.
>

We don't allow it to be reallocated during the grace period. The trick
is in the below lines of page_table_alloc()

                /*
                 * Update with the higher order mask bits accumulated,
                 * added as a part of rcu free.
                 */
                mask = mask | (mask >> FRAG_MASK_BITS);

When checking for mask, we also look at the higher order bits.

The reason we add the page back to &mm->context.pgtable_list in
page_table_free_rcu is because we need to have access to struct
mm_struct. We don't have that in the rcu call back. So we add early and
make sure we don't reallocate them, until the grace period is over.

I will definitely add more comments around the code to clarify these details.

-aneesh

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to