riscv: add root page table allocation

Jan Beulich Mon, 30 Jun 2025 23:37:56 -0700

On 30.06.2025 18:18, Oleksii Kurochko wrote:
> On 6/30/25 5:22 PM, Jan Beulich wrote:
>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/include/asm/p2m.h
>>> +++ b/xen/arch/riscv/include/asm/p2m.h
>>> @@ -26,6 +26,12 @@ struct p2m_domain {
>>>       /* Pages used to construct the p2m */
>>>       struct page_list_head pages;
>>>   
>>> +    /* The root of the p2m tree. May be concatenated */
>>> +    struct page_info *root;
>>> +
>>> +    /* Address Translation Table for the p2m */
>>> +    paddr_t hgatp;
>> Does this really need holding in a struct field? Can't is be re-created at
>> any time from "root" above?
> 
> Yes, with the current one implementation, I agree it would be enough only
> root. But as you noticed below...
> 
>> And such re-creation is apparently infrequent,
>> if happening at all after initial allocation. (But of course I don't know
>> what future patches of yours will bring.) This is even more so if ...
>>
>>> --- a/xen/arch/riscv/include/asm/riscv_encoding.h
>>> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
>>> @@ -133,11 +133,13 @@
>>>   #define HGATP_MODE_SV48X4         _UL(9)
>>>   
>>>   #define HGATP32_MODE_SHIFT                31
>>> +#define HGATP32_MODE_MASK          _UL(0x80000000)
>>>   #define HGATP32_VMID_SHIFT                22
>>>   #define HGATP32_VMID_MASK         _UL(0x1FC00000)
>>>   #define HGATP32_PPN                       _UL(0x003FFFFF)
>>>   
>>>   #define HGATP64_MODE_SHIFT                60
>>> +#define HGATP64_MODE_MASK          _ULL(0xF000000000000000)
>>>   #define HGATP64_VMID_SHIFT                44
>>>   #define HGATP64_VMID_MASK         _ULL(0x03FFF00000000000)
>> ... VMID management is going to change as previously discussed, at which
>> point the value to put in hgatp will need (partly) re-calculating at certain
>> points anyway.
> 
> ... after VMID management will changed to per-CPU base then it will be needed
> to update re-calculate hgatp each time vCPU on pCPU is changed.
> In this case I prefer to have partially calculated 'hgatp'.


But why, when you need to do some recalculation anyway?

>>> --- a/xen/arch/riscv/p2m.c
>>> +++ b/xen/arch/riscv/p2m.c
>>> @@ -41,6 +41,91 @@ void p2m_write_unlock(struct p2m_domain *p2m)
>>>       write_unlock(&p2m->lock);
>>>   }
>>>   
>>> +static void clear_and_clean_page(struct page_info *page)
>>> +{
>>> +    clean_dcache_va_range(page, PAGE_SIZE);
>>> +    clear_domain_page(page_to_mfn(page));
>>> +}
>> A function of this name can, imo, only clear and then clean. Question is why
>> it's the other way around, and what the underlying requirement is for the
>> cleaning part to be there in the first place. Maybe that's obvious for a
>> RISC-V person, but it's entirely non-obvious to me (Arm being different in
>> this regard because of running with caches disabled at certain points in
>> time).
> 
> You're right, the current name|clear_and_clean_page()| implies that clearing
> should come before cleaning, which contradicts the current implementation.
> The intent here is to ensure that the page contents are consistent in RAM
> (not just in cache) before use by other entities (guests or devices).
> 
> The clean must follow the clear — so yes, the order needs to be reversed.

What you don't address though - why's the cleaning needed in the first place?

>>> +static struct page_info *p2m_allocate_root(struct domain *d)
>>> +{
>>> +    struct page_info *page;
>>> +    unsigned int order = get_order_from_bytes(KB(16));
>> While better than a hard-coded order of 2, this still is lacking. Is there
>> a reason there can't be a suitable manifest constant in the header?
> 
> No any specific reason, I just decided not to introduce new definition as
> it is going to be used only inside this function.
> 
> I think it will make sense to have in p2m.c:
>   #define P2M_ROOT_PT_SIZE KB(16)
> If it isn't the best one option, then what about to move this defintion
> to config.h or asm/p2m.h.

It's defined by the hardware, so neither of the two headers looks to be a
good fit. Nor is the P2M_ prefix really in line with this being hardware-
defined. page.h has various paging-related hw definitions, and
riscv_encoding.h may also be a suitable place. There may be other candidates
that I'm presently overlooking.

>>> +    unsigned int nr_pages = _AC(1,U) << order;
>> Nit (style): Missing blank after comma.
> 
> I've changed that to BIT(order, U)
> 
>>
>>> +    /* Return back nr_pages necessary for p2m root table. */
>>> +
>>> +    if ( ACCESS_ONCE(d->arch.paging.p2m_total_pages) < nr_pages )
>>> +        panic("Specify more xen,domain-p2m-mem-mb\n");
>> You shouldn't panic() in anything involved in domain creation. You want to
>> return NULL in this case.
> 
> It makes sense in this case just to return NULL.
> 
>>
>> Further, to me the use of "more" looks misleading here. Do you perhaps mean
>> "larger" or "bigger"?
>>
>> This also looks to be happening without any lock held. If that's intentional,
>> I think the "why" wants clarifying in a code comment.
> 
> Agree, returning back pages necessary for p2m root table should be done under
> spin_lock(&d->arch.paging.lock).

Which should be acquired at the paging_*() layer then, not at the p2m_*() layer.
(As long as you mean to have that separation, that is. See the earlier 
discussion
on that matter.)

>>> +    for ( unsigned int i = 0; i < nr_pages; i++ )
>>> +    {
>>> +        /* Return memory to domheap. */
>>> +        page = page_list_remove_head(&d->arch.paging.p2m_freelist);
>>> +        if( page )
>>> +        {
>>> +            ACCESS_ONCE(d->arch.paging.p2m_total_pages)--;
>>> +            free_domheap_page(page);
>>> +        }
>>> +        else
>>> +        {
>>> +            printk(XENLOG_ERR
>>> +                   "Failed to free P2M pages, P2M freelist is empty.\n");
>>> +            return NULL;
>>> +        }
>>> +    }
>> The reason for doing this may also want to be put in a comment.
> 
> I thought it would be enough the comment above: /* Return back nr_pages 
> necessary for p2m root table. */

That describes what the code does, but not why.

>>> +{
>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +
>>> +    p2m->root = p2m_allocate_root(d);
>>> +    if ( !p2m->root )
>>> +        return -ENOMEM;
>>> +
>>> +    p2m->hgatp = hgatp_from_page(p2m);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   static spinlock_t vmid_alloc_lock = SPIN_LOCK_UNLOCKED;
>>>   
>>>   /*
>>> @@ -228,5 +313,14 @@ int p2m_set_allocation(struct domain *d, unsigned long 
>>> pages, bool *preempted)
>>>           }
>>>       }
>>>   
>>> +    /*
>>> +    * First, wait for the p2m pool to be initialized. Then allocate the 
>>> root
>> Why "wait"? There's waiting here.
> 
> I am not really get your question.
> 
> "wait" here is about the initialization of the pool which happens above this 
> comment.

But there's no "waiting" involved. What you talk about is one thing needing to
happen after the other.

Jan

Re: [PATCH v2 06/17] xen/riscv: add root page table allocation

Reply via email to