riscv: Implement superpage splitting for p2m mappings

Jan Beulich Thu, 24 Jul 2025 00:58:26 -0700

On 23.07.2025 21:51, Oleksii Kurochko wrote:
> 
> On 7/22/25 6:02 PM, Jan Beulich wrote:
>> On 22.07.2025 16:57, Oleksii Kurochko wrote:
>>> On 7/21/25 3:34 PM, Jan Beulich wrote:
>>>> On 17.07.2025 18:37, Oleksii Kurochko wrote:
>>>>> On 7/2/25 11:25 AM, Jan Beulich wrote:
>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>>> Add support for down large memory mappings ("superpages") in the RISC-V
>>>>>>> p2m mapping so that smaller, more precise mappings ("finer-grained 
>>>>>>> entries")
>>>>>>> can be inserted into lower levels of the page table hierarchy.
>>>>>>>
>>>>>>> To implement that the following is done:
>>>>>>> - Introduce p2m_split_superpage(): Recursively shatters a superpage into
>>>>>>>      smaller page table entries down to the target level, preserving 
>>>>>>> original
>>>>>>>      permissions and attributes.
>>>>>>> - __p2m_set_entry() updated to invoke superpage splitting when inserting
>>>>>>>      entries at lower levels within a superpage-mapped region.
>>>>>>>
>>>>>>> This implementation is based on the ARM code, with modifications to the 
>>>>>>> part
>>>>>>> that follows the BBM (break-before-make) approach. Unlike ARM, RISC-V 
>>>>>>> does
>>>>>>> not require BBM, so there is no need to invalidate the PTE and flush the
>>>>>>> TLB before updating it with the newly created, split page table.
>>>>>> But some flushing is going to be necessary. As long as you only ever do
>>>>>> global flushes, the one after the individual PTE modification (within the
>>>>>> split table) will do (if BBM isn't required, see below), but once you 
>>>>>> move
>>>>>> to more fine-grained flushing, that's not going to be enough anymore. Not
>>>>>> sure it's a good idea to leave such a pitfall.
>>>>> I think that I don't fully understand what is an issue.
>>>> Whether a flush is necessary after solely breaking up a superpage is arch-
>>>> defined. I don't know how much restrictions the spec on possible hardware
>>>> behavior for RISC-V. However, the eventual change of (at least) one entry
>>>> of fulfill the original request will surely require a flush. What I was
>>>> trying to say is that this required flush would better not also cover for
>>>> the flushes that may or may not be required by the spec. IOW if the spec
>>>> leaves any room for flushes to possibly be needed, those flushes would
>>>> better be explicit.
>>> I think that I still don't understand why what I wrote above will work as 
>>> long
>>> as global flushes is working and will stop to work when more fine-grained 
>>> flushing
>>> is used.
>>>
>>> Inside p2m_split_superpage() we don't need any kind of TLB flush operation 
>>> as
>>> it is allocation a new page for a table and works with it, so no one could 
>>> use
>>> this page at the moment and cache it in TLB.
>>>
>>> The only question is that if it is needed BBM before staring using splitted 
>>> entry:
>>>           ....
>>>           if ( !p2m_split_superpage(p2m, &split_pte, level, target, 
>>> offsets) )
>>>           {
>>>               /* Free the allocated sub-tree */
>>>               p2m_free_subtree(p2m, split_pte, level);
>>>
>>>               rc = -ENOMEM;
>>>               goto out;
>>>           }
>>>
>>> ------> /* Should be BBM used here ? */
>>>           p2m_write_pte(entry, split_pte, p2m->clean_pte);
>>>
>>> And I can't find anything in the spec what tells me to use BBM here like Arm
>>> does:
>> But what you need is a statement in the spec that you can get away without. 
>> Imo
>> at least.
> 
> In the spec. it is mentioned that:
>    It is permitted for multiple address-translation cache entries to co-exist 
> for the same
>    address. This represents the fact that in a conventional TLB hierarchy, it 
> is possible for
>    multiple entries to match a single address if, for example, a page is 
> upgraded to a
>    superpage without first clearing the original non-leaf PTE’s valid bit and 
> executing an
>    SFENCE.VMA with rs1=x0, or if multiple TLBs exist in parallel at a given 
> level of the
>    hierarchy. In this case, just as if an SFENCE.VMA is not executed between 
> a write to the
>    memory-management tables and subsequent implicit read of the same address: 
> it is
>    unpredictable whether the old non-leaf PTE or the new leaf PTE is used, 
> but the behavior is
>    otherwise well defined.
> The phrase*"but the behavior is otherwise well defined"* emphasizes that even 
> if the TLB sees
> two versions (the old and the new), the architecture guarantees stability, 
> and the behavior
> remains safe — though unpredictable in terms of which translation will be 
> used.
> And I think that this unpredictability is okay, at least, in the case if 
> superpage splitting
> and therefore TLB flushing can be deferred since the old pages (which are 
> used for old mapping)
> still exist and the permissions of the new entries match those of the 
> original ones.
> Also, it seems like there clearing PTE before TLB flushing isn't need too.
> 
> Does it make sense?


Yes, I think this indeed is sufficient as a spec requirement.

Jan

Re: [PATCH v2 15/17] xen/riscv: Implement superpage splitting for p2m mappings

Reply via email to