riscv: Implement superpage splitting for p2m mappings

Oleksii Kurochko Wed, 23 Jul 2025 12:51:29 -0700


On 7/22/25 6:02 PM, Jan Beulich wrote:

On 22.07.2025 16:57, Oleksii Kurochko wrote:

On 7/21/25 3:34 PM, Jan Beulich wrote:

On 17.07.2025 18:37, Oleksii Kurochko wrote:

On 7/2/25 11:25 AM, Jan Beulich wrote:

On 10.06.2025 15:05, Oleksii Kurochko wrote:

Add support for down large memory mappings ("superpages") in the RISC-V
p2m mapping so that smaller, more precise mappings ("finer-grained entries")
can be inserted into lower levels of the page table hierarchy.


To implement that the following is done:
- Introduce p2m_split_superpage(): Recursively shatters a superpage into
     smaller page table entries down to the target level, preserving original
     permissions and attributes.
- __p2m_set_entry() updated to invoke superpage splitting when inserting
     entries at lower levels within a superpage-mapped region.

This implementation is based on the ARM code, with modifications to the part
that follows the BBM (break-before-make) approach. Unlike ARM, RISC-V does
not require BBM, so there is no need to invalidate the PTE and flush the
TLB before updating it with the newly created, split page table.

But some flushing is going to be necessary. As long as you only ever do
global flushes, the one after the individual PTE modification (within the
split table) will do (if BBM isn't required, see below), but once you move
to more fine-grained flushing, that's not going to be enough anymore. Not
sure it's a good idea to leave such a pitfall.

I think that I don't fully understand what is an issue.

Whether a flush is necessary after solely breaking up a superpage is arch-
defined. I don't know how much restrictions the spec on possible hardware
behavior for RISC-V. However, the eventual change of (at least) one entry
of fulfill the original request will surely require a flush. What I was
trying to say is that this required flush would better not also cover for
the flushes that may or may not be required by the spec. IOW if the spec
leaves any room for flushes to possibly be needed, those flushes would
better be explicit.

I think that I still don't understand why what I wrote above will work as long
as global flushes is working and will stop to work when more fine-grained 
flushing
is used.

Inside p2m_split_superpage() we don't need any kind of TLB flush operation as
it is allocation a new page for a table and works with it, so no one could use
this page at the moment and cache it in TLB.

The only question is that if it is needed BBM before staring using splitted 
entry:
          ....
          if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
          {
              /* Free the allocated sub-tree */
              p2m_free_subtree(p2m, split_pte, level);

              rc = -ENOMEM;
              goto out;
          }

------> /* Should be BBM used here ? */
          p2m_write_pte(entry, split_pte, p2m->clean_pte);

And I can't find anything in the spec what tells me to use BBM here like Arm
does:

But what you need is a statement in the spec that you can get away without. Imo
at least.


In the spec. it is mentioned that:
  It is permitted for multiple address-translation cache entries to co-exist 
for the same
  address. This represents the fact that in a conventional TLB hierarchy, it is 
possible for
  multiple entries to match a single address if, for example, a page is 
upgraded to a
  superpage without first clearing the original non-leaf PTE’s valid bit and 
executing an
  SFENCE.VMA with rs1=x0, or if multiple TLBs exist in parallel at a given 
level of the
  hierarchy. In this case, just as if an SFENCE.VMA is not executed between a 
write to the
  memory-management tables and subsequent implicit read of the same address: it 
is
  unpredictable whether the old non-leaf PTE or the new leaf PTE is used, but 
the behavior is
  otherwise well defined.
The phrase*"but the behavior is otherwise well defined"* emphasizes that even 
if the TLB sees
two versions (the old and the new), the architecture guarantees stability, and 
the behavior
remains safe — though unpredictable in terms of which translation will be used.
And I think that this unpredictability is okay, at least, in the case if 
superpage splitting
and therefore TLB flushing can be deferred since the old pages (which are used 
for old mapping)
still exist and the permissions of the new entries match those of the original 
ones.
Also, it seems like there clearing PTE before TLB flushing isn't need too.

Does it make sense?

Re: [PATCH v2 15/17] xen/riscv: Implement superpage splitting for p2m mappings

Reply via email to