On 8/11/25 5:44 PM, Jan Beulich wrote:
On 31.07.2025 17:58, Oleksii Kurochko wrote:
RISC-V's PTE has only two available bits that can be used to store the P2M
type. This is insufficient to represent all the current RISC-V P2M types.
Therefore, some P2M types must be stored outside the PTE bits.
To address this, a metadata table is introduced to store P2M types that
cannot fit in the PTE itself. Not all P2M types are stored in the
metadata table—only those that require it.
The metadata table is linked to the intermediate page table via the
`struct page_info`'s list field of the corresponding intermediate page.
To simplify the allocation and linking of intermediate and metadata page
tables, `p2m_{alloc,free}_table()` functions are implemented.
These changes impact `p2m_split_superpage()`, since when a superpage is
split, it is necessary to update the metadata table of the new
intermediate page table — if the entry being split has its P2M type set
to `p2m_ext_storage` in its `P2M_TYPES` bits.
Oh, this was an aspect I didn't realize when commenting on the name of
the enumerator. I think you want to keep the name for the purpose here,
but you better wouldn't apply relational operators to it (and hence
have a second value to serve that purpose).
It could be done that way, but I think it would be better to just have one
value with a better name, as I suggested in my reply to the other patch.
In addition to updating
the metadata of the new intermediate page table, the corresponding entry
in the metadata for the original superpage is invalidated.
Also, update p2m_{get,set}_type to work with P2M types which don't fit
into PTE bits.
Signed-off-by: Oleksii Kurochko <oleksii.kuroc...@gmail.com>
No Suggested-by: or anything?
Sorry for that; a Suggested-by: should indeed be added here. I'll fix that in
the next version of the patch series.
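To make the scheme from the description above a bit more concrete, the type
lookup boils down to roughly the following (simplified, untested sketch reusing
the names from the patch; not the actual patch code):

static p2m_type_t sketch_get_type(pte_t pte, struct page_info *table_pg,
                                  unsigned int indx)
{
    /* The two free PTE bits hold either the type itself or a marker ... */
    p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK);

    if ( type == p2m_ext_storage )
    {
        /* ... in which case the real type lives in the metadata page. */
        const pte_t *metadata = __map_domain_page(table_pg->v.md.metadata);

        type = metadata[indx].pte;
        unmap_domain_page(metadata);
    }

    return type;
}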
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -150,6 +150,15 @@ struct page_info
/* Order-size of the free chunk this page is the head of. */
unsigned int order;
} free;
+
+ /* Page is used to store metadata: p2m type. */
That's not correct. The page thus described is what the pointer below
points to. Here it's more like "Page is used as an intermediate P2M
page table".
+ struct {
+ /*
+ * Pointer to a page which store metadata for an intermediate page
+ * table.
+ */
+ struct page_info *metadata;
+ } md;
In the description you say you would re-use the list field.
That was the case in the first version of storing the P2M type outside the PTE
bits, so it is a stale part of the commit message. I'll correct it.
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -101,7 +101,16 @@ static int p2m_alloc_root_table(struct p2m_domain *p2m)
{
struct domain *d = p2m->domain;
struct page_info *page;
- const unsigned int nr_root_pages = P2M_ROOT_PAGES;
+ /*
+ * If the root page table starts at Level <= 2, and since only 1GB, 2MB,
+ * and 4KB mappings are supported (as enforced by the ASSERT() in
+ * p2m_set_entry()), it is necessary to allocate P2M_ROOT_PAGES for
+ * the root page table itself, plus an additional P2M_ROOT_PAGES for
+ * metadata storage. This is because only two free bits are available in
+ * the PTE, which are not sufficient to represent all possible P2M types.
+ */
+ const unsigned int nr_root_pages = P2M_ROOT_PAGES *
+ ((P2M_ROOT_LEVEL <= 2) ? 2 : 1);
/*
* Return back nr_root_pages to assure the root table memory is also
@@ -114,6 +123,23 @@ static int p2m_alloc_root_table(struct p2m_domain *p2m)
if ( !page )
return -ENOMEM;
+ if ( P2M_ROOT_LEVEL <= 2 )
+ {
+ /*
+ * In the case where P2M_ROOT_LEVEL <= 2, it is necessary to allocate
+ * a page of the same size as that used for the root page table.
+ * Therefore, p2m_allocate_root() can be safely reused.
+ */
+ struct page_info *metadata = p2m_allocate_root(d);
+ if ( !metadata )
+ {
+ free_domheap_pages(page, P2M_ROOT_ORDER);
+ return -ENOMEM;
+ }
+
+ page->v.md.metadata = metadata;
Don't you need to install such a link for every one of the 4 pages?
Yes, I need to do that; something like the sketch below, I think. Thanks.
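(Untested sketch; it assumes each of the P2M_ROOT_PAGES root pages gets its own
metadata page out of the second p2m_allocate_root() allocation:)

    if ( P2M_ROOT_LEVEL <= 2 )
    {
        struct page_info *metadata = p2m_allocate_root(d);

        if ( !metadata )
        {
            free_domheap_pages(page, P2M_ROOT_ORDER);
            return -ENOMEM;
        }

        /* Link every root table page to its own metadata page. */
        for ( unsigned int i = 0; i < P2M_ROOT_PAGES; i++ )
            page[i].v.md.metadata = metadata + i;
    }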
@@ -198,24 +224,25 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
return __map_domain_page(p2m->root + root_table_indx);
}
-static int p2m_set_type(pte_t *pte, p2m_type_t t)
+static void p2m_set_type(pte_t *pte, const p2m_type_t t, const unsigned int i)
{
- int rc = 0;
-
if ( t > p2m_ext_storage )
- panic("unimplemeted\n");
+ {
+ ASSERT(pte);
+
+ pte[i].pte = t;
What does i identify here?
It is an index into the metadata page where the P2M type for the corresponding
PTE is stored. I will rename it to metadata_indx for more clarity.
+ }
else
pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
-
- return rc;
}
-static p2m_type_t p2m_get_type(const pte_t pte)
+static p2m_type_t p2m_get_type(const pte_t pte, const pte_t *metadata,
+ const unsigned int i)
{
p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK);
if ( type == p2m_ext_storage )
- panic("unimplemented\n");
+ type = metadata[i].pte;
return type;
}
Overall this feels pretty fragile, as the caller has to pass several values
which all need to be in sync with one another. If you ...
Generally, I agree that it is rather fragile.
@@ -265,7 +292,10 @@ static void p2m_set_permission(pte_t *e, p2m_type_t t)
}
}
-static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
+static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t,
+ struct page_info *metadata_pg,
+ const unsigned int indx,
+ bool is_table)
{
pte_t e = (pte_t) { PTE_VALID };
@@ -285,12 +315,21 @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
if ( !is_table )
{
+ pte_t *metadata = __map_domain_page(metadata_pg);
... map the page anyway, no matter whether ...
p2m_set_permission(&e, t);
+ metadata[indx].pte = p2m_invalid;
+
if ( t < p2m_ext_storage )
- p2m_set_type(&e, t);
+ p2m_set_type(&e, t, indx);
else
- panic("unimplemeted\n");
+ {
+ e.pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
+ p2m_set_type(metadata, t, indx);
+ }
... you'll actually use it, maybe best to map both pages at the same point?
Only one page is mapped here (?), and it should be mapped here, I think: it
could be the case that a previously set type is being overwritten, so the type
written in the metadata may need to be invalidated.
And as said elsewhere, no, I don't think you want to use p2m_set_type() for
two entirely different purposes.
I wasn't very happy with it either, but at the same time I didn't want a
prototype where it isn't really clear when metadata needs to be passed and when
it doesn't. Considering your comments, this solution isn't good either. So
maybe it would be better to just have two separate functions,
p2m_set_pte_type() and p2m_set_metadata_type(), roughly along the lines of the
sketch below.
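(Untested sketch, just to illustrate the split; the metadata page would be
mapped by the caller only when actually needed:)

static void p2m_set_pte_type(pte_t *pte, p2m_type_t t)
{
    /* Types below p2m_ext_storage fit into the two free PTE bits. */
    ASSERT(t < p2m_ext_storage);

    pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
}

static void p2m_set_metadata_type(pte_t *pte, pte_t *metadata,
                                  unsigned int metadata_indx, p2m_type_t t)
{
    /* Larger types go to the metadata page; the PTE only carries a marker. */
    ASSERT(t > p2m_ext_storage);

    metadata[metadata_indx].pte = t;
    pte->pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
}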
@@ -323,22 +364,71 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
return pg;
}
+static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
+
+/*
+ * Allocate a page table with an additional extra page to store
+ * metadata for each entry of the page table.
+ * Link this metadata page to page table page's list field.
+ */
+static struct page_info * p2m_alloc_table(struct p2m_domain *p2m)
Nit: Stray blank after * again.
+{
+ enum table_type
+ {
+ INTERMEDIATE_TABLE=0,
If you really think you need the "= 0", then please with blanks around '='.
+ /*
+ * At the moment, metadata is going to store P2M type
+ * for each PTE of page table.
+ */
+ METADATA_TABLE,
+ TABLE_MAX
+ };
+
+ struct page_info *tables[TABLE_MAX];
+
+ for ( unsigned int i = 0; i < TABLE_MAX; i++ )
+ {
+ tables[i] = p2m_alloc_page(p2m);
+
+ if ( !tables[i] )
+ goto out;
+
+ clear_and_clean_page(tables[i]);
+ }
+
+ tables[INTERMEDIATE_TABLE]->v.md.metadata = tables[METADATA_TABLE];
+
+ return tables[INTERMEDIATE_TABLE];
+
+ out:
+ for ( unsigned int i = 0; i < TABLE_MAX; i++ )
+ if ( tables[i] )
You didn't clear all of tables[] first, though.
Oh, right, I missed an initializer for the tables[] array.
This kind of cleanup is
often better done as
while ( i-- > 0 )
...
You don't even need an if() then, as you know allocations succeeded for all
earlier array slots.
Yes, it looks very nice; I'll rework the error path accordingly (rough sketch
below, after the rest of the quoted hunk).
+ p2m_free_page(p2m, tables[i]);
+
+ return NULL;
+}
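For reference, this is what I have in mind then (i moved out of the for() so it
is still in scope at the label; untested sketch):

    struct page_info *tables[TABLE_MAX];
    unsigned int i;

    for ( i = 0; i < TABLE_MAX; i++ )
    {
        tables[i] = p2m_alloc_page(p2m);
        if ( !tables[i] )
            goto out;

        clear_and_clean_page(tables[i]);
    }

    tables[INTERMEDIATE_TABLE]->v.md.metadata = tables[METADATA_TABLE];

    return tables[INTERMEDIATE_TABLE];

 out:
    /*
     * Only slots below i were successfully allocated, so neither an
     * initializer nor a NULL check is needed.
     */
    while ( i-- > 0 )
        p2m_free_page(p2m, tables[i]);

    return NULL;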
I'm also surprised you allocate the metadata table no matter whether you'll
actually need it. That'll double your average paging pool usage, when in a
typical case only very few entries would actually require this extra
storage.
Nice point, we could indeed do delayed allocation instead and allocate only
when the requested P2M type is > p2m_ext_storage. I'll implement that, something
along the lines of the sketch below.
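(Untested sketch; p2m_get_metadata() is just an illustrative name for a helper
I'd add:)

static pte_t *p2m_get_metadata(struct p2m_domain *p2m,
                               struct page_info *table_pg)
{
    /*
     * Allocate and link the metadata page lazily, on the first type which
     * doesn't fit into the PTE bits.
     */
    if ( !table_pg->v.md.metadata )
    {
        struct page_info *metadata = p2m_alloc_page(p2m);

        if ( !metadata )
            return NULL;

        clear_and_clean_page(metadata);
        table_pg->v.md.metadata = metadata;
    }

    return __map_domain_page(table_pg->v.md.metadata);
}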
@@ -453,10 +543,9 @@ static void p2m_put_2m_superpage(mfn_t mfn, p2m_type_t type)
}
/* Put any references on the page referenced by pte. */
-static void p2m_put_page(const pte_t pte, unsigned int level)
+static void p2m_put_page(const pte_t pte, unsigned int level, p2m_type_t p2mt)
{
mfn_t mfn = pte_get_mfn(pte);
- p2m_type_t p2m_type = p2m_get_type(pte);
ASSERT(pte_is_valid(pte));
@@ -470,10 +559,10 @@ static void p2m_put_page(const pte_t pte, unsigned int level)
switch ( level )
{
case 1:
- return p2m_put_2m_superpage(mfn, p2m_type);
+ return p2m_put_2m_superpage(mfn, p2mt);
case 0:
- return p2m_put_4k_page(mfn, p2m_type);
+ return p2m_put_4k_page(mfn, p2mt);
}
}
Might it be better to introduce this function in this shape right away, in
the earlier patch?
Agreed. I probably did that intentionally, but I don't remember why. I will try
to avoid these changes in this patch, as they look unnecessary here.
@@ -690,18 +791,23 @@ static int p2m_set_entry(struct p2m_domain *p2m,
{
/* We need to split the original page. */
pte_t split_pte = *entry;
+ struct page_info *metadata = virt_to_page(table)->v.md.metadata;
This (or along these lines) is how I would have expected things to be done
elsewhere as well, limiting the amount of arguments you need to pass
around.
I will try to reuse this approach wherever I can, e.g. with a small helper
along the lines of the sketch below.
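(Untested sketch; the helper name is just for illustration:)

static pte_t *p2m_map_metadata(const pte_t *table)
{
    /*
     * Derive the metadata page from the (mapped) page table itself,
     * instead of passing it around as an extra argument.
     */
    return __map_domain_page(virt_to_page(table)->v.md.metadata);
}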
Thanks.
~ Oleksii