On 12/11/25 10:39 AM, Jan Beulich wrote:
On 10.12.2025 13:44, Oleksii Kurochko wrote:
On 12/10/25 8:06 AM, Jan Beulich wrote:
On 09.12.2025 18:09, Oleksii Kurochko wrote:
On 12/9/25 2:47 PM, Jan Beulich wrote:
On 24.11.2025 13:33, Oleksii Kurochko wrote:
+ *md_pg = p2m_alloc_page(p2m);
+ if ( !*md_pg )
+ {
+ printk("%pd: can't allocate metadata page\n", p2m->domain);
+ domain_crash(p2m->domain);
+
+ return;
+ }
+ }
+ }
+
+ if ( *md_pg )
+ metadata = __map_domain_page(*md_pg);
+
+ if ( t >= p2m_first_external )
+ {
+ metadata[ctx->index].type = t;
+
+ t = p2m_ext_storage;
+ }
+ else if ( metadata )
+ metadata[ctx->index].type = p2m_invalid;
+
+ pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
+
+ unmap_domain_page(metadata);
}
Just to mention (towards future work): Once a metadata page goes back to be
entirely zero-filled, it could as well be hooked off and returned to the pool.
Not doing so may mean detaining an unused page indefinitely.
Won’t that already happen when p2m_free_table() is called?
Well, that's when both page table and metadata table are freed. But what if a
leaf page table is moving back to holding all p2m_ram_rw mappings? Then the
metadata page is unused, but will remain allocated.
Good point...
This could be a rather expensive operation, since in the code:
+ else if ( metadata )
+ metadata[ctx->index].type = p2m_invalid;
we would have to check all other metadata entries to determine whether they are
(p2m_invalid) or not, and return the page to the pool.
It would be nice to have something like metadata.used_entries_num, but the
entire
page is used for type entries.
As an option, we could reserve 8 bits to store a counter of the number of used
entries in the metadata page, and then use metadata[0].used_entries_num to check
whether it is zero. If it is zero, we could simply return the metadata page to
the
pool in the “else if (metadata)” case mentioned above.
How bad is this idea? Any better suggestions?
First, as said in my initial reply: This may not need taking care of right away.
It will need keeping in mind, of course.
I am thinking if it won't be too intrusive, I think that I am okay to introduce
that
now.
As to suggestions - hardly any of the fields in struct page_info for the page
can be used when the page is a metadata one. Simply record the count there?
I suppose that|union u| could be used.
The only thing that confuses me is the shadow paging implementation on x86.
In|struct page_info|, it has the following:
/* Context-dependent fields follow... */
union {
/* Page is in use: ((count_info & PGC_count_mask) != 0). */
struct {
/* Type reference count and various PGT_xxx flags and fields. */
unsigned long type_info;
} inuse;
/* Page is in use as a shadow: count_info == 0. */
struct {
....
} sh;
/* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
union {
So it seems that something in the shadow code must set|count_info| to zero for
shadow pages. But I cannot find where this happens. I would expect it to be done
in|shadow_alloc()|, when the page is taken from the free list. However, pages
from the free list donot have|count_info == 0| since|alloc_heap_pages()
|initializes|count_info|.
What guarantees that|count_info| will be zero for shadow tables?
Interestingly, in the shadow p2m page free code, there is logic that resets
|count_info| to zero:
{
struct domain *owner = page_get_owner(pg);
/* Should still have no owner and count zero. */
if ( owner || (pg->count_info & PGC_count_mask) )
{
printk(XENLOG_ERR
"d%d: Odd p2m page %"PRI_mfn" d=%d c=%lx t=%"PRtype_info"\n",
d->domain_id, mfn_x(page_to_mfn(pg)),
owner ? owner->domain_id : DOMID_INVALID,
pg->count_info, pg->u.inuse.type_info);
pg->count_info &= ~PGC_count_mask;
page_set_owner(pg, NULL);
}
...
And another question: since|u.sh.*| is updated, it effectively overwrites
|u.inuse.type_info|. But|u.inuse.type_info| is used by|free_domheap_pages()|,
which is called from|shadow_free()|:
void free_domheap_pages(struct page_info *pg, unsigned int order)
{
...
if ( pg[i].u.inuse.type_info & PGT_count_mask )
{
printk(XENLOG_ERR
"pg[%u] MFN %"PRI_mfn" c=%#lx o=%u v=%#lx t=%#x\n",
i, mfn_x(page_to_mfn(pg + i)),
pg[i].count_info, pg[i].v.free.order,
pg[i].u.free.val, pg[i].tlbflush_timestamp);
BUG();
}
...
Why is it acceptable that|u.inuse.type_info| will likely not be zero, since
|u.sh.*| modifies the same union field, at least during shadow page allocation?
Finally, as to "rather expensive": Scanning a 4k page to hold all zeroes can't
be all that expensive? In any event that expensiveness needs weighing carefully
against the risk of getting the counter maintenance wrong.
It depends on how often|p2m_set_type()| will be called.
In the worst case, it will require a loop that performs up to 512 iterations
to scan a 4 KB page (512 * sizeof(struct mt_d) = 4096), which I think could be
expensive if|p2m_set_type()| is invoked frequently.
~ Oleksii