On 11/6/24 17:16, Jason Andryuk wrote:
On 2024-11-02 13:25, Daniel P. Smith wrote:
A precarious approach was used to release the pages used to hold a boot
module. The precariousness stemmed from the fact that in the case of PV
dom0, the initrd module pages may be either mapped or explicitly copied
into the dom0 address space. So to handle this situation, the PV dom0
construction code will set the size of the module to zero, relying on
discard_initial_images() to skip any modules with a size of zero.
A function is introduced to release a module when it is no longer needed
that accepts a boolean parameter, free_mem, to indicate if the
corresponding pages can be freed. To track that a module has been
released, the boot module flag `released` is introduced.
The previous release model was a free all at once except those of size
zeros, which would handle any unused modules passed. The new model is one
of, free consumed module after usage is complete, thus unconsumed modules
do not have a consumer to free them.
Slightly confusing. Maybe just "The new model is to free modules after
they are consumed. Thus unconsumed modules are not freed."
okay.
To address this, the discard_uknown_boot_modules() is
"unknown"
Ack
introduced and called after the last module identification occurs,
initrd, to free the pages of any boot modules that are identified as not
being released.
After domain construction completes, all modules should be freed. A walk
of the boot modules is added after domain construction to validate and
notify if a module is found not to have been released.
Signed-off-by: Daniel P. Smith <dpsm...@apertussolutions.com>
---
Changes since v7:
- This is a new approach as an alternative to the `consumed` flag.
---
xen/arch/x86/cpu/microcode/core.c | 4 ++
xen/arch/x86/hvm/dom0_build.c | 7 ++--
xen/arch/x86/include/asm/bootinfo.h | 2 +
xen/arch/x86/include/asm/setup.h | 3 +-
xen/arch/x86/pv/dom0_build.c | 20 ++--------
xen/arch/x86/setup.c | 57 +++++++++++++++++++++++------
xen/xsm/xsm_core.c | 5 +++
7 files changed, 67 insertions(+), 31 deletions(-)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index d061ece0541f..e6d2d25fd038 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -341,27 +341,55 @@ unsigned long __init initial_images_nrpages(nodeid_t node)
return nr;
}
-void __init discard_initial_images(void) /* a.k.a. Free boot modules */
+void __init release_boot_module(struct boot_module *bm, bool free_mem)
+{
+ uint64_t start = pfn_to_paddr(bm->mod->mod_start);
+ uint64_t size = bm->mod->mod_end;
+
+ if ( bm->released )
+ {
+ printk(XENLOG_WARNING "Attempt second release boot module of type %d\n",
+ bm->type);
+ return;
+ }
+
+ if ( free_mem )
+ init_domheap_pages(start, start + PAGE_ALIGN(size));
+
+ bm->released = true;
+}
+
+void __init release_module(const module_t *m, bool free_mem)
{
struct boot_info *bi = &xen_boot_info;
unsigned int i;
- for ( i = 0; i < bi->nr_modules; ++i )
+ for ( i = 0; i < bi->nr_modules; i++ )
{
- uint64_t start = pfn_to_paddr(bi->mods[i].mod->mod_start);
- uint64_t size = bi->mods[i].mod->mod_end;
+ if ( bi->mods[i].mod == m )
+ release_boot_module(&bi->mods[i], free_mem);
+ }
+}
- /*
- * Sometimes the initrd is mapped, rather than copied, into dom0.
- * Size being 0 is how we're instructed to leave the module alone.
- */
- if ( size == 0 )
+static void __init discard_unknown_boot_modules(void)
+{
+ struct boot_info *bi = &xen_boot_info;
+ unsigned int i, count = 0;
+
+ for_each_boot_module_by_type(i, bi, BOOTMOD_UNKNOWN)
for_each_boot_module_by_type ( i, bi, BOOTMOD_UNKNOWN )
To match style from 74af2d98276d ("x86/boot: eliminate module_map")
Ack.
+ {
+ struct boot_module *bm = &bi->mods[i];
+
+ if ( bm == NULL || bm->released )
continue;
- init_domheap_pages(start, start + PAGE_ALIGN(size));
+ release_boot_module(bm, true);
+ count++;
}
- bi->nr_modules = 0;
+ if ( count )
+ printk(XENLOG_DEBUG "Releasing pages for uknown boot module %d\n",
"unknown". Since the operation already happened, maybe "Released pages
for %d unknown boot modules"? I'm not sure of the value of that
message. It would be more informative to provide the module index and
maybe a page count. The index would at least point to which module is
unused.
Ack to unknown.
Can adjust the phrasing. The question is whether there is a desire to
have a message for every boot module freed. I guess I could do a single
log line split across multiple printks. I am thinking about the case
where someone tries to abuse the interface by loading a bunch of unused
modules.
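One way to keep the output to a single line, however many unused modules
were passed, is to build the line incrementally. A minimal sketch in plain
C follows. It assumes a hypothetical helper, sketch_format_released(), and
uses snprintf() into a buffer as a stand-in for consecutive printk()
calls; neither the name nor the message text comes from the patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of Xen: format the indices of the released
 * unknown boot modules into a single line. Returns the length of the line,
 * or -1 if the buffer is too small to hold it.
 */
int sketch_format_released(char *buf, size_t len,
                           const unsigned int *idx, size_t nr)
{
    size_t off, i;
    int ret;

    /* Line prefix, emitted once. */
    ret = snprintf(buf, len, "Released pages for unknown boot modules:");
    if ( ret < 0 || (size_t)ret >= len )
        return -1;
    off = ret;

    /* One short fragment per module index, appended to the same line. */
    for ( i = 0; i < nr; i++ )
    {
        ret = snprintf(buf + off, len - off, " %u", idx[i]);
        if ( ret < 0 || (size_t)ret >= len - off )
            return -1;
        off += ret;
    }

    /* Terminate the line. */
    ret = snprintf(buf + off, len - off, "\n");
    if ( ret < 0 || (size_t)ret >= len - off )
        return -1;

    return (int)(off + ret);
}
```

With indices 2 and 5 this yields one line naming both modules, which
also covers the request to report which module went unused.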
+ count);
}
static void __init init_idle_domain(void)
@@ -2111,6 +2139,8 @@ void asmlinkage __init noreturn __start_xen(void)
initrdidx);
}
+ discard_unknown_boot_modules();
+
/*
* We're going to setup domain0 using the module(s) that we stashed safely
* above our heap. The second module, if present, is an initrd ramdisk.
@@ -2122,6 +2152,11 @@ void asmlinkage __init noreturn __start_xen(void)
if ( !dom0 )
panic("Could not set up DOM0 guest OS\n");
+ /* Check and warn if any modules did not get released */
+ for ( i = 0; i < bi->nr_modules; i++ )
+ if ( !bi->mods[i].released )
+ printk(XENLOG_ERR "Boot module %d not released, memory leaked", i);
+
Why not release the memory here instead of leaking it?
Because you don't know if it was mapped or consumed.
I feel like the corner case of mapping the dom0 initrd is leading to
this manual approach of releasing modules individually. That logic then
has to be spread around. discard_initial_images() kept the logic
centralized. Maybe just replace the mod_end == 0 special case with a
"don't release me" flag that is checked in discard_initial_images()?
That is what started me looking at the options to deal with it. The two I
came up with were a flag and this approach. Weighing the pros/cons of the
two, the deciding factor is what happens when multi-domain construction is
introduced. With multi-domain construction and a large number of domains,
a balance has to be struck between holding all the kernels and ramdisks in
memory and being able to allocate the desired amount of memory for each
domain. Perhaps that balance can be struck with a
discard_consumed_boot_modules() that walks the boot module list and
discards the ones consumed. While only dom0 can be constructed, it would
be called once after the create_dom0 call returns (per Andy's suggestion
in response to this comment). An expansion on this could be that, instead
of using a free flag, a ref count is used that is incremented as the
module is claimed and then decremented when it has been consumed.
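To make the ref count idea concrete, here is a rough sketch in plain C.
Everything in it is hypothetical: struct bm_sketch stands in for struct
boot_module, sketch_free_range() stands in for init_domheap_pages(), and
bm_claim()/bm_put() are illustrative names only, not a proposed interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct boot_module; not the Xen definition. */
struct bm_sketch {
    uint64_t start;      /* physical start address of the module */
    uint64_t size;       /* size of the module in bytes */
    unsigned int refs;   /* number of outstanding claims */
    bool released;       /* set once the pages have been handed back */
};

/* Bytes "freed" so far; only exists so the sketch is observable. */
uint64_t sketch_freed_bytes;

/* Stand-in for init_domheap_pages(): just account for the range. */
void sketch_free_range(uint64_t start, uint64_t end)
{
    sketch_freed_bytes += end - start;
}

/* A consumer claims the module before it starts using it. */
void bm_claim(struct bm_sketch *bm)
{
    bm->refs++;
}

/*
 * A consumer drops its claim once it is done with the module. The last
 * drop frees the pages. Returns true if this call freed the module.
 */
bool bm_put(struct bm_sketch *bm)
{
    if ( bm->released || bm->refs == 0 )
        return false;

    if ( --bm->refs > 0 )
        return false;

    sketch_free_range(bm->start, bm->start + bm->size);
    bm->released = true;

    return true;
}
```

In this sketch a mapped initrd would simply keep its claim, so its pages
would never reach the free path, and a final walk could free anything left
with a zero count.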
v/r,
dps