On Tue, Jul 20, 2021 at 03:03:04PM +0200, David Hildenbrand wrote:
> virtio-mem logically plugs/unplugs memory within a sparse memory region
> and notifies via the RamDiscardManager interface when parts become
> plugged (populated) or unplugged (discarded).
> 
> Currently, we end up (via the two users)
> 1) zeroing all logically unplugged/discarded memory during TPM resets.
> 2) reading all logically unplugged/discarded memory when dumping, to
>    figure out the content is zero.
> 
> 1) is always bad, because we assume unplugged memory stays discarded
>    (and is already implicitly zero).
> 2) isn't that bad with anonymous memory, we end up reading the zero
>    page (slow and unnecessary, though). However, once we use some
>    file-backed memory (future use case), even reading will populate memory.
> 
> Let's cut out all parts marked as not-populated (discarded) via the
> RamDiscardManager. As virtio-mem is the single user, this now means that
> logically unplugged memory ranges will no longer be included in the
> dump, which results in smaller dump files and faster dumping.
> 
> virtio-mem has a minimum granularity of 1 MiB (and the default is usually
> 2 MiB). Theoretically, we can see quite some fragmentation, in practice
> we won't have it completely fragmented in 1 MiB pieces. Still, we might
> end up with many physical ranges.
> 
> Both, the ELF format and kdump seem to be ready to support many
> individual ranges (e.g., for ELF it seems to be UINT32_MAX, kdump has a
> linear bitmap).
> 
> Cc: Marc-André Lureau <marcandre.lur...@redhat.com>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: "Michael S. Tsirkin" <m...@redhat.com>
> Cc: Eduardo Habkost <ehabk...@redhat.com>
> Cc: Alex Williamson <alex.william...@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
> Cc: Igor Mammedov <imamm...@redhat.com>
> Cc: Claudio Fontana <cfont...@suse.de>
> Cc: Thomas Huth <th...@redhat.com>
> Cc: "Alex Bennée" <alex.ben...@linaro.org>
> Cc: Peter Xu <pet...@redhat.com>
> Cc: Laurent Vivier <lviv...@redhat.com>
> Cc: Stefan Berger <stef...@linux.ibm.com>
> Signed-off-by: David Hildenbrand <da...@redhat.com>
> ---
>  softmmu/memory_mapping.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/softmmu/memory_mapping.c b/softmmu/memory_mapping.c
> index b7e4f3f788..856778a109 100644
> --- a/softmmu/memory_mapping.c
> +++ b/softmmu/memory_mapping.c
> @@ -246,6 +246,15 @@ static void guest_phys_block_add_section(GuestPhysListener *g,
>  #endif
>  }
>  
> +static int guest_phys_ram_populate_cb(MemoryRegionSection *section,
> +                                      void *opaque)
> +{
> +    GuestPhysListener *g = opaque;
> +
> +    guest_phys_block_add_section(g, section);
> +    return 0;
> +}
> +
>  static void guest_phys_blocks_region_add(MemoryListener *listener,
>                                           MemoryRegionSection *section)
>  {
> @@ -257,6 +266,17 @@ static void guest_phys_blocks_region_add(MemoryListener *listener,
>          memory_region_is_nonvolatile(section->mr)) {
>          return;
>      }
> +
> +    /* for special sparse regions, only add populated parts */
> +    if (memory_region_has_ram_discard_manager(section->mr)) {
> +        RamDiscardManager *rdm;
> +
> +        rdm = memory_region_get_ram_discard_manager(section->mr);
> +        ram_discard_manager_replay_populated(rdm, section,
> +                                             guest_phys_ram_populate_cb, g);
> +        return;
> +    }
> +
>      guest_phys_block_add_section(g, section);
>  }
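To make sure I read the new flow right: with a RamDiscardManager present the section is no longer added as a whole; the replay walks it in granularity-sized chunks, and only the plugged chunks reach guest_phys_block_add_section(), which IIUC still merges pieces that are contiguous in GPA (and HVA) back into one GuestPhysBlock. So the list should grow with the number of plugged "islands", not with the number of chunks. A standalone toy model of how I understand that coalescing (my own sketch, not QEMU code, all names made up):

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define GRANULARITY (2ULL << 20)   /* 2 MiB, the usual virtio-mem default */
#define NCHUNKS     8

int main(void)
{
    /* 1 = plugged (populated), 0 = unplugged (discarded) */
    const bool plugged[NCHUNKS] = { 1, 1, 0, 0, 1, 0, 1, 1 };
    uint64_t start = 0, end = 0;       /* block being accumulated */
    bool have_block = false;
    int nblocks = 0;

    for (int i = 0; i < NCHUNKS; i++) {
        const uint64_t gpa = (uint64_t)i * GRANULARITY;

        if (!plugged[i]) {
            continue;                  /* discarded: never replayed */
        }
        if (have_block && gpa == end) {
            end += GRANULARITY;        /* contiguous: extend current block */
            continue;
        }
        if (have_block) {              /* gap: flush the finished block */
            printf("block %d: GPA [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
                   nblocks++, start, end);
        }
        start = gpa;                   /* open a new block */
        end = gpa + GRANULARITY;
        have_block = true;
    }
    if (have_block) {
        printf("block %d: GPA [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
               nblocks++, start, end);
    }
    printf("%d chunks -> %d blocks\n", NCHUNKS, nblocks);
    return 0;
}

With the bitmap above this prints three blocks, so the chunk count only becomes the range count when plugged and unplugged chunks strictly alternate.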
As I've asked this question elsewhere before, it's more or less also related to the design decision of letting virtio-mem plug/unplug sparsely at such a small granularity, rather than keeping the plugged part of the region contiguous in GPA space (i.e., moving pages around on unplug). There are certainly reasons for that, and you're the expert on it (as you mentioned once: some guest pages are GUPed and cannot be migrated, so those ranges could not be offlined otherwise), but I'm still not sure whether that is ultimately a kernel problem to solve on the GUP side, although I agree it's a complicated one anyway! Maybe it's simply the trade-off you settled on; I don't have enough knowledge to tell.

The patch itself looks okay to me. My only slight worry is how long the resulting list can get when the region really is chopped into 1M/2M chunks; a quick back-of-the-envelope follows below.
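For a rough sense of scale (my own numbers, not from the patch): take a 1 TiB virtio-mem region at the default 2 MiB granularity, fragmented in the worst possible way, i.e. every other chunk plugged:

    1 TiB / 2 MiB granularity           = 524288 chunks
    alternating plugged/unplugged       = 262144 disjoint ranges
    262144 ranges * 56 B per Elf64_Phdr = 14 MiB of program headers

Big, but not fatal, and IIUC the ELF format itself can express it: e_phnum saturates at 0xffff (PN_XNUM) and the real count then lives in sh_info of section header 0, which I guess is where the UINT32_MAX above comes from. In practice fragmentation should be nowhere near that bad, as you say, so this is mostly a theoretical concern.

Thanks,

-- 
Peter Xu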