On 03.12.20 00:26, Alex Williamson wrote:
> On Thu, 19 Nov 2020 16:39:10 +0100
> David Hildenbrand <da...@redhat.com> wrote:
> 
>> We have some special RAM memory regions (managed by virtio-mem), whereby
>> the guest agreed to only use selected memory ranges. "unused" parts are
>> discarded so they won't consume memory - to logically unplug these memory
>> ranges. Before the VM is allowed to use such logically unplugged memory
>> again, coordination with the hypervisor is required.
>>
>> This results in "sparse" mmaps/RAMBlocks/memory regions, whereby only
>> coordinated parts are valid to be used/accessed by the VM.
>>
>> In most cases, we don't care about that - e.g., in KVM, we simply have a
>> single KVM memory slot. However, in case of vfio, registering the
>> whole region with the kernel results in all pages getting pinned, and
>> therefore an unexpectedly high memory consumption - discarding of RAM in
>> that context is broken.
>>
>> Let's introduce a way to coordinate discarding/populating memory within a
>> RAM memory region with such special consumers of RAM memory regions: they
>> can register as listeners and get updates on memory getting discarded and
>> populated. Using this machinery, vfio will be able to map only the
>> currently populated parts, resulting in discarded parts not getting pinned
>> and not consuming memory.
>>
>> A RamDiscardMgr has to be set for a memory region before it is getting
>> mapped, and cannot change while the memory region is mapped.
>>
>> Note: At some point, we might want to let RAMBlock users (esp. vfio used
>> for nvme://) consume this interface as well. We'll need RAMBlock notifier
>> calls when a RAMBlock is getting mapped/unmapped (via the corresponding
>> memory region), so we can properly register a listener there as well.
>>
>> Cc: Paolo Bonzini <pbonz...@redhat.com>
>> Cc: "Michael S. Tsirkin" <m...@redhat.com>
>> Cc: Alex Williamson <alex.william...@redhat.com>
>> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
>> Cc: Igor Mammedov <imamm...@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.li...@gmail.com>
>> Cc: Peter Xu <pet...@redhat.com>
>> Cc: Auger Eric <eric.au...@redhat.com>
>> Cc: Wei Yang <richard.weiy...@linux.alibaba.com>
>> Cc: teawater <teawat...@linux.alibaba.com>
>> Cc: Marek Kedzierski <mkedz...@redhat.com>
>> Signed-off-by: David Hildenbrand <da...@redhat.com>
>> ---
>>  include/exec/memory.h | 225 ++++++++++++++++++++++++++++++++++++++++++
>>  softmmu/memory.c      |  22 +++++
>>  2 files changed, 247 insertions(+)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 0f3e6bcd5e..468cbb53a4 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
> ...
>> @@ -425,6 +501,120 @@ struct IOMMUMemoryRegionClass {
>>                               Error **errp);
>>  };
>>
>> +/*
>> + * RamDiscardMgrClass:
>> + *
>> + * A #RamDiscardMgr coordinates which parts of specific RAM #MemoryRegion
>> + * regions are currently populated to be used/accessed by the VM, notifying
>> + * after parts were discarded (freeing up memory) and before parts will be
>> + * populated (consuming memory), to be used/accessed by the VM.
>> + *
>> + * A #RamDiscardMgr can only be set for a RAM #MemoryRegion while the
>> + * #MemoryRegion isn't mapped yet; it cannot change while the #MemoryRegion is
>> + * mapped.
>> + *
>> + * The #RamDiscardMgr is intended to be used by technologies that are
>> + * incompatible with discarding of RAM (e.g., VFIO, which may pin all
>> + * memory inside a #MemoryRegion), and require proper coordination to only
>> + * map the currently populated parts, to hinder parts that are expected to
>> + * remain discarded from silently getting populated and consuming memory.
>> + * Technologies that support discarding of RAM don't have to bother and can
>> + * simply map the whole #MemoryRegion.
>> + *
>> + * An example #RamDiscardMgr is virtio-mem, which logically (un)plugs
>> + * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
>> + * Logically unplugging memory consists of discarding RAM. The VM agreed to not
>> + * access unplugged (discarded) memory - especially via DMA. virtio-mem will
>> + * properly coordinate with listeners before memory is plugged (populated),
>> + * and after memory is unplugged (discarded).
>> + *
>> + * Listeners are called in multiples of the minimum granularity and changes are
>> + * aligned to the minimum granularity within the #MemoryRegion. Listeners have
>> + * to prepare for memory becoming discarded in a different granularity than it
>> + * was populated and the other way around.
>> + */
>> +struct RamDiscardMgrClass {
>> +    /* private */
>> +    InterfaceClass parent_class;
>> +
>> +    /* public */
>> +
>> +    /**
>> +     * @get_min_granularity:
>> +     *
>> +     * Get the minimum granularity in which listeners will get notified
>> +     * about changes within the #MemoryRegion via the #RamDiscardMgr.
>> +     *
>> +     * @rdm: the #RamDiscardMgr
>> +     * @mr: the #MemoryRegion
>> +     *
>> +     * Returns the minimum granularity.
>> +     */
>> +    uint64_t (*get_min_granularity)(const RamDiscardMgr *rdm,
>> +                                    const MemoryRegion *mr);
>> +
>> +    /**
>> +     * @is_populated:
>> +     *
>> +     * Check whether the given range within the #MemoryRegion is completely
>> +     * populated (i.e., no parts are currently discarded). There are no
>> +     * alignment requirements for the range.
>> +     *
>> +     * @rdm: the #RamDiscardMgr
>> +     * @mr: the #MemoryRegion
>> +     * @offset: offset into the #MemoryRegion
>> +     * @size: size in the #MemoryRegion
>> +     *
>> +     * Returns the minimum granularity.
> 
> 
> I think the return description got copied from above, this returns bool.
Ah, thanks for catching that.

> 
> ...
>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>> index aa393f1bb0..fbdc50fa8b 100644
>> --- a/softmmu/memory.c
>> +++ b/softmmu/memory.c
>> @@ -2013,6 +2013,21 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr)
>>      return imrc->num_indexes(iommu_mr);
>>  }
>>
>> +RamDiscardMgr *memory_region_get_ram_discard_mgr(MemoryRegion *mr)
>> +{
>> +    if (!memory_region_is_mapped(mr) || !memory_region_is_ram(mr)) {
>> +        return false;
> 
> s/false/NULL/?

Thanks! I think I've been reworking this patch too often :)

-- 
Thanks,

David / dhildenb
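
A minimal usage sketch, assuming the interface as quoted above: a consumer
such as vfio would look up the RamDiscardMgr of a RAM MemoryRegion via
memory_region_get_ram_discard_mgr() (added in softmmu/memory.c by this patch)
and only map ranges that is_populated() reports as fully populated. The
RAM_DISCARD_MGR_GET_CLASS() checker macro, the ram_addr_t parameter types of
is_populated(), and the consumer_range_is_mappable() helper are assumptions
for illustration and are not shown in the quoted hunks.

#include "qemu/osdep.h"
#include "exec/memory.h"

/*
 * Illustrative sketch only (not part of the patch above): decide whether a
 * range of a RAM MemoryRegion may be mapped/pinned by a consumer such as
 * vfio that is incompatible with discarding of RAM.
 */
static bool consumer_range_is_mappable(MemoryRegion *mr,
                                       ram_addr_t offset, ram_addr_t size)
{
    RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(mr);
    RamDiscardMgrClass *rdmc;

    if (!rdm) {
        /* No RamDiscardMgr set: the whole region can be mapped as usual. */
        return true;
    }

    rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
    /*
     * Only completely populated ranges may be mapped; discarded parts must
     * not get mapped (and thereby pinned), otherwise they would silently
     * consume memory again.
     */
    return rdmc->is_populated(rdm, mr, offset, size);
}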