On 29.09.20 19:02, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (da...@redhat.com) wrote:
>> This is a quick and dirty (1.5 days of hacking) prototype to make
>> vfio and virtio-mem play together. The basic idea was the result of
>> Alex brainstorming with me on how to tackle this.
>>
>> A virtio-mem device manages a memory region in guest physical address
>> space, represented as a single (currently large) memory region in
>> QEMU. Before the guest is allowed to use memory blocks, it must
>> coordinate with the hypervisor (plug blocks). After a reboot, all
>> memory is usually unplugged - when the guest comes up, it detects the
>> virtio-mem device and selects memory blocks to plug (based on
>> requests from the hypervisor).
>>
>> Memory hot(un)plug consists of (un)plugging memory blocks via a
>> virtio-mem device (triggered by the guest). When unplugging blocks,
>> we discard the memory. In contrast to memory ballooning, we always
>> know which memory blocks a guest may use - especially during a
>> reboot, after a crash, or after kexec.
>>
>> The issue with vfio is that it cannot deal with random discards - for
>> this reason, virtio-mem and vfio are currently mutually exclusive.
>> In particular, vfio would currently map the whole memory region (with
>> possibly only few or no plugged blocks), resulting in all pages
>> getting pinned and therefore in a higher memory consumption than
>> expected (rendering virtio-mem basically useless in these
>> environments).
>>
>> To make vfio work nicely with virtio-mem, we have to map only the
>> plugged blocks, and map/unmap properly when plugging/unplugging
>> blocks (including discarding of RAM when unplugging). We achieve that
>> by using a new notifier mechanism that communicates changes.
>>
>> It's important to map memory in the granularity in which we could see
>> unmaps again (-> virtio-mem block size) - so when e.g., plugging
>> consecutive 100 MB with a block size of 2 MB, we need 50 mappings.
>> When unmapping, we can use a single vfio_unmap call for the
>> applicable range. We expect that the block size of virtio-mem devices
>> will be fairly large in the future (to not run out of mappings and to
>> improve hot(un)plug performance), configured by the user, when used
>> with vfio (e.g., 128 MB, 1 GB, ...) - Linux guests will still have to
>> be optimized for that.
>
> This seems pretty painful for those few TB mappings.
> Also the calls seem pretty painful; maybe it'll be possible to have
> calls that are optimised for making multiple consecutive mappings.
Exactly the future I imagine. This patchset requires no kernel
interface additions - once we have an optimized interface that
understands consecutive mappings (and the granularity), we can use
that instead. The prototype is already prepared for that by notifying
about consecutive ranges.
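To illustrate the granularity point from the cover letter, here is a
rough sketch against the raw VFIO uAPI (VFIO_IOMMU_MAP_DMA /
VFIO_IOMMU_UNMAP_DMA) - not the actual QEMU code; container_fd, iova,
vaddr and the sizes are made-up example values:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Map a plugged virtio-mem range in block-size granularity, so that
 * individual blocks can be unmapped again later. container_fd is an
 * open VFIO container with an IOMMU already set.
 */
static int map_plugged_range(int container_fd, uint64_t iova,
                             uint64_t vaddr, uint64_t size,
                             uint64_t block_size)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
    };
    uint64_t offs;

    /* e.g., 100 MB with a 2 MB block size -> 50 map calls */
    for (offs = 0; offs < size; offs += block_size) {
        map.iova = iova + offs;
        map.vaddr = vaddr + offs;
        map.size = block_size;
        if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map)) {
            return -1;
        }
    }
    return 0;
}

static int unmap_unplugged_range(int container_fd, uint64_t iova,
                                 uint64_t size)
{
    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .iova = iova,
        .size = size,
    };

    /* A single unmap call covers the whole (multi-block) range. */
    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}

An interface optimized for consecutive mappings would collapse that
per-block loop into a single call.

Thanks!

-- 
Thanks,

David / dhildenb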