On Fri, Oct 12, 2018 at 10:06:50AM +0200, David Hildenbrand wrote: > On 12/10/2018 05:24, David Gibson wrote: > > The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but > > on the host side, we can only actually discard memory in units of the host > > page size. > > > > At present we handle this very badly: we silently ignore balloon requests > > that aren't host page aligned, and for requests that are host page aligned > > we discard the entire host page. The latter potentially corrupts guest > > memory if its page size is smaller than the host's. > > > > We could just disable the balloon if the host page size is not 4kiB, but > > that would break a the special case where host and guest have the same page > > size, but that's larger than 4kiB. Thius case currently works by accident: > > when the guest puts its page into the balloon, it will submit balloon > > requests for each 4kiB subpage. Most will be ignored, but the one which > > happens to be host page aligned will discard the whole lot. > > Actually I would vote for disabling it if the host page size is not 4k. > This is broken by design and should not be worked around.
I really don't think that's feasible. This is not some theoretical edge case: essentially every KVM on POWER system out there with balloon enabled is in this situation. And AIUI, a bunch of common management layers routinely enable the balloon. Just disabling this for non-4k systems will break existing, supported production setups. > I consider 64k a similar case as huge pages in the hypervisor (from a > virtio-balloon interface point of view - which is broken by design for > these cases). Technically it is very similar, but again, this is in routine use on real systems out there. > > This occurs in practice routinely for POWER KVM systems, since both host > > and guest typically use 64kiB pages. > > > > To make this safe, without breaking that useful case, we need to > > accumulate 4kiB balloon requests until we have a whole contiguous host page > > at which point we can discard it. > > ... and if you have a 4k guest it will just randomly report pages and > your 67 LOC tweak is useless. Yes. > And this can and will easily happen. What cases are you thinking of? For POWER at least, 4k on 64k will be vastly less common than 64k on 64k. > > We could in principle do that across all guest memory, but it would require > > a large bitmap to track. This patch represents a compromise: instead we > > Oh god no, no tracking of all memory. (size of bitmap, memory hotplug, > ... migration?) Quite, that's why I didn't do it. Although, I don't actually think migration is such an issue: we *already* essentially lose track of which pages are inside the balloon across migration. > > track ballooned subpages for a single contiguous host page at a time. This > > means that if the guest discards all 4kiB chunks of a host page in > > succession, we will discard it. In particular that means the balloon will > > continue to work for the (host page size) == (guest page size) > 4kiB case. > > > > If the guest scatters 4kiB requests across different host pages, we don't > > discard anything, and issue a warning. Not ideal, but at least we don't > > corrupt guest memory as the previous version could. > > My take would be to somehow disable the balloon on the hypervisor side > in case the host page size is not 4k. Like not allowing to realize it. > No corruptions, no warnings people will ignore. No, that's even worse than just having it silently do nothing on non-4k setups. Silently being useless we might get away with, we'll just waste memory, but failing the device load will absolutely break existing setups. > > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> > > --- > > hw/virtio/virtio-balloon.c | 67 +++++++++++++++++++++++++----- > > include/hw/virtio/virtio-balloon.h | 3 ++ > > 2 files changed, 60 insertions(+), 10 deletions(-) > > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > > index 4435905c87..39573ef2e3 100644 > > --- a/hw/virtio/virtio-balloon.c > > +++ b/hw/virtio/virtio-balloon.c > > @@ -33,33 +33,80 @@ > > > > #define BALLOON_PAGE_SIZE (1 << VIRTIO_BALLOON_PFN_SHIFT) > > > > +typedef struct PartiallyBalloonedPage { > > + RAMBlock *rb; > > + ram_addr_t base; > > + unsigned long bitmap[]; > > +} PartiallyBalloonedPage; > > + > > static void balloon_inflate_page(VirtIOBalloon *balloon, > > MemoryRegion *mr, hwaddr offset) > > { > > void *addr = memory_region_get_ram_ptr(mr) + offset; > > RAMBlock *rb; > > size_t rb_page_size; > > - ram_addr_t ram_offset; > > + int subpages; > > + ram_addr_t ram_offset, host_page_base; > > > > /* XXX is there a better way to get to the RAMBlock than via a > > * host address? */ > > rb = qemu_ram_block_from_host(addr, false, &ram_offset); > > rb_page_size = qemu_ram_pagesize(rb); > > + host_page_base = ram_offset & ~(rb_page_size - 1); > > + > > + if (rb_page_size == BALLOON_PAGE_SIZE) { > > + /* Easy case */ > > > > - /* Silently ignore hugepage RAM blocks */ > > - if (rb_page_size != getpagesize()) { > > + ram_block_discard_range(rb, ram_offset, rb_page_size); > > + /* We ignore errors from ram_block_discard_range(), because it > > + * has already reported them, and failing to discard a balloon > > + * page is not fatal */ > > return; > > } > > > > - /* Silently ignore unaligned requests */ > > - if (ram_offset & (rb_page_size - 1)) { > > - return; > > + /* Hard case > > + * > > + * We've put a piece of a larger host page into the balloon - we > > + * need to keep track until we have a whole host page to > > + * discard > > + */ > > + subpages = rb_page_size / BALLOON_PAGE_SIZE; > > + > > + if (balloon->pbp > > + && (rb != balloon->pbp->rb > > + || host_page_base != balloon->pbp->base)) { > > + /* We've partially ballooned part of a host page, but now > > + * we're trying to balloon part of a different one. Too hard, > > + * give up on the old partial page */ > > + warn_report("Unable to insert a partial page into virtio-balloon"); > > I am pretty sure that you can create misleading warnings in case you > migrate at the wrong time. (migrate while half the 64k page is inflated, > on the new host the other part is inflated - a warning when switching to > another 64k page). Yes we can get bogus warnings across migration with this. I was considering that an acceptable price, but I'm open to better alternatives. > > + free(balloon->pbp); > > + balloon->pbp = NULL; > > } > > > > - ram_block_discard_range(rb, ram_offset, rb_page_size); > > - /* We ignore errors from ram_block_discard_range(), because it has > > - * already reported them, and failing to discard a balloon page is > > - * not fatal */ > > + if (!balloon->pbp) { > > + /* Starting on a new host page */ > > + size_t bitlen = BITS_TO_LONGS(subpages) * sizeof(unsigned long); > > + balloon->pbp = g_malloc0(sizeof(PartiallyBalloonedPage) + bitlen); > > + balloon->pbp->rb = rb; > > + balloon->pbp->base = host_page_base; > > + } > > + > > + bitmap_set(balloon->pbp->bitmap, > > + (ram_offset - balloon->pbp->base) / BALLOON_PAGE_SIZE, > > + subpages); > > + > > + if (bitmap_full(balloon->pbp->bitmap, subpages)) { > > + /* We've accumulated a full host page, we can actually discard > > + * it now */ > > + > > + ram_block_discard_range(rb, balloon->pbp->base, rb_page_size); > > + /* We ignore errors from ram_block_discard_range(), because it > > + * has already reported them, and failing to discard a balloon > > + * page is not fatal */ > > + > > + free(balloon->pbp); > > + balloon->pbp = NULL; > > + } > > } > No, not a fan of this approach. I can see why, but I really can't see what else to do without breaking existing, supported, working (albeit by accident) setups. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature