On 05.12.18 06:06, David Gibson wrote: > Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually > discard RAM pages inserted into the balloon. This is basically a Linux > only interface (MADV_DONTNEED exists on some other platforms, but doesn't > always have the same semantics). It also doesn't work on hugepages and has > some other limitations. > > It turns out that postcopy also needs to discard chunks of memory, and uses > a better interface for it: ram_block_discard_range(). It doesn't cover > every case, but it covers more than going direct to madvise() and this > gives us a single place to update for more possibilities in future. > > There are some subtleties here to maintain the current balloon behaviour: > > * For now, we just ignore requests to balloon in a hugepage backed region. > That matches current behaviour, because MADV_DONTNEED on a hugepage would > simply fail, and we ignore the error. > > * If host page size is > BALLOON_PAGE_SIZE we can frequently call this on > non-host-page-aligned addresses. These would also fail in madvise(), > which we then ignored. ram_block_discard_range() error_report()s calls > on unaligned addresses, so we explicitly check that case to avoid > spamming the logs. > > * We now call ram_block_discard_range() with the *host* page size, whereas > we previously called madvise() with BALLOON_PAGE_SIZE. Surprisingly, > this also matches existing behaviour. Although the kernel fails madvise > on unaligned addresses, it will round unaligned sizes *up* to the host > page size. Yes, this means that if BALLOON_PAGE_SIZE < guest page size > we can incorrectly discard more memory than the guest asked us to. I'm > planning to address that soon. > > Errors other than the ones discussed above, will now be reported by > ram_block_discard_range(), rather than silently ignored, which means we > have a much better chance of seeing when something is going wrong. > > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> > Reviewed-by: Michael S. Tsirkin <m...@redhat.com> > --- > hw/virtio/virtio-balloon.c | 23 ++++++++++++++++++++++- > 1 file changed, 22 insertions(+), 1 deletion(-) > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > index c3a19aa27d..4435905c87 100644 > --- a/hw/virtio/virtio-balloon.c > +++ b/hw/virtio/virtio-balloon.c > @@ -37,8 +37,29 @@ static void balloon_inflate_page(VirtIOBalloon *balloon, > MemoryRegion *mr, hwaddr offset) > { > void *addr = memory_region_get_ram_ptr(mr) + offset; > + RAMBlock *rb; > + size_t rb_page_size; > + ram_addr_t ram_offset; > > - qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED); > + /* XXX is there a better way to get to the RAMBlock than via a > + * host address? */
We have qemu_get_ram_block(). That one should work as long as we know that it is a valid guest ram address. (would have to make it !static) Then we would only have to pass to this function the original ram_addr_t handed over by the guest (which looks somewhat cleaner to me than going via memory regions) > + rb = qemu_ram_block_from_host(addr, false, &ram_offset); > + rb_page_size = qemu_ram_pagesize(rb); > + > + /* Silently ignore hugepage RAM blocks */ > + if (rb_page_size != getpagesize()) { > + return; > + } > + > + /* Silently ignore unaligned requests */ > + if (ram_offset & (rb_page_size - 1)) { > + return; > + } > + > + ram_block_discard_range(rb, ram_offset, rb_page_size); > + /* We ignore errors from ram_block_discard_range(), because it has > + * already reported them, and failing to discard a balloon page is > + * not fatal */ > } > > static const char *balloon_stat_names[] = { > -- Thanks, David / dhildenb