Re: [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise()

David Hildenbrand Wed, 05 Dec 2018 00:00:02 -0800

On 05.12.18 06:06, David Gibson wrote:
> Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually
> discard RAM pages inserted into the balloon.  This is basically a Linux
> only interface (MADV_DONTNEED exists on some other platforms, but doesn't
> always have the same semantics).  It also doesn't work on hugepages and has
> some other limitations.
> 
> It turns out that postcopy also needs to discard chunks of memory, and uses
> a better interface for it: ram_block_discard_range().  It doesn't cover
> every case, but it covers more than going direct to madvise() and this
> gives us a single place to update for more possibilities in future.
> 
> There are some subtleties here to maintain the current balloon behaviour:
> 
> * For now, we just ignore requests to balloon in a hugepage backed region.
>   That matches current behaviour, because MADV_DONTNEED on a hugepage would
>   simply fail, and we ignore the error.
> 
> * If host page size is > BALLOON_PAGE_SIZE we can frequently call this on
>   non-host-page-aligned addresses.  These would also fail in madvise(),
>   which we then ignored.  ram_block_discard_range() error_report()s calls
>   on unaligned addresses, so we explicitly check that case to avoid
>   spamming the logs.
> 
> * We now call ram_block_discard_range() with the *host* page size, whereas
>   we previously called madvise() with BALLOON_PAGE_SIZE.  Surprisingly,
>   this also matches existing behaviour.  Although the kernel fails madvise
>   on unaligned addresses, it will round unaligned sizes *up* to the host
>   page size.  Yes, this means that if BALLOON_PAGE_SIZE < guest page size
>   we can incorrectly discard more memory than the guest asked us to.  I'm
>   planning to address that soon.
> 
> Errors other than the ones discussed above, will now be reported by
> ram_block_discard_range(), rather than silently ignored, which means we
> have a much better chance of seeing when something is going wrong.
> 
> Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
> Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
> ---
>  hw/virtio/virtio-balloon.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index c3a19aa27d..4435905c87 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -37,8 +37,29 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
>                                   MemoryRegion *mr, hwaddr offset)
>  {
>      void *addr = memory_region_get_ram_ptr(mr) + offset;
> +    RAMBlock *rb;
> +    size_t rb_page_size;
> +    ram_addr_t ram_offset;
>  
> -    qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
> +    /* XXX is there a better way to get to the RAMBlock than via a
> +     * host address? */


We have qemu_get_ram_block(). That one should work as long as we know
that it is a valid guest ram address. (would have to make it !static)

Then we would only have to pass to this function the original ram_addr_t
handed over by the guest (which looks somewhat cleaner to me than going
via memory regions)

> +    rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> +    rb_page_size = qemu_ram_pagesize(rb);
> +
> +    /* Silently ignore hugepage RAM blocks */
> +    if (rb_page_size != getpagesize()) {
> +        return;
> +    }
> +
> +    /* Silently ignore unaligned requests */
> +    if (ram_offset & (rb_page_size - 1)) {
> +        return;
> +    }
> +
> +    ram_block_discard_range(rb, ram_offset, rb_page_size);
> +    /* We ignore errors from ram_block_discard_range(), because it has
> +     * already reported them, and failing to discard a balloon page is
> +     * not fatal */
>  }
>  
>  static const char *balloon_stat_names[] = {
> 


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise()

Reply via email to