On 22.04.20 20:21, Alexander Duyck wrote: > From: Alexander Duyck <alexander.h.du...@linux.intel.com> > > Add support for free page reporting. The idea is to function very similar > to how the balloon works in that we basically end up madvising the page as > not being used. However we don't really need to bother with any deflate > type logic since the page will be faulted back into the guest when it is > read or written to. > > This provides a new way of letting the guest proactively report free > pages to the hypervisor, so the hypervisor can reuse them. In contrast to > inflate/deflate that is triggered via the hypervisor explicitly. > > Signed-off-by: Alexander Duyck <alexander.h.du...@linux.intel.com> > --- > hw/virtio/virtio-balloon.c | 70 > ++++++++++++++++++++++++++++++++++++ > include/hw/virtio/virtio-balloon.h | 2 + > 2 files changed, 71 insertions(+), 1 deletion(-) > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > index 5effc8b4653b..b473ff7f4b88 100644 > --- a/hw/virtio/virtio-balloon.c > +++ b/hw/virtio/virtio-balloon.c > @@ -321,6 +321,60 @@ static void balloon_stats_set_poll_interval(Object *obj, > Visitor *v, > balloon_stats_change_timer(s, 0); > } > > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq) > +{ > + VirtIOBalloon *dev = VIRTIO_BALLOON(vdev); > + VirtQueueElement *elem; > + > + while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) { > + unsigned int i; > +
Maybe add a comment like /* * As discarded pages will be zero when re-accessed, all pages either * have the old value, or were zeroed out. In case the guest expects * another value, make sure to never discard. */ Whatever you think is best. > + if (qemu_balloon_is_inhibited() || dev->poison_val) { > + goto skip_element; > + } > + > + for (i = 0; i < elem->in_num; i++) { > + void *addr = elem->in_sg[i].iov_base; > + size_t size = elem->in_sg[i].iov_len; > + ram_addr_t ram_offset; > + RAMBlock *rb; > + > + /* > + * There is no need to check the memory section to see if > + * it is ram/readonly/romd like there is for handle_output > + * below. If the region is not meant to be written to then > + * address_space_map will have allocated a bounce buffer > + * and it will be freed in address_space_unmap and trigger > + * and unassigned_mem_write before failing to copy over the > + * buffer. If more than one bad descriptor is provided it > + * will return NULL after the first bounce buffer and fail > + * to map any resources. > + */ > + rb = qemu_ram_block_from_host(addr, false, &ram_offset); > + if (!rb) { > + trace_virtio_balloon_bad_addr(elem->in_addr[i]); > + continue; > + } > + > + /* > + * For now we will simply ignore unaligned memory regions, or > + * regions that overrun the end of the RAMBlock. > + */ > + if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) || > + (ram_offset + size) > qemu_ram_get_used_length(rb)) { > + continue; > + } > + > + ram_block_discard_range(rb, ram_offset, size); > + } > + > +skip_element: > + virtqueue_push(vq, elem, 0); > + virtio_notify(vdev, vq); > + g_free(elem); > + } > +} > + > static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) > { > VirtIOBalloon *s = VIRTIO_BALLOON(vdev); > @@ -782,6 +836,16 @@ static void virtio_balloon_device_realize(DeviceState > *dev, Error **errp) > VirtIOBalloon *s = VIRTIO_BALLOON(dev); > int ret; > > + /* > + * Page reporting is dependant on page poison to make sure we can > + * report a page without changing the state of the internal data. > + * We need to set the flag before we call virtio_init as it will > + * affect the config size of the vdev. > + */ > + if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) { > + s->host_features |= 1 << VIRTIO_BALLOON_F_PAGE_POISON; > + } > + As discussed, this hunk would go away. With that, this patch is really minimal, which is good :) > virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON, > virtio_balloon_config_size(s)); > > @@ -798,6 +862,10 @@ static void virtio_balloon_device_realize(DeviceState > *dev, Error **errp) > s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output); > s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats); > > + if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) { > + s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report); > + } > + > if (virtio_has_feature(s->host_features, > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { > s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, > @@ -923,6 +991,8 @@ static Property virtio_balloon_properties[] = { > VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false), > DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features, > VIRTIO_BALLOON_F_FREE_PAGE_HINT, false), > + DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features, > + VIRTIO_BALLOON_F_REPORTING, true), I think you'll have to similarly disable it via compat machines if you want to default enable. Otherwise, backward migration would be broken. Also, I do wonder if we want to default-enable it. It can still have a negative performance impact and some people might not want that. Apart from that, looks good to me. Nothing else we have to migrate AFAIKs. -- Thanks, David / dhildenb