On 13.04.2016 19:07, Michael S. Tsirkin wrote:
> On Wed, Apr 13, 2016 at 04:51:49PM +0200, Thomas Huth wrote:
>> On 13.04.2016 15:15, Michael S. Tsirkin wrote:
>>> On Wed, Apr 13, 2016 at 01:52:44PM +0200, Thomas Huth wrote:
...
>>>> Then, there's yet another problem: If the host page size is bigger
>>>> than the 4k balloon page size, we can not simply call madvise() on
>>>> each of the 4k balloon addresses that we get from the guest - since
>>>> the madvise() always evicts the whole host page, not only a 4k area!
>>>>
>>>> So in this case, we've got to track the 4k fragments of a host page
>>>> and only call madvise(DONTNEED) when all fragments have been collected.
>>>> This of course only works fine if the guest sends consecutive 4k
>>>> fragments - which is the case in the most important scenarios that
>>>> I try to address here (like a ppc64 guest with 64k page size that
>>>> is running on a ppc64 host with 64k page size). In case the guest
>>>> uses a page size that is smaller than the host page size, we might
>>>> need to add some more additional logic here later to increase the
>>>> probability of being able to release memory, but at least the guest
>>>> should now not crash anymore due to unintentionally evicted pages.
>> ...
>>>>  static void virtio_balloon_instance_init(Object *obj)
>>>> diff --git a/include/hw/virtio/virtio-balloon.h
>>>> b/include/hw/virtio/virtio-balloon.h
>>>> index 35f62ac..04b7c0c 100644
>>>> --- a/include/hw/virtio/virtio-balloon.h
>>>> +++ b/include/hw/virtio/virtio-balloon.h
>>>> @@ -43,6 +43,9 @@ typedef struct VirtIOBalloon {
>>>>      int64_t stats_last_update;
>>>>      int64_t stats_poll_interval;
>>>>      uint32_t host_features;
>>>> +    void *current_addr;
>>>> +    unsigned long *fragment_bits;
>>>> +    int fragment_bits_size;
>>>>  } VirtIOBalloon;
>>>>
>>>>  #endif
>>>
>>> It looks like fragment_bits would have to be migrated.
>>> Which is a lot of complexity.
...
>>> How about we just skip madvise if host page size is > balloon
>>> page size, for 2.6?
>>
>> That would mean a regression compared to what we have today. Currently,
>> the ballooning is working OK for 64k guests on a 64k ppc host - rather
>> by chance than on purpose, but it's working. The guest is always sending
>> all the 4k fragments of a 64k page, and QEMU is trying to call madvise()
>> for every one of them, but the kernel is ignoring madvise() on
>> non-64k-aligned addresses, so we end up with a situation where the
>> madvise() frees a whole 64k page which is also declared as free by the
>> guest.
>>
>> I think we should either take this patch as it is right now (without
>> adding extra code for migration) and later update it to the bitmap code
>> by Jitendra Kolhe, or omit it completely (leaving 4k guests broken) and
>> fix it properly after the bitmap code has been applied. But disabling
>> the balloon code for 64k guests on 64k hosts completely does not sound
>> very appealing to me. What do you think?
>>
>> Thomas
>
> True. As simple a hack - how about disabling madvise when host page size >
> target page size?

That could work - but is there a generic way in QEMU to get the current
page size from a guest (since this might differ from TARGET_PAGE_SIZE)?
Or would that mean polluting the virtio-balloon code with ugly #ifdefs?

 Thomas
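
For reference, the fragment tracking described in the quoted patch text
can be sketched roughly as below. This is only an illustration under
stated assumptions, not the code from the patch: the names
FragmentTracker, tracker_init(), tracker_add() and feed_range() are
hypothetical, the bitmap is limited to 64 fragments per host page, and
the sketch only decides when a host page could be released. Real code
would call madvise(..., MADV_DONTNEED) at the point where tracker_add()
returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BALLOON_PAGE_SIZE 4096u  /* granularity of the balloon addresses */

/* Hypothetical tracker; the patch itself uses current_addr/fragment_bits. */
typedef struct FragmentTracker {
    uintptr_t host_page_size;  /* must be a power of two */
    uintptr_t current_page;    /* base of the host page being collected */
    uint64_t bits;             /* one bit per 4k fragment already seen */
    bool active;               /* true once current_page is valid */
} FragmentTracker;

static void tracker_init(FragmentTracker *t, uintptr_t host_page_size)
{
    assert(host_page_size != 0);
    assert((host_page_size & (host_page_size - 1)) == 0);
    /* Assumption of this sketch: at most 64 fragments per host page. */
    assert(host_page_size / BALLOON_PAGE_SIZE <= 64);
    memset(t, 0, sizeof(*t));
    t->host_page_size = host_page_size;
}

/*
 * Record one 4k balloon address. Returns true when every fragment of
 * the surrounding host page has been collected, i.e. when the whole
 * host page could be released without evicting memory in use.
 */
static bool tracker_add(FragmentTracker *t, uintptr_t addr)
{
    uintptr_t nfrag = t->host_page_size / BALLOON_PAGE_SIZE;
    uintptr_t base = addr & ~(t->host_page_size - 1);
    uint64_t full;

    if (nfrag <= 1) {
        return true;  /* host page is not bigger than a balloon page */
    }
    if (!t->active || t->current_page != base) {
        /* Different host page: drop the partial state and start over. */
        t->current_page = base;
        t->bits = 0;
        t->active = true;
    }
    t->bits |= (uint64_t)1 << ((addr - base) / BALLOON_PAGE_SIZE);
    full = (nfrag == 64) ? ~(uint64_t)0 : (((uint64_t)1 << nfrag) - 1);
    if (t->bits == full) {
        t->active = false;
        t->bits = 0;
        return true;
    }
    return false;
}

/* Feed all 4k fragments in [base, base + len); count the releases. */
static int feed_range(FragmentTracker *t, uintptr_t base, uintptr_t len)
{
    int releases = 0;
    uintptr_t off;

    for (off = 0; off < len; off += BALLOON_PAGE_SIZE) {
        if (tracker_add(t, base + off)) {
            releases++;
        }
    }
    return releases;
}
```

With a 64k host page, feed_range() over a full 64k guest page reports
one release, while a single 4k fragment (the 4k guest case) reports
none, so nothing gets evicted by accident. Since only one host page is
tracked at a time, non-consecutive fragments never complete a page,
which matches the limitation mentioned in the patch description.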