On Wed, Apr 13, 2016 at 08:11:36PM +0200, Thomas Huth wrote: > On 13.04.2016 19:55, Michael S. Tsirkin wrote: > > On Wed, Apr 13, 2016 at 07:38:12PM +0200, Thomas Huth wrote: > >> On 13.04.2016 19:07, Michael S. Tsirkin wrote: > >>> On Wed, Apr 13, 2016 at 04:51:49PM +0200, Thomas Huth wrote: > >>>> On 13.04.2016 15:15, Michael S. Tsirkin wrote: > >>>>> On Wed, Apr 13, 2016 at 01:52:44PM +0200, Thomas Huth wrote: > >> ... > >>>>>> Then, there's yet another problem: If the host page size is bigger > >>>>>> than the 4k balloon page size, we can not simply call madvise() on > >>>>>> each of the 4k balloon addresses that we get from the guest - since > >>>>>> the madvise() always evicts the whole host page, not only a 4k area! > >>>>>> > >>>>>> So in this case, we've got to track the 4k fragments of a host page > >>>>>> and only call madvise(DONTNEED) when all fragments have been collected. > >>>>>> This of course only works fine if the guest sends consecutive 4k > >>>>>> fragments - which is the case in the most important scenarios that > >>>>>> I try to address here (like a ppc64 guest with 64k page size that > >>>>>> is running on a ppc64 host with 64k page size). In case the guest > >>>>>> uses a page size that is smaller than the host page size, we might > >>>>>> need to add some more additional logic here later to increase the > >>>>>> probability of being able to release memory, but at least the guest > >>>>>> should now not crash anymore due to unintentionally evicted pages. > >>>> ... > >>>>>> static void virtio_balloon_instance_init(Object *obj) > >>>>>> diff --git a/include/hw/virtio/virtio-balloon.h > >>>>>> b/include/hw/virtio/virtio-balloon.h > >>>>>> index 35f62ac..04b7c0c 100644 > >>>>>> --- a/include/hw/virtio/virtio-balloon.h > >>>>>> +++ b/include/hw/virtio/virtio-balloon.h > >>>>>> @@ -43,6 +43,9 @@ typedef struct VirtIOBalloon { > >>>>>> int64_t stats_last_update; > >>>>>> int64_t stats_poll_interval; > >>>>>> uint32_t host_features; > >>>>>> + void *current_addr; > >>>>>> + unsigned long *fragment_bits; > >>>>>> + int fragment_bits_size; > >>>>>> } VirtIOBalloon; > >>>>>> > >>>>>> #endif > >>>>> > >>>>> It looks like fragment_bits would have to be migrated. > >>>>> Which is a lot of complexity. > >> ... > >>>>> How about we just skip madvise if host page size is > balloon > >>>>> page size, for 2.6? > >>>> > >>>> That would mean a regression compared to what we have today. Currently, > >>>> the ballooning is working OK for 64k guests on a 64k ppc host - rather > >>>> by chance than on purpose, but it's working. The guest is always sending > >>>> all the 4k fragments of a 64k page, and QEMU is trying to call madvise() > >>>> for every one of them, but the kernel is ignoring madvise() on > >>>> non-64k-aligned addresses, so we end up with a situation where the > >>>> madvise() frees a whole 64k page which is also declared as free by the > >>>> guest. > >>>> > >>>> I think we should either take this patch as it is right now (without > >>>> adding extra code for migration) and later update it to the bitmap code > >>>> by Jitendra Kolhe, or omit it completely (leaving 4k guests broken) and > >>>> fix it properly after the bitmap code has been applied. But disabling > >>>> the balloon code for 64k guests on 64k hosts completely does not sound > >>>> very appealing to me. What do you think? > >>> > >>> True. As simple a hack - how about disabling madvise when host page size > > >>> target page size? > >> > >> That could work - but is there a generic way in QEMU to get the current > >> page size from a guest (since this might differ from TARGET_PAGE_SIZE)? > >> Or would that mean to pollute the virtio-balloon code with ugly #ifdefs? > > > > let's just use TARGET_PAGE_SIZE, that's the best I can think of. > > That won't work - at least not on ppc: TARGET_PAGE_SIZE is always > defined to 4096 here. The Linux kernel then switches the real page size > during runtime to 65536. So we'd need a way to detect this automatically...
Won't work for ARM either. TARGET_PAGE_SIZE is always 1K. The host will always have at least 4K, but the guest could have as much as 64K. There's no non-hacky way (that I know of) to determine what it is. Thanks, drew