balloon: Fixes for different host page sizes

Thomas Huth Wed, 13 Apr 2016 11:13:04 -0700

On 13.04.2016 19:55, Michael S. Tsirkin wrote:
> On Wed, Apr 13, 2016 at 07:38:12PM +0200, Thomas Huth wrote:
>> On 13.04.2016 19:07, Michael S. Tsirkin wrote:
>>> On Wed, Apr 13, 2016 at 04:51:49PM +0200, Thomas Huth wrote:
>>>> On 13.04.2016 15:15, Michael S. Tsirkin wrote:
>>>>> On Wed, Apr 13, 2016 at 01:52:44PM +0200, Thomas Huth wrote:
>> ...
>>>>>> Then, there's yet another problem: If the host page size is bigger
>>>>>> than the 4k balloon page size, we can not simply call madvise() on
>>>>>> each of the 4k balloon addresses that we get from the guest - since
>>>>>> the madvise() always evicts the whole host page, not only a 4k area!
>>>>>>
>>>>>> So in this case, we've got to track the 4k fragments of a host page
>>>>>> and only call madvise(DONTNEED) when all fragments have been collected.
>>>>>> This of course only works fine if the guest sends consecutive 4k
>>>>>> fragments - which is the case in the most important scenarios that
>>>>>> I try to address here (like a ppc64 guest with 64k page size that
>>>>>> is running on a ppc64 host with 64k page size). In case the guest
>>>>>> uses a page size that is smaller than the host page size, we might
>>>>>> need to add some more additional logic here later to increase the
>>>>>> probability of being able to release memory, but at least the guest
>>>>>> should now not crash anymore due to unintentionally evicted pages.
>>>> ...
>>>>>>  static void virtio_balloon_instance_init(Object *obj)
>>>>>> diff --git a/include/hw/virtio/virtio-balloon.h 
>>>>>> b/include/hw/virtio/virtio-balloon.h
>>>>>> index 35f62ac..04b7c0c 100644
>>>>>> --- a/include/hw/virtio/virtio-balloon.h
>>>>>> +++ b/include/hw/virtio/virtio-balloon.h
>>>>>> @@ -43,6 +43,9 @@ typedef struct VirtIOBalloon {
>>>>>>      int64_t stats_last_update;
>>>>>>      int64_t stats_poll_interval;
>>>>>>      uint32_t host_features;
>>>>>> +    void *current_addr;
>>>>>> +    unsigned long *fragment_bits;
>>>>>> +    int fragment_bits_size;
>>>>>>  } VirtIOBalloon;
>>>>>>  
>>>>>>  #endif
>>>>>
>>>>> It looks like fragment_bits would have to be migrated.
>>>>> Which is a lot of complexity.
>> ...
>>>>> How about we just skip madvise if host page size is > balloon
>>>>> page size, for 2.6?
>>>>
>>>> That would mean a regression compared to what we have today. Currently,
>>>> the ballooning is working OK for 64k guests on a 64k ppc host - rather
>>>> by chance than on purpose, but it's working. The guest is always sending
>>>> all the 4k fragments of a 64k page, and QEMU is trying to call madvise()
>>>> for every one of them, but the kernel is ignoring madvise() on
>>>> non-64k-aligned addresses, so we end up with a situation where the
>>>> madvise() frees a whole 64k page which is also declared as free by the
>>>> guest.
>>>>
>>>> I think we should either take this patch as it is right now (without
>>>> adding extra code for migration) and later update it to the bitmap code
>>>> by Jitendra Kolhe, or omit it completely (leaving 4k guests broken) and
>>>> fix it properly after the bitmap code has been applied. But disabling
>>>> the balloon code for 64k guests on 64k hosts completely does not sound
>>>> very appealing to me. What do you think?
>>>
>>> True. As simple a hack - how about disabling madvise when host page size >
>>> target page size?
>>
>> That could work - but is there a generic way in QEMU to get the current
>> page size from a guest (since this might differ from TARGET_PAGE_SIZE)?
>> Or would that mean to pollute the virtio-balloon code with ugly #ifdefs?
> 
> let's just use TARGET_PAGE_SIZE, that's the best I can think of.


That won't work - at least not on ppc: TARGET_PAGE_SIZE is always
defined to 4096 here. The Linux kernel then switches the real page size
during runtime to 65536. So we'd need a way to detect this automatically...

 Thomas

Re: [Qemu-devel] [PATCH] hw/virtio/balloon: Fixes for different host page sizes

Reply via email to