On 9/14/21 3:58 PM, Daniel P. Berrangé wrote:
> On Tue, Sep 14, 2021 at 02:30:06PM +0100, Dr. David Alan Gilbert wrote:
>> * David Hildenbrand (da...@redhat.com) wrote:
>>> On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote:
>>>> From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>>>>
>>>> The subsection name for page-poison was typo'd as:
>>>>
>>>>    vitio-balloon-device/page-poison
>>>>
>>>> Note the missing 'r' in virtio.
>>>>
>>>> When we have a machine type that enables page poison, and the guest
>>>> enables it (which needs a new kernel), things fail rather unpredictably.
>>>>
>>>> The fallout from this is that most of the other subsections fail to
>>>> load, including things like the feature bits in the device. One
>>>> possible consequence is that the physical addresses of the queues
>>>> then get aligned differently and we fail with an error about
>>>> last_avail_idx being wrong.
>>>> It's not obvious to me why this doesn't produce a clearer failure,
>>>> but virtio's vmstate loading is a bit open-coded.
>>>>
>>>> Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature")
>>>> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401
>>>> Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
>>>> ---
>>>>   hw/virtio/virtio-balloon.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>>>> index 5a69dce35d..c6962fcbfe 100644
>>>> --- a/hw/virtio/virtio-balloon.c
>>>> +++ b/hw/virtio/virtio-balloon.c
>>>> @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
>>>>   };
>>>>   static const VMStateDescription vmstate_virtio_balloon_page_poison = {
>>>> -    .name = "vitio-balloon-device/page-poison",
>>>> +    .name = "virtio-balloon-device/page-poison",
>>>>       .version_id = 1,
>>>>       .minimum_version_id = 1,
>>>>       .needed = virtio_balloon_page_poison_support,
>>>>
>>>
>>> Oh, that's very subtle. I wasn't even aware that the prefix really has to
>>> match the actual device ... I thought the whole idea of the prefix here was
>>> just to make the string unique ...
>>
>> Subsection naming is *very* critical; the logic is something like:
>>   'we're loading the X device'
>> a subsection arrives for 'N/P'
>> if 'X==N' then it looks in X for subsection P.
>> If 'X!=N' then it assumes we've finished loading X
>> and P is really for an outer device that X is part of.
>> This is horrible.
> 
> Is there value in making this more explicit via a code convention
> for .name field initializers. eg instead of
> 
>    .name = "virtio-balloon-device/page-poison",
> 
> Prefer
> 
>    .name = TYPE_VIRTIO_BALLOON "/page-poison"
> 
> ?

IIUC, so far only user-creatable devices are required to have
a stable name in the TYPE definition (because the CLI must stay
stable). That is why using the TYPE definition is not recommended
for migration section names.
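
For reference, the subsection matching rule quoted above can be modelled
roughly as below. This is only a sketch of the logic as described in this
thread, not the code in QEMU's vmstate loader: ExampleDevice,
match_subsection() and EXAMPLE_DEVICE_NAME are hypothetical names made up
for illustration, and the check for a '/' separator is an assumption taken
from the 'N/P' wording.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical local constant for illustration; NOT QEMU's TYPE_* macro. */
#define EXAMPLE_DEVICE_NAME "virtio-balloon-device"

/* Simplified stand-in for a device's migration description. */
typedef struct {
    const char *name;                /* section name of the device, "X" */
    const char *const *subsections;  /* NULL-terminated full names "X/P" */
} ExampleDevice;

typedef enum {
    SUBSECTION_FOUND,     /* prefix matches and the subsection is known */
    SUBSECTION_NOT_OURS,  /* prefix differs: assume X is finished and the
                             data belongs to an outer device */
    SUBSECTION_UNKNOWN    /* prefix matches but no such subsection */
} SubsectionMatch;

/* Building names by literal concatenation keeps the prefix in one place. */
static const char *const example_subsections[] = {
    EXAMPLE_DEVICE_NAME "/page-poison",
    NULL
};

static const ExampleDevice example_balloon = {
    .name = EXAMPLE_DEVICE_NAME,
    .subsections = example_subsections,
};

/* Classify an incoming subsection name "N/P" while loading device X. */
static SubsectionMatch match_subsection(const ExampleDevice *dev,
                                        const char *idstr)
{
    size_t len = strlen(dev->name);

    /* 'X != N': the incoming name does not start with "<device name>/". */
    if (strncmp(idstr, dev->name, len) != 0 || idstr[len] != '/') {
        return SUBSECTION_NOT_OURS;
    }

    /* 'X == N': look for subsection P among X's subsections. */
    for (size_t i = 0; dev->subsections[i] != NULL; i++) {
        if (strcmp(dev->subsections[i], idstr) == 0) {
            return SUBSECTION_FOUND;
        }
    }
    return SUBSECTION_UNKNOWN;
}
```

In this model, an incoming "vitio-balloon-device/page-poison" is classified
as SUBSECTION_NOT_OURS rather than as an error, which is the kind of silent
mismatch the patch fixes.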

