On Fri, Aug 16, 2019 at 05:07:10PM -0400, Yash Mankad wrote: > On 8/16/19 1:42 PM, Eduardo Habkost wrote: > > On Fri, Aug 16, 2019 at 02:22:58PM +0200, Markus Armbruster wrote: > >> Erik Skultety <eskul...@redhat.com> writes: > >> > >>> On Fri, Aug 16, 2019 at 08:10:20AM +0200, Markus Armbruster wrote: > >>>> Eduardo Habkost <ehabk...@redhat.com> writes: > >>>> > >>>>> We have this issue reported when using libvirt to hotplug CPUs: > >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1741451 > >>>>> > >>>>> Basically, libvirt is not copying die-id from > >>>>> query-hotpluggable-cpus, but die-id is now mandatory. > >>>> Uh-oh, "is now mandatory": making an optional property mandatory is an > >>>> incompatible change. When did we do that? Commit hash, please. > >>>> > >>>> [...] > >>>> > >>> I don't even see it as being optional ever - the property wasn't even > >>> recognized before commit 176d2cda0de introduced it as mandatory. > >> Compatibility break. > >> > >> Commit 176d2cda0de is in v4.1.0. If I had learned about it a bit > >> earlier, I would've argued for a last minute fix or a revert. Now we > >> have a regression in the release. > >> > >> Eduardo, I think this fix should go into v4.1.1. Please add cc: > >> qemu-stable. > > I did it in v2. > > > >> How can we best avoid such compatibility breaks to slip in undetected? > >> > >> A static checker would be nice. For vmstate, we have > >> scripts/vmstate-static-checker.py. Not sure it's used. > > I don't think this specific bug would be detected with a static > > checker. "die-id is mandatory" is not something that can be > > extracted by looking at QOM data structures. The new rule was > > being enforced by the hotplug handler callbacks, and the hotplug > > handler call tree is a bit complex (too complex for my taste, but > > I digress). > > > > We could have detected this with a simple CPU hotplug automated > > test case, though. Or with a very simple -device test case like > > the one I have submitted with this patch. > > > > This was detected by libvirt automated test cases. It would be > > nice if this was run during the -rc stage and not only after the > > 4.1.0 release, though. > > > > I don't know details of the test job. Danilo, Mirek, Yash: do > > you know how this bug was detected, and what we could do to run > > the same test jobs in upstream QEMU release candidates? > > This bug was caught by our internal gating tests. > > The libvirt gating tests for the virt module include the > following Avocado-VT test case: > > libvirt_vcpu_plug_unplug.positive_test.vcpu_set.live.vm_operate.save > > This job failed with the error that you can see in the description > of the BZ#1741451 [0]. > > If you think that this would have been caught by a simple hotplug > case, I'd recommend adding a test for hotplug to avocado_qemu. > Otherwise, if you want, I can look into adding this particular > libvirt test case to our QEMU CI efforts.
Having a hotplug test case using avocado_qemu would help catch some bugs, but it's not enough if we want to catch integration issues between QEMU and libvirt (like this one). I wouldn't focus just on that particular libvirt test case. I suggest we run a reasonably large set of libvirt test cases against QEMU release candidates as soon as they are tagged. > > Thanks, > Yash > > [0] https://bugzilla.redhat.com/show_bug.cgi?id=1741451#c0 > -- Eduardo