On 2019/7/10 11:57, Jason Wang wrote:
>
> On 2019/7/10 11:36 AM, Longpeng (Mike) wrote:
>> On 2019/7/10 11:25, Jason Wang wrote:
>>> On 2019/7/8 5:47 PM, Dr. David Alan Gilbert wrote:
>>>> * longpeng (longpe...@huawei.com) wrote:
>>>>> Hi guys,
>>>>>
>>>>> We found a qemu core in our testing environment: the assertion
>>>>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
>>>>> bus->irq_count[i] was '-1'.
>>>>>
>>>>> Our analysis shows it happened after VM migration, and we think
>>>>> it was caused by the following sequence:
>>>>>
>>>>> *Migration Source*
>>>>> 1. Save bus pci.0 state, including irq_count[x] (= 0, old).
>>>>> 2. Save E1000:
>>>>>      e1000_pre_save
>>>>>        e1000_mit_timer
>>>>>          set_interrupt_cause
>>>>>            pci_set_irq --> updates pci_dev->irq_state to 1 and
>>>>>                            updates bus->irq_count[x] to 1 (new)
>>>>>    The irq_state (= 1) is sent to the destination.
>>>>>
>>>>> *Migration Dest*
>>>>> 1. The received irq_count[x] of pci.0 is 0, but the irq_state of e1000 is 1.
>>>>> 2. If the e1000 needs to change its IRQ line, it calls pci_irq_handler();
>>>>>    irq_state may then change to 0, and bus->irq_count[x] becomes -1.
>>>>> 3. On VM reboot, the assertion is triggered.
>>>>>
>>>>> We also found others who hit a similar problem:
>>>>> [1] https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
>>>>> [2] https://bugs.launchpad.net/qemu/+bug/1702621
>>>>>
>>>>> Are there any patches that fix this problem?
>>>> I don't remember any.
>>>>
>>>>> Can we save the pcibus state after all the PCI devices are saved?
>>>> Does this problem only happen with e1000? I think so.
>>>> If it's only e1000 I think we should fix it - I think once the VM is
>>>> stopped for doing the device migration it shouldn't be raising
>>>> interrupts.
>>>
>>> I wonder whether we can simply fix this by not setting ICS in pre_save()
>>> and instead scheduling the mit timer unconditionally in post_load().
>>>
>> I also think this is a bug in e1000, because we have found more cores with
>> the same stack frames these days.
>>
>> I'm not familiar with e1000, so I hope someone can fix it, thanks. :)
>>
>
> I drafted a patch, see the attachment; please test.
>
Hi Jason,
We've tested the patch for about two weeks and everything went well, thanks!

Feel free to add my:
Reported-and-tested-by: Longpeng <longpe...@huawei.com>

> Thanks
>
>
>>> Thanks
>>>
>>>
>>>> Dave
>>>>
>>>>> Thanks,
>>>>> Longpeng(Mike)
>>>> --
>>>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>>> .
>>>
-- 
Regards,
Longpeng(Mike)