> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, September 05, 2016 12:10 AM
> To: Kyle Larose
> Cc: dev at dpdk.org; huawei.xie at intel.com; jianfeng.tan at intel.com
> Subject: Re: virtio kills qemu VM after stopping/starting ports
>
> On Thu, Sep 01, 2016 at 08:53:31PM +0000, Kyle Larose wrote:
> > Hello everyone,
>
> Hi,
>
> Firstly, thanks for the report and detailed analysis!
>
> > In my own testing, I recently stumbled across an issue where I could get
> > qemu to exit when sending traffic to my application. To do this, I simply
> > needed to do the following:
> >
> > 1) Start my virtio interfaces
> > 2) Send some traffic into/out of the interfaces
> > 3) Stop the interfaces
> > 4) Start the interfaces
> > 5) Send some more traffic
> >
> > At this point, I would lose connectivity to my VM. Further investigation
> > revealed qemu exiting with the following log:
> >
> > 2016-09-01T15:45:32.119059Z qemu-kvm: Guest moved used index from 5 to 1
> >
> > I found the following bug report against qemu, reported by a user of
> > DPDK: https://bugs.launchpad.net/qemu/+bug/1558175
> >
> > That thread seems to have stalled out, so I think we probably should deal
> > with the problem within DPDK itself. Either way, later in the bug report
> > chain, we see a link to this patch to DPDK:
> > http://dpdk.org/browse/dpdk/commit/?id=9a0615af774648. The submitter of
> > the bug report claims that this patch fixes the problem. Perhaps it does.
> > However, it introduces a new problem: If I remove the patch, I cannot
> > reproduce the problem. So, that leads me to believe that it has caused a
> > regression.
>
> Yes, it is a regression from that point of view.
>
> > To summarize the patch's changes, it basically changes the virtio_dev_stop
> > function to flag the device as stopped, and stops the device when
> > closing/uninitializing it. However, there is a seemingly unintended
> > side-effect.
> >
> > In virtio_dev_start, we have the following block of code:
> >
> >     /* On restart after stop do not touch queues */
> >     if (hw->started)
> >         return 0;
> >
> >     /* Do final configuration before rx/tx engine starts */
> >     virtio_dev_rxtx_start(dev);
> >
> >     ...
> >
> > Prior to the patch, if an interface were stopped then started, without
> > restarting the application, the queues would be left as-is, because
> > hw->started would be set to 1. Now, calling stop sets hw->started to 0,
> > which means the next call to start will "touch the queues". This is the
> > unintended side-effect that causes the problem.
> >
> > I made a change locally to break the state of the device into two: started
> > and opened. The device starts out neither started nor opened. If the device
> > is accepting packets, it is started. If the device has set up its queues, it
> > is opened. Stopping the device does not close the device. This allows me to
> > change the check above to:
> >
> >     if (hw->opened) {
> >         hw->started = 1;
> >         return 0;
> >     }
>
> It would work in your case, but it makes things complex.
>
> So, I talked with Jianfeng and revisited the original issue he meant to
> fix: failure (maybe crash) on stop, re-configure queue number and restart.
>
> Yes, that case is broken, but the fix wasn't right either: we can't simply
> re-alloc and re-setup the queues on start, because vhost is only aware of the
> first setup. You could check the following link for more information,
> including the right fix (you need to follow the discussion to find that).
>
> In summary, I will revert commit 9a0615af774 (and carry it to the stable
> branch as well). Later, I will fix the virtio multiple queue issue.
>
Alright, so we should probably reject my patch, then. :)
http://dpdk.org/dev/patchwork/patch/15596/

> 	--yliu

Thanks for getting back to me on this.

Kyle
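
For anyone reading this thread later, the three behaviours discussed above can be compared with a small standalone model. This is only a sketch, not DPDK code: struct fake_hw, the fake_* functions, the variant enum and the queue_setups counter are all hypothetical names made up for illustration. It assumes nothing about the driver beyond what is quoted above, namely that start skips queue setup when the guard flag is set, and that the patch made stop clear hw->started.

```c
/* Hypothetical stand-in for the driver's per-device state (not struct virtio_hw). */
struct fake_hw {
    int started;      /* device is accepting packets */
    int opened;       /* queues have been set up */
    int queue_setups; /* number of times the queues were (re)initialized */
};

/* Which behaviour to model; these labels are illustrative only. */
enum variant {
    BEFORE_PATCH, /* stop leaves hw->started alone */
    AFTER_PATCH,  /* stop clears hw->started */
    OPENED_FLAG   /* stop clears started, start guards on opened */
};

/* Stand-in for virtio_dev_rxtx_start(): just records that queues were touched. */
static void fake_rxtx_start(struct fake_hw *hw)
{
    hw->queue_setups++;
}

static int fake_dev_start(struct fake_hw *hw, enum variant v)
{
    if (v == OPENED_FLAG) {
        /* Queues already set up: resume without touching them. */
        if (hw->opened) {
            hw->started = 1;
            return 0;
        }
    } else if (hw->started) {
        /* On restart after stop do not touch queues. */
        return 0;
    }

    fake_rxtx_start(hw);
    hw->opened = 1;
    hw->started = 1;
    return 0;
}

static void fake_dev_stop(struct fake_hw *hw, enum variant v)
{
    if (v != BEFORE_PATCH)
        hw->started = 0;
}

/* Run start, stop, start and report how often the queues were set up. */
static int count_queue_setups(enum variant v)
{
    struct fake_hw hw = {0, 0, 0};

    fake_dev_start(&hw, v);
    fake_dev_stop(&hw, v);
    fake_dev_start(&hw, v);
    return hw.queue_setups;
}
```

Calling count_queue_setups() gives 1 for BEFORE_PATCH, 2 for AFTER_PATCH and 1 for OPENED_FLAG. The second setup in the AFTER_PATCH case is the "touch the queues" step that the host side is not told about. The model says nothing about the queue-number reconfiguration case, which is why the revert plus a separate fix was preferred over the extra flag.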