Hi, > -----Original Message----- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tan, Jianfeng > Sent: Friday, September 2, 2016 8:54 PM > To: Kyle Larose; dev at dpdk.org > Cc: Xie, Huawei; yuanhan.liu at linux.intel.com > Subject: Re: [dpdk-dev] virtio kills qemu VM after stopping/starting ports > > Hi Kyle, > > > On 9/2/2016 8:35 PM, Kyle Larose wrote: > > > >> -----Original Message----- > >> From: Tan, Jianfeng [mailto:jianfeng.tan at intel.com] > >> Sent: Friday, September 02, 2016 2:56 AM > >> To: Kyle Larose; dev at dpdk.org > >> Cc: huawei.xie at intel.com; yuanhan.liu at linux.intel.com > >> Subject: Re: virtio kills qemu VM after stopping/starting ports > >> > >> Hi Kyle, > >> > >> > >> On 9/2/2016 4:53 AM, Kyle Larose wrote: > >>> Hello everyone, > >>> > >>> In my own testing, I recently stumbled across an issue where I could get > qemu > >> to exit when sending traffic to my application. To do this, I simply needed > to do > >> the following: > >>> 1) Start my virtio interfaces > >>> 2) Send some traffic into/out of the interfaces > >>> 3) Stop the interfaces > >>> 4) Start the interfaces > >>> 5) Send some more traffic > >>> > >>> At this point, I would lose connectivity to my VM. Further investigation > >> revealed qemu exiting with the following log: > >>> 2016-09-01T15:45:32.119059Z qemu-kvm: Guest moved used index > >> from 5 > >>> to 1 > >>> > >>> I found the following bug report against qemu, reported by a user of > >>> DPDK: https://bugs.launchpad.net/qemu/+bug/1558175 > >>> > >>> That thread seems to have stalled out, so I think we probably should > deal with > >> the problem within DPDK itself. Either way, later in the bug report chain, > we > >> see a link to this patch to DPDK: > >> http://dpdk.org/browse/dpdk/commit/?id=9a0615af774648. The > submitter of > >> the bug report claims that this patch fixes the problem. Perhaps it does. > >> > >> I once got a chance to analyze the bug you referred here. The patch > >> (http://dpdk.org/browse/dpdk/commit/?id=9a0615af774648) does not fix > that > >> bug. The root cause of that bug is: when DPDK appilcation gets killed, > nobody > >> tells the vhost backend to stop. So when restarting the DPDK app, those > >> hugepages are reused, and initialized to all zero. And unavoidably, "idx" > >> in > >> those memory is changed to 0. I have written a patch to notify the vhost > >> backend to stop when DPDK app gets killed suddenly (more accurate, > when > >> /dev/uioX gets closed), and this patch will be sent out recently. And this > patch > >> does not fix your problem, either. You did not kill the app. (I should not > update > >> info about that bug here, and I mean they are different problems) > >> > >>> However, it introduces a new problem: If I remove the patch, I cannot > >> reproduce the problem. So, that leads me to believe that it has caused a > >> regression. > >>> To summarize the patch?s changes, it basically changes the > virtio_dev_stop > >> function to flag the device as stopped, and stops the device when > >> closing/uninitializing it. However, there is a seemingly unintended side- > effect. > >>> In virtio_dev_start, we have the following block of code: > >>> > >>> /* On restart after stop do not touch queues */ > >>> if (hw->started) > >>> return 0; > >>> > >>> /* Do final configuration before rx/tx engine starts */ > >>> virtio_dev_rxtx_start(dev); > >>> > >>> ?. > >>> > >>> Prior to the patch, if an interface were stopped then started, without > >> restarting the application, the queues would be left as-is, because hw- > >started > >> would be set to 1. > >> > >> Yes, my previous patch did break this behavior (by stopping and re- > starting the > >> device, the queues would be left as-is) and leads to the problem here. > And you > >> are proposing to recover. > >> > >> But is this a right behavior to follow? > >> On the one side, when we stop the virtio device, should we notify the > vhost > >> backend to stop too? Currently, we just flag it as stopped. > >> On the other side, now in many PMDs, we mix the dev > >> initialization/configuration code into dev_start functions, that is to day, > we re- > >> configure the device every time we start it (may be to make sure that > changed > >> configuration will be configured). Then what if we increase the queue > numbers > >> of virtio? > > Hi Jianfeng, > > > > > > Let me see if my understanding is correct. You're saying that there is a > problem introduced by your patch, but going back to the original behaviour is > not ideal: it still leaves a gap with respect to changing the number of queues > with virtio. The ideal solution would have us properly tear down the device > when it is stopped, and then properly reinitialize it when it is started? That > seems reasonable. However, I'm not sure what it would take. > > Yes, exactly. > > > Does it make sense to submit a patch like I suggested in the short term, and > have somebody work on a more robust long term solution? How could we go > about making sure that happens? (I can't allocate much time to fixing this, so > I likely won't be able to do anything more complicated than what I've > proposed). > > Let's see if maintainers have any suggestions here. (And I personally do > not oppose your proposal for short term).
To add more info: The previous patch 9a0615af774 was to fix the problem which is described here: http://dpdk.org/ml/archives/dev/2015-December/030911.html And it's about reconfiguring queue number after stopping the device and then restart the device. Thanks, Jianfeng > > Thanks, > Jianfeng > >