Thanks again Yuanhan, you are the true expert!! Rgds, Gopa.
On Thu, Mar 16, 2017 at 10:40 PM, Yuanhan Liu <yuanhan....@linux.intel.com> wrote: > On Thu, Mar 16, 2017 at 10:30:09PM -0700, Gopakumar Choorakkot Edakkunni > wrote: > > Thanks for the confirmation, glad I reached the person who knows the > nuts and > > bolts of virtio :-). So if the host is not in our control (ie if I am > just > > running as a VM on host provided by thirdparty vendor), is there any > workaround > > I can do from the guest side to prevent problems from happening on a > guest > > restart ? > > Not too much. You might want to hack the guest DPDK EAL memory initiation > part though, to not reset the hugepage memory on start. But that's too > hacky > that I will not recommend you to do so! > > > And if theres no workarounds at all and the host has to change, instead > of > > asking the third party vendor to do a wholesale upgrade to 16.04, is > there one/ > > few commits that can be added to the host ovs-dpdk to take care of this > guest > > restart virtio-reset-before opening case ? > > Yes, backporting the commits I have mentioned should be able to fix it. > But please note that I did some code refactorings before those fixes: it > won't apply cleanly to DPDK v2.2. > > And if you want to upgrade, I'd suggest to upgrade to v16.11, which is > LTS release. > > --yliu > > > > Rgds, > > Gopa. > > > > On Thu, Mar 16, 2017 at 10:24 PM, Yuanhan Liu < > yuanhan....@linux.intel.com> > > wrote: > > > > On Thu, Mar 16, 2017 at 10:20:30PM -0700, Gopakumar Choorakkot > Edakkunni > > wrote: > > > >> When I was saying dpdk version, I meant the DPDK version with > OVS. > > > > > > Oh I see! My apologies for the misuderstanding. The dpdk version > used by > > host > > > ovs should be dpdk2.2, the guest process uses dpdk16.07. The OVS > process > > is not > > > getting restarted, what is getting restarted is the guest process > using > > > dpdk16.07 - so the above clarifications you had about virtio being > > > reset-before-opened on guest restart - does that still hold good > or does > > that > > > need the HOST side dpdk to be 16.04 or above ? > > > > Yes, the HOST dpdk should be >= v16.04. > > > > --yliu > > > > > > >> And yes, the fixes are not included in the DPDK required for > OVS 2.4. > > > > > > Thanks for the info. > > > > > > Rgds, > > > Gopa. > > > > > > On Thu, Mar 16, 2017 at 10:13 PM, Yuanhan Liu < > > yuanhan....@linux.intel.com> > > > wrote: > > > > > > On Thu, Mar 16, 2017 at 09:56:01PM -0700, Gopakumar Choorakkot > > Edakkunni > > > wrote: > > > > Hi Yuanhan, > > > > > > > > Thanks for the confirmation about not having to do anything > special > > to > > > close > > > > the ports on dpdk going down or coming up. > > > > > > > > As for the question about if I met any issue of ovs getting > stuck - > > yes, > > > my > > > > guest process runs dpdk 16.07 as I mentioned earlier - and > if I > > kill my > > > guest > > > > process, then the host OVS-dpdk on the host reports stall ! > The > > OVS-dpdk > > > and > > > > emu versions I use are as below. But maybe that is because > of the > > ovs > > > missing > > > > the fixes you mentioned ? > > > > > > When I was saying dpdk version, I meant the DPDK version with > OVS. > > > > > > > ~# ovs-vswitchd --version > > > > ovs-vswitchd (Open vSwitch) 2.4.1 > > > > > > And yes, the fixes are not included in the DPDK required for > OVS 2.4. > > > > > > --yliu > > > > > > > Compiled Nov 14 2016 06:53:31 > > > > # kvm --version > > > > QEMU emulator version 2.2.0, Copyright (c) 2003-2008 Fabrice > > Bellard > > > > ~# > > > > > > > > > > > > Rgds, > > > > Gopa. > > > > > > > > On Thu, Mar 16, 2017 at 9:35 PM, Yuanhan Liu < > > yuanhan....@linux.intel.com > > > > > > > > wrote: > > > > > > > > On Thu, Mar 16, 2017 at 07:48:28PM -0700, Gopakumar > Choorakkot > > > Edakkunni > > > > wrote: > > > > > Thanks a lot for the response Yuanhan. I am using dpdk > > v16.07. So > > > what > > > > you are > > > > > saying is that in 16.07, we dont really need to call > > > rte_eth_dev_close() > > > > on > > > > > exit, > > > > > > > > It's not about "don't really need", it's more like "it's > hard > > to". > > > Just > > > > think that it may crash at any time. > > > > > > > > > because dpdk will ensure that it will do virtio reset > before > > init > > > when it > > > > > comes up right ? > > > > > > > > No, It just handles the abnormal case well when guest APP > > restarts. > > > > > > > > > Regarding the vhost commits you mentioned - do we > still need > > those > > > fixes > > > > if we > > > > > have the "virtio reset before init" mechanism ? > > > > > > > > Yes, we still need them: just think some malicious guest > may > > also > > > forge > > > > data like that. > > > > > > > > I'm a bit confused then. Have you actually met any issue > (like > > got > > > stucked) > > > > with DPDK v16.07? > > > > > > > > --yliu > > > > > > > > > Or that is a seperate problem > > > > > altogether (and hence we would need those fixes) ? > > > > > > > > > > Rgds, > > > > > Gopa. > > > > > > > > > > On Thu, Mar 16, 2017 at 7:06 PM, Yuanhan Liu < > > > yuanhan....@linux.intel.com > > > > > > > > > > wrote: > > > > > > > > > > On Thu, Mar 16, 2017 at 12:39:16PM -0700, Gopakumar > > Choorakkot > > > > Edakkunni > > > > > wrote: > > > > > > So the doc says we should call > rte_eth_dev_close() > > *before* > > > going > > > > down. > > > > > And I > > > > > > know that especially in dpdk-virtionet in the > guest + > > > ovs-dpdk in > > > > the > > > > > host, > > > > > > the ovs ends up getting stalled/stuck (!!) if I > dont > > close > > > the port > > > > > before > > > > > > starting() it when the guest dpdk process comes > back > > up. > > > > > > > > > > I'm assuming you were using an old version, > something > > like dpdk > > > v2.2? > > > > > IIRC, DPDK v16.04 should have fixed your issue. > > > > > > > > > > > Considering that this not done properly can > screw up > > the HOST > > > ovs, > > > > and I > > > > > want > > > > > > to do everything possible to avoid that, I want > to be > > 200% > > > sure > > > > that I > > > > > call > > > > > > close even if my process gets a kill -9 .. So > obviously > > the > > > only > > > > way of > > > > > doing > > > > > > that is to close the port when the dpdk process > comes > > back up > > > and > > > > > *before* we > > > > > > init the port. rte_eth_dev_close() is not > capable of > > doing > > > that as > > > > it > > > > > expects > > > > > > the port parameters to be initialized etc.. > before it > > can be > > > > called. > > > > > > > > > > We do virtio reset before init, which is basically > what > > > > rte_eth_dev_close() > > > > > mainly does. So I see no big issue here. > > > > > > > > > > The stuck issue is due to hugepage reset by the > guest > > DPDK > > > > application, > > > > > leading all virtio vring elements being mem > zeroed. The > > old > > > vhost > > > > doesn't > > > > > handle it well, as a result, it got stuck. And > here are > > some > > > relevant > > > > > commits: > > > > > > > > > > a436f53 vhost: avoid dead loop chain > > > > > c687b0b vhost: check for ring descriptors > overflow > > > > > 623bc47 vhost: do sanity check for ring > descriptor > > length > > > > > > > > > > --yliu > > > > > > > > > > > Any other > > > > > > suggestions on what can be done to close on > restart > > rather > > > than > > > > close on > > > > > going > > > > > > down ? Thought of bouncing this by the alias > before I > > add a > > > version > > > > of > > > > > close > > > > > > myself that can do this close-on-restart > > > > > > > > > > > > > > > > > > > > > > > > > > > > >