Updates.

Reverted the commit
https://github.com/openvswitch/ovs/commit/ad550ebc36323bac92df8bf31d7527cf5282b731
on ovs-2.13.11. It does not work: the VM ports and the vhost-user
sockets are still disconnected.
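
For reference, this is how I'm checking the port state (just an
example; "vhu123" is a placeholder port name, and whether the status
column carries vhost-user details depends on the OVS build):

    # Socket path, connection mode and negotiated features, if OVS
    # reports them for this port:
    ovs-vsctl get Interface vhu123 status
    # ovs-vswitchd.log also records vhost-user connect/disconnect events.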

In addition, I also reverted the commits:
https://github.com/openvswitch/ovs/commit/b837d1fdc46097b1583adf9dd920876d68dbae36
https://github.com/openvswitch/ovs/commit/6f86b1054966a5ebfca6049031d3fc769200af82
Still not working.

So I guess the commit
"userspace: Add TCP Segmentation Offload support"
https://github.com/openvswitch/ovs/commit/29cf9c1b3b9c4574df4f579c74c4e6d9ebb6d279
should perhaps be considered for reverting as well.
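
In case someone wants to reproduce the test, this is roughly what I'm
doing (a sketch of our procedure; $DPDK_BUILD stands for our local
DPDK 19.11.14 install path, and the revert may need manual conflict
resolution):

    git checkout v2.13.11 -b test-revert-tso
    git revert 29cf9c1b3b9c4574df4f579c74c4e6d9ebb6d279
    ./boot.sh && ./configure --with-dpdk=$DPDK_BUILD && make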

What do you think?

Regards,
LIU Yulong

On Wed, Oct 16, 2024 at 9:42 PM LIU Yulong <liuyulong...@gmail.com> wrote:
>
> Thank you Ilya.
>
> I've tried reverting the commit
> https://github.com/openvswitch/ovs/commit/514950d37dabebbdfa40ddf87596a7293de2d87c
> in ovs-2.13.11; for reference:
> https://github.com/openvswitch/ovs/commits/v2.13.11/lib/netdev-dpdk.c
> https://github.com/openvswitch/ovs/commit/ad550ebc36323bac92df8bf31d7527cf5282b731
> Currently it seems that all the VM ports are working after the
> upgrade. I will upgrade and downgrade a few more times to see whether
> it is stable, and of course we will also test live migration and other
> guest VM operations.
> I think this may be a solution we can accept for now: at least it
> keeps the existing virtual machines as they are, regardless of whether
> their negotiated features are actually supported by the backend.
>
> The reason we cannot use a higher version of OVS is the kernel version
> and the compatibility of some drivers on our OS. Also, since OVS 2.14,
> the OVS build method is not well supported on our system.
> But we also have some new systems that are already using the OVS 2.17
> LTS version.
>
> Regards,
> LIU Yulong
>
> On Wed, Oct 16, 2024 at 8:27 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >
> > On 10/15/24 11:13, LIU Yulong wrote:
> > > Hi community and experts,
> > >
> > > We recently attempted to upgrade from OVS 2.12 + DPDK 18.11 to
> > > OVS 2.13.11 + DPDK 19.11.14. Afterwards we ended up in a state where
> > > some virtual machine network cards were down, and users were not able
> > > to bring the network cards back up inside the guest VM.
> > > While investigating, we found that qemu reported errors (many, many
> > > times) indicating that virtio feature negotiation had failed:
> > > 2024-10-15T06:25:16.986398Z qemu-kvm: failed to init vhost_net for queue 0
> > > vhost lacks feature mask 16384 for backend
> > >
> > > This means that the virtio backend, i.e. vhost-user, does not support
> > > feature mask 16384, which is feature bit 14.
> > > The bit is defined in the source code as:
> > > #define  VIRTIO_NET_F_HOST_UFO   14 /* Host can handle UFO in. */
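> > >
> > > For clarity, the "feature mask 16384" in the qemu error is just this
> > > bit expressed as a bitmask:
> > >
> > >     uint64_t mask = UINT64_C(1) << VIRTIO_NET_F_HOST_UFO;
> > >     /* 1 << 14 == 0x4000 == 16384 */
> > >
> > > (Inside the guest, the negotiated bits can be read as a 0/1 string
> > > from /sys/bus/virtio/devices/virtio0/features, where character N
> > > corresponds to feature bit N; "virtio0" is just an example device.)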
> > >
> > > On the same host, virtual machines that have the HOST_UFO bit set
> > > to 1 cannot start their network cards, while those with the bit set
> > > to 0 can.
> > >
> > > We found a useful series of links:
> > > https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405829.html
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1845488#c5
> > >
> > > The conclusion seems to be that such a hot upgrade is impossible to
> > > achieve: unless the guest VM is restarted, or the network card is
> > > hot-unplugged and re-plugged, the user's network card will not work
> > > properly. This situation is unacceptable for a cloud environment,
> > > because we cannot require all user VMs to be restarted.
> > >
> > > Therefore, I'm asking here whether there is a possible workaround
> > > that would make such an upgrade achievable.
> >
> > Hi, unfortunately, I don't think there is a way forward that doesn't
> > involve cold migration / restart / port hot-replug.
> >
> > The issue is that at some point we accidentally exposed UFO and a few
> > other features for negotiation due to a combination of different
> > factors.  Ideally, those features would not be acked / negotiated,
> > because we did not advertise the prerequisite features.  However,
> > AFAIU, none of the virtio/vhost-net implementations, including DPDK,
> > QEMU and the kernel, actually comply with the virtio-net spec here:
> > they accept feature flags whose dependencies are not satisfied.  So,
> > these features end up acked by QEMU and the guest driver even though
> > the guest is not allowed to use them.  Unfortunately for us, that
> > means that if we do the right thing and turn these features off on
> > the OVS side, we will not be able to connect to a QEMU that already
> > exposed these features to the guest.
> >
> > As I mentioned, at some point we did expose UFO to the guest by
> > mistake.  It was then fixed by the following commit:
> >   https://github.com/openvswitch/ovs/commit/514950d37dabebbdfa40ddf87596a7293de2d87c
> > You may notice that this patch also makes a wrong assumption for the
> > TSO case: that disabling checksum offload will result in TSO/UFO not
> > being enabled.  This was later fixed and worked around while we were
> > trying to enable checksum offload by default, but we still can't
> > really work around unsupported ECN.  At least nobody seems to use
> > ECN, so that hasn't been a huge problem so far.
> >
> > Unfortunately again, the fact that commit 514950d37dab breaks live
> > migration and upgrades was discovered too late, so reverting this
> > commit wasn't an option.  Reverting it would also mean that we would
> > start advertising incorrect features again, which is not good.
> >
> > The only way to make your VMs work without restarting / re-plugging
> > is to remove VIRTIO_NET_F_HOST_UFO from the vhost_unsup_flags.  But
> > once you do that, you'll have to keep that broken workaround
> > literally forever, as all the newly started VMs will have it
> > negotiated and hence will have the same problem.
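> >
> > Very roughly, that local hack would be a change along these lines in
> > lib/netdev-dpdk.c (an untested sketch; the exact context around
> > vhost_unsup_flags depends on the tree you are patching):
> >
> >     /* Keep the other unsupported features disabled, but stop
> >      * disabling HOST_UFO so that VMs which already negotiated it
> >      * can reconnect.  OVS still cannot actually do UFO, so this is
> >      * a permanent local hack, not a fix. */
> >     vhost_unsup_flags = 1ULL << VIRTIO_NET_F_CSUM
> >                         | 1ULL << VIRTIO_NET_F_HOST_TSO4
> >                         | 1ULL << VIRTIO_NET_F_HOST_TSO6
> >                         | 1ULL << VIRTIO_NET_F_HOST_ECN;
> >                         /* VIRTIO_NET_F_HOST_UFO intentionally omitted. */
> >
> >     /* dev->vhost_id here is the vhost-user socket path. */
> >     err = rte_vhost_driver_disable_features(dev->vhost_id,
> >                                             vhost_unsup_flags);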
> >
> > This will also become a big problem once you move to OVS 3.2+, where
> > checksum offload is enabled by default: your negotiated UFO will then
> > be allowed to be used by the guest, and that will break OVS, because
> > we do not support UFO on the OVS side and, unlike ECN, we can't
> > really ignore it.
> >
> > The best available solution, I think, is to plan the upgrade and
> > gradually cold-migrate (not live) VMs from nodes with the old OVS to
> > nodes with the upgraded one.  I'd also suggest migrating to a
> > supported version of OVS instead of 2.13; OVS 3.3 LTS might be a good
> > choice.
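> >
> > For the migration step itself, something along these lines should
> > work with libvirt (an illustration only; "dom1" and the destination
> > URI are placeholders, and storage is assumed to be shared):
> >
> >     # Cold migration: the domain is started fresh on the destination,
> >     # so virtio features are re-negotiated against the upgraded OVS.
> >     virsh shutdown dom1
> >     virsh migrate --offline --persistent dom1 qemu+ssh://new-node/system
> >     virsh -c qemu+ssh://new-node/system start dom1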
> >
> > FWIW, while an upgrade from pre-2.13 to post-2.13 is not possible
> > without a restart, upgrades from 2.13 onward should not have such
> > issues.
> >
> > I had an idea that the issue could be solved by QEMU not acking
> > features whose dependencies are not satisfied, and by clearing
> > features with unsatisfied dependencies from the acked feature set
> > during live migration.  Since the guest is not allowed to use those
> > anyway, it should not cause problems.  And if the guest re-negotiates,
> > it will receive an updated feature set without the unsatisfied
> > dependencies, and we can move on with our lives...  But this requires
> > a lot of consideration and discussion with the QEMU / virtio
> > maintainers.  I'll start a thread on qemu-devel to check whether
> > there are issues with such a solution, or whether it is even possible
> > or acceptable.  Either way, such a change is unlikely to be
> > backported to older versions of QEMU.
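> >
> > In pseudo-code, the sanitization I have in mind would be something
> > like the following (not actual QEMU code; the bit numbers and the
> > dependencies are from the virtio-net spec):
> >
> >     #include <stdint.h>
> >
> >     #define VIRTIO_NET_F_CSUM       0
> >     #define VIRTIO_NET_F_HOST_TSO4  11
> >     #define VIRTIO_NET_F_HOST_TSO6  12
> >     #define VIRTIO_NET_F_HOST_ECN   13
> >     #define VIRTIO_NET_F_HOST_UFO   14
> >
> >     /* Drop acked features whose spec prerequisites are missing:
> >      * HOST_TSO4/6 and HOST_UFO require CSUM, and HOST_ECN requires
> >      * HOST_TSO4 or HOST_TSO6. */
> >     static uint64_t sanitize_acked_features(uint64_t f)
> >     {
> >         if (!(f & (1ULL << VIRTIO_NET_F_CSUM))) {
> >             f &= ~((1ULL << VIRTIO_NET_F_HOST_TSO4)
> >                    | (1ULL << VIRTIO_NET_F_HOST_TSO6)
> >                    | (1ULL << VIRTIO_NET_F_HOST_UFO));
> >         }
> >         if (!(f & ((1ULL << VIRTIO_NET_F_HOST_TSO4)
> >                    | (1ULL << VIRTIO_NET_F_HOST_TSO6)))) {
> >             f &= ~(1ULL << VIRTIO_NET_F_HOST_ECN);
> >         }
> >         return f;
> >     }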
> >
> > Best regards, Ilya Maximets.