On Sat, Mar 3, 2018 at 8:04 PM, Michael S. Tsirkin <m...@redhat.com> wrote: > On Fri, Mar 02, 2018 at 03:56:31PM -0800, Siwei Liu wrote: >> On Fri, Mar 2, 2018 at 1:36 PM, Michael S. Tsirkin <m...@redhat.com> wrote: >> > On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote: >> >> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala >> >> <sridhar.samudr...@intel.com> wrote: >> >> > This patch enables virtio_net to switch over to a VF datapath when a VF >> >> > netdev is present with the same MAC address. It allows live migration >> >> > of a VM with a direct attached VF without the need to setup a bond/team >> >> > between a VF and virtio net device in the guest. >> >> > >> >> > The hypervisor needs to enable only one datapath at any time so that >> >> > packets don't get looped back to the VM over the other datapath. When a >> >> > VF >> >> > is plugged, the virtio datapath link state can be marked as down. The >> >> > hypervisor needs to unplug the VF device from the guest on the source >> >> > host >> >> > and reset the MAC filter of the VF to initiate failover of datapath to >> >> > virtio before starting the migration. After the migration is completed, >> >> > the destination hypervisor sets the MAC filter on the VF and plugs it >> >> > back >> >> > to the guest to switch over to VF datapath. >> >> > >> >> > When BACKUP feature is enabled, an additional netdev(bypass netdev) is >> >> > created that acts as a master device and tracks the state of the 2 lower >> >> > netdevs. The original virtio_net netdev is marked as 'backup' netdev >> >> > and a >> >> > passthru device with the same MAC is registered as 'active' netdev. >> >> > >> >> > This patch is based on the discussion initiated by Jesse on this thread. >> >> > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 >> >> > >> >> > Signed-off-by: Sridhar Samudrala <sridhar.samudr...@intel.com> >> >> > Signed-off-by: Alexander Duyck <alexander.h.du...@intel.com> >> >> > Reviewed-by: Jesse Brandeburg <jesse.brandeb...@intel.com> >> >> > --- >> >> > drivers/net/virtio_net.c | 683 >> >> > ++++++++++++++++++++++++++++++++++++++++++++++- >> >> > 1 file changed, 682 insertions(+), 1 deletion(-) >> >> > >> >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >> >> > index bcd13fe906ca..f2860d86c952 100644 >> >> > --- a/drivers/net/virtio_net.c >> >> > +++ b/drivers/net/virtio_net.c >> >> > @@ -30,6 +30,8 @@ >> >> > #include <linux/cpu.h> >> >> > #include <linux/average.h> >> >> > #include <linux/filter.h> >> >> > +#include <linux/netdevice.h> >> >> > +#include <linux/pci.h> >> >> > #include <net/route.h> >> >> > #include <net/xdp.h> >> >> > >> >> > @@ -206,6 +208,9 @@ struct virtnet_info { >> >> > u32 speed; >> >> > >> >> > unsigned long guest_offloads; >> >> > + >> >> > + /* upper netdev created when BACKUP feature enabled */ >> >> > + struct net_device *bypass_netdev; >> >> > }; >> >> > >> >> > struct padded_vnet_hdr { >> >> > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, >> >> > struct netdev_bpf *xdp) >> >> > } >> >> > } >> >> > >> >> > +static int virtnet_get_phys_port_name(struct net_device *dev, char >> >> > *buf, >> >> > + size_t len) >> >> > +{ >> >> > + struct virtnet_info *vi = netdev_priv(dev); >> >> > + int ret; >> >> > + >> >> > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP)) >> >> > + return -EOPNOTSUPP; >> >> > + >> >> > + ret = snprintf(buf, len, "_bkup"); >> >> > + if (ret >= len) >> >> > + return -EOPNOTSUPP; >> >> > + >> >> > + return 0; >> >> > +} >> >> > + >> >> >> >> What if the systemd/udevd is not new enough to enforce the >> >> n<phys_port_name> naming? Would virtio_bypass get a different name >> >> than the original virtio_net? >> > >> > You mean people using ethX names? Any hardware config change breaks >> > these, I don't think that can be helped. >> >> I don't like the way to rely on .ndo_get_phys_port_name - it's fragile >> and it does not completely solve the problem it tries to address. >> Imagine what can end up with if getting an old udevd, or users already >> have exsiting explicit udev rules around phys_port_name. It does not >> give you the an ack in saying "yes, I know you're the bypass and >> you're the backup, please continue and I will give you both correct >> names", or an unacknowlegment saying "no, I don't know what these >> extra interfaces are, please go back and leave the VF device alone". >> We need new udev API for both feature negotiation and naming, or may >> even completely hide the lower interfaces. > > Go ahead and try to make this happen, but I won't hold my > breath.
Heads-up: I have some working code to hide the lower devices, upon which the new udev/sysfs interace for feature negotiation and naming will be built. I will get it posted for review in the next few days once ready. Hopefully this could end the fight between the 3-netdev and 2-netdev driver model, as I see most of the problems being argued in this thread can be addressed by just hiding the lower devices from the 3-netdev model while keeping its merits still. Regards, -Siwei > >> > >> >> Should we detect this earlier and fall >> >> back to legacy mode without creating the bypass netdev and ensalving >> >> the VF? >> > >> > I don't think we can do this with existing kernel/userspace APIs. >> >> That's why I ever said to make udev aware of this new type of combined >> device instead of doing hacks here and there around. >> >> Regards, >> -Siwei > > We can add new interfaces on top but the main purpose here is to > make old userspace do new tricks. > >> > >> > -- >> > MST