On Mon, Jun 11, 2018 at 8:22 AM, Stephen Hemminger <step...@networkplumber.org> wrote:
> On Fri, 8 Jun 2018 17:42:21 -0700
> Siwei Liu <losewe...@gmail.com> wrote:
>
>> On Fri, Jun 8, 2018 at 5:02 PM, Stephen Hemminger
>> <step...@networkplumber.org> wrote:
>> > On Fri, 8 Jun 2018 16:44:12 -0700
>> > Siwei Liu <losewe...@gmail.com> wrote:
>> >
>> >> On Fri, Jun 8, 2018 at 4:18 PM, Stephen Hemminger
>> >> <step...@networkplumber.org> wrote:
>> >> > On Fri, 8 Jun 2018 15:25:59 -0700
>> >> > Siwei Liu <losewe...@gmail.com> wrote:
>> >> >
>> >> >> On Wed, Jun 6, 2018 at 2:24 PM, Stephen Hemminger
>> >> >> <step...@networkplumber.org> wrote:
>> >> >> > On Wed, 6 Jun 2018 15:30:27 +0300
>> >> >> > "Michael S. Tsirkin" <m...@redhat.com> wrote:
>> >> >> >
>> >> >> >> On Wed, Jun 06, 2018 at 09:25:12AM +0200, Jiri Pirko wrote:
>> >> >> >> > Tue, Jun 05, 2018 at 05:42:31AM CEST, step...@networkplumber.org wrote:
>> >> >> >> > >The net failover should be a simple library, not a virtual
>> >> >> >> > >object with function callbacks (see callback hell).
>> >> >> >> >
>> >> >> >> > Why just a library? It should do common things. I think it should be a
>> >> >> >> > virtual object. Looks like your patch again splits the common
>> >> >> >> > functionality into multiple drivers. That is kind of a backwards attitude.
>> >> >> >> > I don't get it. We should rather focus on fixing the mess the
>> >> >> >> > introduction of netvsc-bonding caused and switch netvsc to the
>> >> >> >> > 3-netdev model.
>> >> >> >>
>> >> >> >> So it seems that at least one benefit for netvsc would be better
>> >> >> >> handling of renames.
>> >> >> >>
>> >> >> >> Question is, how can this change to 3-netdev happen? Stephen is
>> >> >> >> concerned about the risk of breaking some userspace.
>> >> >> >>
>> >> >> >> Stephen, this seems to be the use case that IFF_HIDDEN was trying to
>> >> >> >> address, and you said then "why not use existing network namespaces
>> >> >> >> rather than inventing a new abstraction". So how about it then? Do you
>> >> >> >> want to find a way to use namespaces to hide the PV device for netvsc
>> >> >> >> compatibility?
>> >> >> >>
>> >> >> >
>> >> >> > Netvsc can't work with the 3-netdev model. MS has worked with enough
>> >> >> > distros and startups that all demand eth0 always be present. And the VF
>> >> >> > may come and go. After this history, there is a strong motivation not to
>> >> >> > change how the kernel behaves. Switching to the 3-device model would be
>> >> >> > perceived as breaking existing userspace.
>> >> >> >
>> >> >> > With virtio you can work it out with the distros yourself.
>> >> >> > There are no pre-existing semantics to deal with.
>> >> >> >
>> >> >> > For virtio, I don't see the need for IFF_HIDDEN.
>> >> >>
>> >> >> I have a somewhat different view regarding IFF_HIDDEN. The purpose of
>> >> >> that flag, as well as the 1-netdev model, is to have a means to inherit
>> >> >> the interface name from the VF, and to eliminate hacks around renaming
>> >> >> devices, customizing udev rules, et al. Why is inheriting the VF's name
>> >> >> important? To allow existing config/setup around the VF to continue
>> >> >> working across a kernel feature upgrade. Most network config files in
>> >> >> all distros are based on interface names. A few are MAC address based,
>> >> >> but making lower slaves hidden would cover the rest.
>> >> >> And most importantly, it preserves the same level of user experience as
>> >> >> using the raw VF interface, once all ndo_ops and ethtool_ops are exposed.
>> >> >> This is essential to realize transparent live migration that users don't
>> >> >> have to learn about or be aware of.
>> >> >
>> >> > Inheriting the VF name will fail in the migration scenario.
>> >> > It is perfectly reasonable to migrate a guest to another machine where
>> >> > the VF PCI address is different. And since the current udev/systemd model
>> >> > is to base the network device name off of the PCI address, the device
>> >> > will change name when the guest is migrated.
>> >> >
>> >> The scenario of having the VF on a different PCI address after migration
>> >> is essentially equal to plugging in a new NIC. Why does it have to pair
>> >> with the original PV? A separate PV device should be in place to pair
>> >> with the new VF.
>> >
>> > The host only guarantees that the PV device will be on the same network.
>> > It does not make any PCI guarantees. The way Windows works is to find
>> > the device based on "serial number", which is a Hyper-V-specific attribute
>> > of PCI devices.
>> >
>> > I considered naming off of the serial number, but that won't work for the
>> > case where the PV device is present first and the VF arrives later. The
>> > serial number is an attribute of the VF, not the PV which is there first.
>>
>> I assume the PV can get that information ahead of time, before the VF
>> arrives? Without it, how do you match the device when you see a VF
>> coming with some serial number? Is it possible for the PV to get the
>> matching SN even earlier, during probe time? Or does it have to depend
>> on the presence of the vPCI bridge to generate this SN?
>
> No. The PV device does not know ahead of time, and there are scenarios
> where the serial and PCI info can change when the VF does arrive. These
> are test cases (not something people usually do). Example on WS2016:
>   Guest configured with two or more vswitches and NICs.
>   SR-IOV is not enabled.
>
> Later:
>   On the Hyper-V console (or PowerShell command line) on the host, SR-IOV
>   is enabled on the second NIC.
>
>   The guest will be notified of the new PCI device; the "serial number"
>   will be 1.
>
>   If the same process is repeated but in this case the first NIC has
>   SR-IOV enabled, it will get serial # 1.
>
> I agree with Jakub. What you are proposing is backwards. The VF
> must be thought of as a dependent of the PV device, not vice versa.
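(On the naming point above: a name pinned by the MAC address rather than by
the PCI path would survive a PCI address change across migration. A minimal
sketch of that with a systemd .link file; the MAC and name below are made-up
example values, not anything from this patch set:)

    # /etc/systemd/network/10-vf-name.link
    # Match the device by MAC instead of PCI path, so the name stays
    # stable even if the (v)PCI address changes after migration.
    [Match]
    MACAddress=52:54:00:ab:cd:ef

    [Link]
    Name=eth0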
I didn't insist on netvsc moving to the same 1-netdev model, did I? I
understand Hyper-V has its specific design that's hard to get around. All I
said is that transparent live migration and the 1-netdev model should work for
VF passthrough with virtio as the helper under QEMU. As I recall, the initial
intent was to use virtio as a migration helper rather than having the VF as an
acceleration path; the latter, as far as I know, is Hyper-V's point of view. I
don't know where those side features come from, or why doing live migration
properly is backwards.

-Siwei

>
>> >
>> > Your ideas about having the PCI information of the VF form the name
>> > of the failover device have the same problem. The PV device may
>> > be the only one present on boot.
>>
>> Yeah, this is a chicken-and-egg problem indeed, and that was the reason
>> why I supplied the BDF info for the PV to name the master interface.
>> However, the ACPI PCI slot depends on the PCI bus enumeration, so that
>> can't be predictable. Would it make sense to rename only the first time
>> a matching VF appears while the PV interface isn't yet brought up, so
>> that the failover master would stick to that name afterwards? I think it
>> should cover most scenarios, as it's usually during boot time (dracut)
>> that the VF first appears, and the PV interface at that point shouldn't
>> have been configured yet.
>>
>> -Siwei
>>
>> >
>> >
>> >> > On Azure, the VF may be removed (by the host) at any time and then
>> >> > later reattached. There is no guarantee that the VF will show back up
>> >> > at the same synthetic PCI address. It will likely have a different
>> >> > PCI domain value.
>> >>
>> >> This is something QEMU can do to make sure the PCI address is
>> >> consistent after migration.
>> >>
>> >> -Siwei
>> >
>
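As a minimal sketch of the QEMU side mentioned above (the host BDF, guest
slots and MAC are made-up example values, not an actual setup), the
guest-visible PCI addresses of both the PV and the VF can be fixed on the
command line so the VF reappears at the same address on the destination:

    # Pin the PV (virtio-net) and the VF (vfio-pci) to fixed guest slots;
    # "pv0" is assumed to be defined by a matching -netdev option elsewhere.
    qemu-system-x86_64 ... \
        -device virtio-net-pci,netdev=pv0,mac=52:54:00:ab:cd:ef,bus=pci.0,addr=0x3 \
        -device vfio-pci,host=0000:3b:10.1,bus=pci.0,addr=0x4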