Just one comment: The devices allocated for an instance are immediately known
after the domain is created. Therefore it's possible to do a port update and
have the device configured while the instance is booting.

--Robert
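[For illustration only; this sketch is not from the thread. Once the domain is
defined and the PCI device is known, a port update could carry the device
information to Neutron so the plugin configures the VF while the guest boots.
It uses python-neutronclient; the binding-profile keys shown are hypothetical,
not an agreed interface.]

from neutronclient.v2_0 import client

# Hypothetical credentials/endpoint for the sketch.
neutron = client.Client(username="admin", password="secret",
                        tenant_name="admin",
                        auth_url="http://controller:5000/v2.0")

def report_pci_device(port_id, pci_addr, vendor_dev_id):
    """Push the PCI VF chosen for an instance into the port's binding
    profile so the Neutron plugin can configure the device while the
    instance is still booting. The profile keys are illustrative."""
    body = {
        "port": {
            "binding:profile": {
                "pci_slot": pci_addr,             # e.g. "0000:03:10.1"
                "pci_vendor_info": vendor_dev_id  # e.g. "8086:10ed"
            }
        }
    }
    return neutron.update_port(port_id, body)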
On 1/19/14 2:15 AM, "Irena Berezovsky" <ire...@mellanox.com> wrote:

>Hi Robert, Yunhong,
>Although the network XML solution (option 1) is very elegant, it has one
>major disadvantage. As Robert mentioned, the disadvantage of the network
>XML is the inability to know which SR-IOV PCI device was actually
>allocated. When neutron is responsible for setting the networking
>configuration, managing admin status, and setting security groups, it
>should be able to identify the SR-IOV PCI device in order to apply the
>configuration. Within the current libvirt network XML implementation,
>this does not seem possible.
>Between option (2) and (3), I do not have any preference; it should be as
>simple as possible.
>Option (3) that I raised can be achieved by renaming the network
>interface of the Virtual Function via 'ip link set name'. The interface's
>logical name can be based on the neutron port UUID. This will allow
>neutron to discover devices, if the backend plugin requires it. Once a VM
>is migrating, a suitable Virtual Function on the target node should be
>allocated, and then its corresponding network interface should be renamed
>to the same logical name. This can be done without rebooting the system.
>We still need to check how the Virtual Function's corresponding network
>interface can be returned to its original name once it is no longer used
>as a VM vNIC.
>
>Regards,
>Irena
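[A rough sketch of the rename step Irena describes; editorial illustration,
not code from the thread. The VF netdev is renamed to a logical name derived
from the neutron port UUID. The interface name and UUID below are made up;
Linux limits interface names to 15 characters, so only a prefix of the UUID
fits.]

import subprocess

def rename_vf_interface(current_name, port_uuid):
    """Rename a VF netdev to a logical name derived from the neutron port
    UUID so that the backend plugin can discover it. The link must be
    down while 'ip link set ... name' runs."""
    logical_name = "prt" + port_uuid.replace("-", "")[:12]
    subprocess.check_call(["ip", "link", "set", "dev", current_name, "down"])
    subprocess.check_call(["ip", "link", "set", "dev", current_name,
                           "name", logical_name])
    subprocess.check_call(["ip", "link", "set", "dev", logical_name, "up"])
    return logical_name

# e.g. rename_vf_interface("eth20", "2f0ab1c3-9d2e-4b7a-8c11-0a1b2c3d4e5f")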
>-----Original Message-----
>From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
>Sent: Friday, January 17, 2014 9:06 PM
>To: OpenStack Development Mailing List (not for usage questions)
>Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>support
>
>Robert, thanks for your long reply. Personally I'd prefer option 2/3, as
>it keeps Nova the only entity for PCI management.
>
>Glad you are OK with Ian's proposal and that we have a solution to
>resolve the libvirt network scenario in that framework.
>
>Thanks
>--jyh
>
>> -----Original Message-----
>> From: Robert Li (baoli) [mailto:ba...@cisco.com]
>> Sent: Friday, January 17, 2014 7:08 AM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>> Yunhong,
>>
>> Thank you for bringing that up on the live migration support. In
>> addition to the two solutions you mentioned, Irena has a different
>> solution. Let me put all of them here again:
>> 1. network xml/group based solution.
>>        In this solution, each host that supports a provider
>> net/physical net can define a SRIOV group (it's hard to avoid the term,
>> as you can see from the suggestion you made based on the PCI flavor
>> proposal). For each SRIOV group supported on a compute node, a network
>> XML will be created the first time the nova compute service runs on
>> that node.
>>        * nova will conduct scheduling, but not PCI device allocation
>>        * it's a simple and clean solution, documented in libvirt as
>> the way to support live migration with SRIOV. In addition, a network
>> XML maps nicely onto a provider net.
>> 2. network xml per PCI device based solution
>>        This is the solution you brought up in this email, and Ian
>> mentioned this to me as well. In this solution, a network XML is
>> created when a VM is created, and it needs to be removed once the VM
>> is removed. This hasn't been tried out as far as I know.
>> 3. interface xml/interface rename based solution
>>        Irena brought this up. In this solution, the ethernet interface
>> name corresponding to the PCI device attached to the VM needs to be
>> renamed. One way to do so without requiring a system reboot is to
>> change the udev rules file for interface renaming, followed by a udev
>> reload.
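[A sketch of the udev variant mentioned in option 3; again an illustration,
with a made-up rules-file path. A rename rule keyed on the VF's MAC address
is written, then udev is reloaded so the change takes effect without a
reboot.]

import subprocess

RULES_FILE = "/etc/udev/rules.d/70-sriov-vf-rename.rules"  # made-up path

def add_rename_rule(vf_mac, logical_name):
    """Append a udev rule that renames the VF netdev with the given MAC
    address, then reload the rules and re-trigger net 'add' events so the
    rename is applied immediately."""
    rule = ('SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", '
            'NAME="%s"\n' % (vf_mac.lower(), logical_name))
    with open(RULES_FILE, "a") as f:
        f.write(rule)
    subprocess.check_call(["udevadm", "control", "--reload-rules"])
    subprocess.check_call(["udevadm", "trigger", "--subsystem-match=net",
                           "--action=add"])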
>>
>> Now, with the first solution, Nova doesn't seem to have control over
>> or visibility of the PCI device allocated for the VM before the VM is
>> launched. This needs to be confirmed with the libvirt support to see
>> if such a capability can be provided. This may be a potential drawback
>> if a neutron plugin requires detailed PCI device information for
>> operation. Irena may provide more insight into this. Ideally, neutron
>> shouldn't need this information because the device configuration can
>> be done by libvirt invoking the PCI device driver.
>>
>> The other two solutions are similar. For example, you can view the
>> second solution as one way to rename an interface, or to camouflage an
>> interface under a network name. They all require additional work
>> before the VM is created and after the VM is removed.
>>
>> I also agree with you that we should take a look at XenAPI on this.
>>
>> With regard to your suggestion on how to implement the first solution
>> with some predefined group attribute, I think it definitely can be
>> done. As I pointed out earlier, the PCI flavor proposal is actually a
>> generalized version of the PCI group. In other words, in the PCI group
>> proposal, we have one predefined attribute called PCI group, and
>> everything else works on top of that. In the PCI flavor proposal,
>> attributes are arbitrary. So certainly we can define a particular
>> attribute for networking, which let's temporarily call sriov_group.
>> But I can see that with this idea of predefined attributes, more of
>> them will be required by different types of devices in the future. I'm
>> sure it will keep us busy, although I'm not sure it's in a good way.
>>
>> I was expecting you or someone else to provide a practical deployment
>> scenario that would justify the flexibility and the complexity.
>> Although I'd prefer to keep it simple and generalize it later once a
>> particular requirement is clearly identified, I'm fine to go with it
>> if that's what most of the folks want to do.
>>
>> --Robert
>>
>>
>> On 1/16/14 8:36 PM, "yunhong jiang" <yunhong.ji...@linux.intel.com>
>> wrote:
>>
>> >On Thu, 2014-01-16 at 01:28 +0100, Ian Wells wrote:
>> >> To clarify a couple of Robert's points, since we had a conversation
>> >> earlier:
>> >> On 15 January 2014 23:47, Robert Li (baoli) <ba...@cisco.com> wrote:
>> >>         --- do we agree that BDF address (or device id, whatever
>> >>         you call it), and node id shouldn't be used as attributes
>> >>         in defining a PCI flavor?
>> >>
>> >> Note that the current spec doesn't actually exclude it as an option.
>> >> It's just an unwise thing to do. In theory, you could elect to
>> >> define your flavors using the BDF attribute, but determining 'the
>> >> card in this slot is equivalent to all the other cards in the same
>> >> slot in other machines' is probably not the best idea... We could
>> >> lock it out as an option or we could just assume that
>> >> administrators wouldn't be daft enough to try.
>> >>
>> >>         * the compute node needs to know the PCI flavor. [...]
>> >>           - to support live migration, we need to use it to create
>> >>             network xml
>> >>
>> >> I didn't understand this at first and it took me a while to get
>> >> what Robert meant here.
>> >>
>> >> This is based on Robert's current code for macvtap based live
>> >> migration. The issue is that if you wish to migrate a VM and it's
>> >> tied to a physical interface, you can't guarantee that the same
>> >> physical interface is going to be used on the target machine, but
>> >> at the same time you can't change the libvirt.xml as it comes over
>> >> with the migrating machine. The answer is to define a network and
>> >> refer out to it from libvirt.xml. In Robert's current code he's
>> >> using the group name of the PCI devices to create a network
>> >> containing the list of equivalent devices (those in the group) that
>> >> can be macvtapped. Thus when the host migrates it will find another,
>> >> equivalent, interface. This falls over in the use case under
>> >> consideration where a device can be mapped using more than one
>> >> flavor, so we have to discard the use case or rethink the
>> >> implementation.
>> >>
>> >> There's a more complex solution - I think - where we create a
>> >> temporary network for each macvtap interface a machine's going to
>> >> use, with a name based on the instance UUID and port number, and
>> >> containing the device to map. Before starting the migration we would
>> >> create a replacement network containing only the new device on the
>> >> target host; migration would find the network from the name in the
>> >> libvirt.xml, and the content of that network would behave
>> >> identically. We'd be creating libvirt networks on the fly and a lot
>> >> more of them, and we'd need decent cleanup code too ('when freeing a
>> >> PCI device, delete any network it's a member of'), so it all becomes
>> >> a lot more hairy.
>> >
>> >Ian/Robert, below is my understanding of the method Robert wants to
>> >use, am I right?
>> >
>> >a) Define a libvirt network as described in the "Using a macvtap
>> >"direct" connection" section of http://libvirt.org/formatnetwork.html.
>> >For example:
>> ><network>
>> >  <name>group_name1</name>
>> >  <forward mode="bridge">
>> >    <interface dev="eth20"/>
>> >    <interface dev="eth21"/>
>> >    <interface dev="eth22"/>
>> >    <interface dev="eth23"/>
>> >    <interface dev="eth24"/>
>> >  </forward>
>> ></network>
>> >
>> >b) When assigning SRIOV NIC devices to an instance, as in the
>> >"Assignment from a pool of SRIOV VFs in a libvirt <network>
>> >definition" section of
>> >http://wiki.libvirt.org/page/Networking#PCI_Passthrough_of_host_network_devices ,
>> >use the libvirt network definition group_name1. For example:
>> >
>> >  <interface type='network'>
>> >    <source network='group_name1'/>
>> >  </interface>
>> >
>> >If my understanding is correct, then I have something unclear yet:
>> >a) How will libvirt create the libvirt network (i.e. the libvirt
>> >network group_name1)? Will it be created when the compute node boots
>> >up, or will it be created before instance creation? I suppose per
>> >Robert's design it's created when the compute node is up, am I right?
>> >
>> >b) If all the interfaces are used up by instances, what will happen?
>> >Considering that 4 interfaces are allocated to the group_name1 libvirt
>> >network, and a user tries to migrate 6 instances with the
>> >'group_name1' network, what will happen?
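[Regarding question (a): a minimal sketch, using the libvirt Python bindings,
of how a compute node could define and start such a group network when the
nova compute service comes up. The group name and interface list are taken
from the example above; that nova-compute would do this is an assumption of
the sketch, not a decision from the thread.]

import libvirt

def define_group_network(group_name, vf_interfaces):
    """Define, autostart and start a persistent libvirt network pooling
    the interfaces of one SRIOV group, as in the <network> XML above."""
    interfaces = "\n".join('    <interface dev="%s"/>' % dev
                           for dev in vf_interfaces)
    xml = ("<network>\n"
           "  <name>%s</name>\n"
           "  <forward mode=\"bridge\">\n"
           "%s\n"
           "  </forward>\n"
           "</network>" % (group_name, interfaces))
    conn = libvirt.open("qemu:///system")
    try:
        net = conn.networkDefineXML(xml)  # persistent definition
        net.setAutostart(1)               # recreate after host reboot
        net.create()                      # start it now
    finally:
        conn.close()

# e.g. define_group_network("group_name1",
#                           ["eth20", "eth21", "eth22", "eth23", "eth24"])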
>> >
>> >And below are my comments:
>> >
>> >a) Yes, this is in fact different from the current nova PCI support
>> >philosophy. Currently we assume Nova owns the devices and manages the
>> >device assignment to each instance. In such a situation, the libvirt
>> >network is in fact another (although very thin) layer of PCI device
>> >management!
>> >
>> >b) This also reminds me that other VMMs like XenAPI possibly have
>> >special requirements, and we need input/confirmation from them as
>> >well.
>> >
>> >As for how to resolve the issue, I think there are several solutions:
>> >
>> >a) Create one libvirt network for each SRIOV NIC assigned to each
>> >instance dynamically, i.e. the libvirt network always includes only
>> >one interface; it may be created statically or dynamically. This
>> >solution in fact removes the allocation functionality of the libvirt
>> >network and leaves only the configuration functionality.
>> >
>> >b) Change Nova PCI to support a special type of PCI device attribute
>> >(like the PCI group). For these PCI attributes, the PCI device
>> >scheduler will match a PCI device only if the attribute is specified
>> >explicitly in the PCI flavor.
>> >    Below is an example:
>> >    considering two PCI SRIOV devices:
>> >        Dev1: BDF=00:0.1, vendor_id=1, device_id=1, group=grp1
>> >        Dev2: BDF=00:1.1, vendor_id=1, device_id=2
>> >    i.e. Dev2 has no group attribute specified.
>> >
>> >    And we mark the 'group' attribute as a special attribute.
>> >
>> >    Consider the following flavors:
>> >        Flavor1: name=flv1, vendor_id=1
>> >        Flavor2: name=flv2, vendor_id=1, group=grp1
>> >        Flavor3: name=flv3, group=grp1
>> >
>> >    Dev1 will never be assigned to flv1, since flv1 does not specify
>> >    the 'group' attribute.
>> >    This solution tries to separate the devices managed by Nova
>> >exclusively from the devices managed by Nova and libvirt together.
>> >
>> >Any idea?
>> >
>> >Thanks
>> >--jyh
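[To make proposal (b) concrete, a small sketch of the matching rule it
describes; editorial illustration, not an agreed design. A device that
carries a special attribute such as 'group' only matches flavors that name
that attribute explicitly.]

SPECIAL_ATTRS = {"group"}  # attributes that must be requested explicitly

def flavor_matches(device, flavor_spec):
    """Return True if the device can be assigned to the flavor: every
    attribute named in the flavor must match the device, and any special
    attribute carried by the device must appear in the flavor."""
    for attr, value in flavor_spec.items():
        if device.get(attr) != value:
            return False
    for attr in SPECIAL_ATTRS:
        if attr in device and attr not in flavor_spec:
            return False
    return True

dev1 = {"vendor_id": "1", "device_id": "1", "group": "grp1"}
dev2 = {"vendor_id": "1", "device_id": "2"}

flv1 = {"vendor_id": "1"}
flv2 = {"vendor_id": "1", "group": "grp1"}
flv3 = {"group": "grp1"}

assert not flavor_matches(dev1, flv1)  # Dev1 is never handed to flv1
assert flavor_matches(dev1, flv2)
assert flavor_matches(dev1, flv3)
assert flavor_matches(dev2, flv1)
assert not flavor_matches(dev2, flv2)  # flv2 requires group=grp1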