Irena, have a word with Bob (rkukura on IRC, East coast); he was already talking about what would be needed and should be able to help you. Conveniently, he's also core. ;) -- Ian.
On 12 January 2014 22:12, Irena Berezovsky <ire...@mellanox.com> wrote:
> Hi John,
> Thank you for taking the initiative and summing up the work that needs to
> be done to provide PCI pass-through network support.
> The only item I think is missing is the neutron support for PCI
> pass-through. Currently we have the Mellanox plugin, which supports PCI
> pass-through assuming the Mellanox adapter card's embedded switch
> technology. But in order to have fully integrated PCI pass-through
> networking support for the use cases Robert listed in his previous mail,
> generic neutron PCI pass-through support is required. This can be enhanced
> with vendor-specific work that may differ (Mellanox embedded switch vs.
> Cisco 802.1BR), but there is still a common part: a PCI-aware mechanism
> driver.
> I have already started on a definition for this part:
>
> https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#
> I also plan to start coding soon.
>
> Depending on how it goes, I can also take the nova parts that integrate
> with neutron APIs from item 3.
>
> Regards,
> Irena
>
> -----Original Message-----
> From: John Garbutt [mailto:j...@johngarbutt.com]
> Sent: Friday, January 10, 2014 4:34 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
> Apologies for this top post; I just want to move this discussion towards
> action.
>
> I am traveling next week, so it is unlikely that I can make the meetings.
> Sorry.
>
> Can we please agree on some concrete actions, and who will do the coding?
> This also means raising new blueprints for each item of work.
> I am happy to review and eventually approve those blueprints, if you email
> me directly.
>
> Ideas are taken from what we started to agree on, mostly written up here:
> https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions
>
>
> What doesn't need doing...
> ====================
>
> We have PCI whitelist and PCI alias at the moment; let's keep those names
> the same for now.
> I personally prefer PCI-flavor rather than PCI-alias, but let's discuss
> any rename separately.
>
> We seemed happy with the current system (roughly) around GPU passthrough:
>
> nova flavor-key <three_GPU_attached_30GB> set
>     "pci_passthrough:alias"="large_GPU:1,small_GPU:2"
> nova boot --image some_image --flavor <three_GPU_attached_30GB> <some_name>
>
> Again, we seemed happy with the current PCI whitelist.
>
> Sure, we could optimise the scheduling, but again, please keep that a
> separate discussion.
> Something in the scheduler needs to know how many of each PCI alias are
> available on each host.
> How that information gets there can be changed at a later date.
>
> The PCI alias is in config, but it's probably better defined using host
> aggregates, or some custom API.
> But let's leave that for now, and discuss it separately.
> If the need arises, we can migrate away from the config.
>
>
> What does need doing...
> ==================
>
> 1) API & CLI changes for "nic-type", and associated tempest tests
>
> * Add a user-visible "nic-type" so users can express one of several
>   network types.
> * We need a default nic-type for when the user doesn't specify one (it
>   might default to SRIOV in some cases); see the sketch after the examples
>   below.
> * We can easily test the case where the default is virtual and the user
>   expresses a preference for virtual.
> * The above is much better than not testing it at all.
>
> nova boot --flavor m1.large --image <image_id>
>     --nic net-id=<net-id-1>
>     --nic net-id=<net-id-2>,nic-type=fast
>     --nic net-id=<net-id-3>,nic-type=fast <vm-name>
>
> or
>
> neutron port-create
>     --fixed-ip subnet_id=<subnet-id>,ip_address=192.168.57.101
>     --nic-type=<slow | fast | foobar>
>     <net-id>
> nova boot --flavor m1.large --image <image_id> --nic port-id=<port-id>
>
> Where nic-type is just an extra bit of metadata, a string that is passed to
> nova and the VIF driver.
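>
> As a purely illustrative sketch of the default-handling point above (the
> option name and values below are only placeholders, nothing here is
> agreed), the default could simply be a deployer-set config option:
>
>     # hypothetical nova.conf setting; name and values are not agreed
>     [DEFAULT]
>     default_nic_type = virtual
>
> so that a plain "--nic net-id=<net-id-1>" with no nic-type keeps today's
> behaviour, while an SRIOV-only deployment could point the default at a
> passthrough type instead.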
>
> 2) Expand PCI alias information
>
> We need extensions to the PCI alias so we can group SRIOV devices better.
>
> I don't think we have agreed on a format yet, but I would suggest this as
> a starting point:
>
> {
>   "name": "GPU_fast",
>   "devices": [
>     {"vendor_id": "1137", "product_id": "0071", "address": "*",
>      "attach-type": "direct"},
>     {"vendor_id": "1137", "product_id": "0072", "address": "*",
>      "attach-type": "direct"}
>   ],
>   "sriov_info": {}
> }
>
> {
>   "name": "NIC_fast",
>   "devices": [
>     {"vendor_id": "1137", "product_id": "0071", "address": "0:[1-50]:2:*",
>      "attach-type": "macvtap"},
>     {"vendor_id": "1234", "product_id": "0081", "address": "*",
>      "attach-type": "direct"}
>   ],
>   "sriov_info": {
>     "nic_type": "fast",
>     "network_ids": ["net-id-1", "net-id-2"]
>   }
> }
>
> {
>   "name": "NIC_slower",
>   "devices": [
>     {"vendor_id": "1137", "product_id": "0071", "address": "*",
>      "attach-type": "direct"},
>     {"vendor_id": "1234", "product_id": "0081", "address": "*",
>      "attach-type": "direct"}
>   ],
>   "sriov_info": {
>     "nic_type": "slow",
>     "network_ids": ["*"]  # this means it could attach to any network
>   }
> }
>
> The idea being that the VIF driver gets passed this info when network_info
> includes a nic that matches.
> Any other details, like the VLAN id, would come from neutron and be passed
> to the VIF driver as normal.
>
>
> 3) Reading "nic_type" and doing the PCI passthrough of the NIC the user
> requests
>
> Not sure we are agreed on this, but basically:
> * network_info contains "nic-type" from neutron
> * we need to select the correct VIF driver
> * we need to pass the matching PCI alias information to the VIF driver
> * neutron passes the other details (like the VLAN id) as before
> * nova gives the VIF driver an API that allows it to attach PCI devices
>   that are in the whitelist to the VM being configured
> * with all this, the VIF driver can do what it needs to do; a rough sketch
>   of the result follows below
> * let's keep it simple, and expand it as the need arises
>
> 4) Make changes to VIF drivers, so the above is implemented
>
> Depends on (3)
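>
> To make (2), (3) and (4) a little more concrete, here is a purely
> illustrative sketch of the kind of blob a VIF driver might end up seeing
> once nic-type and the matching alias information are wired through (none
> of these field names are agreed; they are placeholders only):
>
> {
>   "vif_id": "port-id-2",
>   "network_id": "net-id-2",
>   "vlan": 1000,                  # from neutron, exactly as today
>   "nic_type": "fast",            # new: passed through from neutron
>   "pci_alias": "NIC_fast",       # new: the alias entry that matched
>   "pci_device": "0000:08:00.2",  # new: a whitelisted device handed over by nova
>   "attach-type": "macvtap"       # new: from the alias definition
> }
>
> The VIF driver would then plug that device according to the attach-type,
> with everything else behaving as it does for existing VIF types.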
>
> These seem like some good steps to get the basics in place for PCI
> passthrough networking.
> Once it's working, we can review it and see if there are things that need
> to evolve further.
>
> Does that seem like a workable approach?
> Who is willing to implement any of (1), (2) and (3)?
>
>
> Cheers,
> John
>
>
> On 9 January 2014 17:47, Ian Wells <ijw.ubu...@cack.org.uk> wrote:
> > I think I'm in agreement with all of this. Nice summary, Robert.
> >
> > It may not be where the work ends, but if we could get this done, the
> > rest is just refinement.
> >
> >
> > On 9 January 2014 17:49, Robert Li (baoli) <ba...@cisco.com> wrote:
> >>
> >> Hi Folks,
> >>
> >> With John joining the IRC, we have so far had a couple of productive
> >> meetings in an effort to come to consensus and move forward. Thanks,
> >> John, for doing that, and I appreciate everyone's effort to make it to
> >> the daily meeting.
> >> Let's reconvene on Monday.
> >>
> >> But before that, and based on today's conversation on IRC, I'd like to
> >> say a few things. I think that, first of all, we need to get agreement
> >> on the terminology that we have been using so far. With the current
> >> nova PCI passthrough:
> >>
> >> PCI whitelist: defines all the available PCI passthrough devices on a
> >> compute node, e.g.:
> >>     pci_passthrough_whitelist=[{"vendor_id":"xxxx","product_id":"xxxx"}]
> >>
> >> PCI alias: criteria defined on the controller node with which requested
> >> PCI passthrough devices can be selected from all the PCI passthrough
> >> devices available in a cloud. Currently it has the following format:
> >>     pci_alias={"vendor_id":"xxxx", "product_id":"xxxx", "name":"str"}
> >>
> >> nova flavor extra_specs: a request for PCI passthrough devices can be
> >> specified with extra_specs, for example:
> >>     "pci_passthrough:alias"="name:count"
> >>
> >> As you can see, currently a PCI alias has a name and is defined on the
> >> controller. The implication is that when matching it against the PCI
> >> devices, the vendor_id and product_id have to be matched against all
> >> the available PCI devices until one is found. The name is only used
> >> for reference in the extra_specs. On the other hand, the whitelist is
> >> basically the same as the alias, just without a name.
> >>
> >> What we have discussed so far is based on something called PCI groups
> >> (or PCI flavors, as Yongli puts it). Without introducing other
> >> complexities, and with a small change to the above representation, we
> >> would have something like:
> >>
> >>     pci_passthrough_whitelist=[{"vendor_id":"xxxx","product_id":"xxxx",
> >>                                 "name":"str"}]
> >>
> >> By doing so, we eliminate the PCI alias, and we call the "name" above a
> >> PCI group name. You can think of it as combining the definitions of the
> >> existing whitelist and PCI alias; believe it or not, a PCI group is
> >> effectively a PCI alias. With that change of thinking, however, a lot
> >> of benefits can be harvested:
> >>
> >> * the implementation is significantly simplified
> >> * provisioning is simplified by eliminating the PCI alias
> >> * a compute node only needs to report stats as something like
> >>   "PCI group name: count". A compute node processes all the PCI
> >>   passthrough devices against the whitelist, and assigns a PCI group
> >>   based on the whitelist definition.
> >> * on the controller, we may only need to define the PCI group names.
> >>   If we use a nova API to define PCI groups (which could be private or
> >>   public, for example), one potential benefit, among other things
> >>   (validation, etc.), is that they can be owned by the tenant that
> >>   creates them, so wholesaling of PCI passthrough devices also becomes
> >>   possible.
> >> * the scheduler only works with PCI group names
> >> * requests for PCI passthrough devices are based on PCI groups
> >> * deployers can provision the cloud based on the PCI groups
> >> * particularly for SRIOV, deployers can design SRIOV PCI groups based
> >>   on network connectivity
> >>
> >> Further, to support SRIOV, we are saying that PCI group names can be
> >> used not only in the extra specs but also in the --nic option and the
> >> neutron commands. This allows the most flexibility and functionality
> >> afforded by SRIOV.
> >>
> >> Further, we are saying that we can define default PCI groups based on
> >> the PCI device's class.
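> >>
> >> Purely as an illustration of how the pieces above could fit together
> >> (the extra-spec key and the pci-group option below are placeholders,
> >> not an agreed syntax), a deployer and a user might end up with
> >> something like:
> >>
> >>     # compute node: whitelist entries carry the PCI group name
> >>     pci_passthrough_whitelist=[{"vendor_id":"xxxx","product_id":"xxxx",
> >>                                 "name":"fast_nics"}]
> >>
> >>     # flavor-based request, analogous to today's alias request
> >>     nova flavor-key <flavor_with_fast_nic> set
> >>         "pci_passthrough:pci_group"="fast_nics:1"
> >>
> >>     # or a per-nic request, as discussed for SRIOV
> >>     nova boot --flavor m1.large --image <image_id>
> >>         --nic net-id=<net-id-1>,pci-group=fast_nics <vm-name>
> >>
> >> with the compute node simply reporting "fast_nics: <count>" to the
> >> scheduler.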
> >>
> >> For vnic-type (or nic-type), we are saying that it defines the link
> >> characteristics of the nic that is attached to a VM: a nic that is
> >> connected to a virtual switch, a nic that is connected to a physical
> >> switch, or a nic that is connected to a physical switch but has a host
> >> macvtap device in between. The actual names of the choices are not
> >> important here, and can be debated.
> >>
> >> I'm hoping that we can go over the above on Monday, but any comments
> >> are welcome by email.
> >>
> >> Thanks,
> >> Robert
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev