Fri, Mar 15, 2019 at 10:59:33PM CET, pa...@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <j...@resnulli.us>
>> Sent: Friday, March 15, 2019 3:08 PM
>> To: Parav Pandit <pa...@mellanox.com>
>> Cc: Samudrala, Sridhar <sridhar.samudr...@intel.com>; Jakub Kicinski
>> <jakub.kicin...@netronome.com>; da...@davemloft.net;
>> netdev@vger.kernel.org; oss-driv...@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>>
>> Fri, Mar 15, 2019 at 04:32:24PM CET, pa...@mellanox.com wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Samudrala, Sridhar <sridhar.samudr...@intel.com>
>> >> Sent: Friday, March 15, 2019 12:58 AM
>> >> To: Parav Pandit <pa...@mellanox.com>; Jakub Kicinski
>> >> <jakub.kicin...@netronome.com>
>> >> Cc: Jiri Pirko <j...@resnulli.us>; da...@davemloft.net;
>> >> netdev@vger.kernel.org; oss-driv...@netronome.com
>> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> devlink PCI ports
>> >>
>> >>
>> >> On 3/14/2019 7:40 PM, Parav Pandit wrote:
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Samudrala, Sridhar <sridhar.samudr...@intel.com>
>> >> >> Sent: Thursday, March 14, 2019 9:16 PM
>> >> >> To: Parav Pandit <pa...@mellanox.com>; Jakub Kicinski
>> >> >> <jakub.kicin...@netronome.com>
>> >> >> Cc: Jiri Pirko <j...@resnulli.us>; da...@davemloft.net;
>> >> >> netdev@vger.kernel.org; oss-driv...@netronome.com
>> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >> devlink PCI ports
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 3/14/2019 6:28 PM, Parav Pandit wrote:
>> >> >>>
>> >> >>>
>> >> >>>> -----Original Message-----
>> >> >>>> From: Jakub Kicinski <jakub.kicin...@netronome.com>
>> >> >>>> Sent: Thursday, March 14, 2019 6:39 PM
>> >> >>>> To: Parav Pandit <pa...@mellanox.com>
>> >> >>>> Cc: Jiri Pirko <j...@resnulli.us>; da...@davemloft.net;
>> >> >>>> netdev@vger.kernel.org; oss-driv...@netronome.com
>> >> >>>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >>>> devlink PCI ports
>> >> >>>>
>> >> >>>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
>> >> >>>>>>> Then instances of flavour pci_vf are going to appear in the
>> >> >>>>>>> same devlink instance. Those are the switch ports:
>> >> >>>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
>> >> >>>>>>>     flavour pci_vf pf 0 vf 0
>> >> >>>>>>>     switch_id 00154d130d2f peer pci/0000:05:10.1/0
>> >> >>>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
>> >> >>>>>>>     flavour pci_vf pf 0 vf 0 subport 1
>> >> >>>>>>>     switch_id 00154d130d2f peer pci/0000:05:10.1/1
>> >> >>>>>>>
>> >> >>>>>>> With that, peers are going to appear too, and those are the
>> >> >>>>>>> actual VF/VF subport:
>> >> >>>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>>     peer pci/0000:05:00.0/10002
>> >> >>>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>>     peer pci/0000:05:00.0/10003
>> >> >>>>>>>
>> >> >>>>>>> Later you can push this VF along with all subports to VM. So
>> >> >>>>>>> in VM, you are going to see the VF like this:
>> >> >>>>>>> $ devlink dev
>> >> >>>>>>> pci/0000:00:08.0
>> >> >>>>>>> $ devlink port
>> >> >>>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>>
>> >> >>>>>>> And back to your question of how are they connected in eswitch.
>> >> >>>>>>> That is totally up to the original user John who did the creation.
>> >> >>>>>>> He is in charge of the eswitch on baremetal, he would
>> >> >>>>>>> configure the forwarding however he likes.
>> >> >>>>>>
>> >> >>>>>> Ack, so I think you're saying VM has to communicate to the
>> >> >>>>>> cloud environment to have this provisioned using some service
>> >> >>>>>> API, not a kernel API. That's what I wanted to confirm.
>> >> >>>>>>
>> >> >>>>>> I don't see any benefit to having the "host ports" under
>> >> >>>>>> devlink, as such I think it's a matter of preference.
>> >> >>>>>
>> >> >>>>> We need 'host ports' to configure parameters of this host port
>> >> >>>>> which is not exposed by the rep-netdev.
>> >> >>>>> Such as mac address.
>> >> >>>>
>> >> >>>> Please look at the quote of what Jiri wrote above - the host
>> >> >>>> port gets passed to the VM, you can't use it as a handle to set
>> >> >>>> the MAC.
>> >> >>>>
>> >> >>>> The way to set the MAC remains:
>> >> >>>>
>> >> >>>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr 00:11:22:33:44:55
>> >> >>>>
>> >> >>> Even though it can be done, I think this is wrong model to program
>> >> >>> hostport mac address using eswitch port.
>> >> >>> All devlink objects are control objects, so what is passed to VM
>> >> >>> is what is represented by devlink.
>> >> >>> VF in the VM will anyway create its devlink object.
>> >> >>> What is wrong in programming hostport?
>> >> >>> It gives a very clear view to users of topology and objects.
>> >> >>
>> >> >> The VF or any subport MAC address should be configured by the
>> >> >> orchestration layer that is running on the hypervisor and when a
>> >> >> VF is assigned to a VF, the host port is not visible to the hypervisor.
>> >> > What prevents creation of hostport due to which is not visible?
>> >> > Hostport is control port to program host side of parameters.
>> >> > It should be created when user wants to program the parameters.
>> >> >
>> >> > Model is really straight forward.
>> >> > Program host port params using hostport object.
>> >> > Program switchport params using rep-netdev.
>> >>
>> >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for each
>> >> port - host facing ports and switch facing ports. This is in addition
>> >> to the netdevs that are created today.
>> >>
>> >I am not proposing any different.
>> >I am proposing only two changes.
>> >1. control hostport params via referring hostport (not via indirect
>> >peer)
>>
>> Not really possible. If you passthrough VF into VM, the hostport goes along
>> with it.
>>
>No.
>I am sorry in showing the enumeration which is the source of confusion.
>
>Below is the right enumeration.
>
>When VF is enumerated initially in the host, where eswitch devlink instance is
>located.
>Below enumeration is seen.
>
>First two entries shows the link between hostport and switchport.
>$ devlink port show
>pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f
>peer pci/0000:05:00.0/1
>
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer
>pci/0000:05:00.0/10002

Hostport should not have switch_id.

>
>pci/0000:05:10.1/0 eth netdev flavour hostport
>This entry won't be seen if VF auto probing is disabled. Because than VF is
>not enumerated.
>
>As a user, I will be programming the mac address of hostport for a VF.
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer
>pci/0000:05:00.0/10002

Hmm, so you are going to have 2 hostports for VF:
1) pci/0000:05:10.1/0
   real one, that is going to go to VM - with a separate pci address and
   devlink instance.
2) pci/0000:05:00.0/1
   dummy one, which is not really a hostport, as there is no netdev
   created for it. It only models the other side of cable, which is away
   in VM.

>
>
>>
>> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
>> >Because switch is flat and agnostic of pf/vf/mdev.
>>
>> Not sure. It's good to have this kind of visibility.
>>
>port can have label/attribute indicating that this belong to VF-1 or mdev as
>long as you are agreeing to have mdev attribute on host port.
>(and not ask for abstracting it, because mdev is well defined kernel object).

Why mdev cannot be another flavour?

>
>>
>> >
>> >> Are you suggesting that all the devlink objects should be visible
>> >> only at the hypervisor layer?
>> >>
>> >Of course not.
>> >
>> >Ports and params controlled by hypervisor should be exposed at
>> >hypervisor/eswitch wherever its parent devlink instance exist.
>> >Ports which should be visible inside a VM should be exposed inside a VM.
>> >So for a given VF,
>> >
>> >If eswitch is at hypervisor level,
>> >$ devlink port show
>> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
>> >00154d130d2f peer pci/0000:05:10.1/0
>> >pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f
>> >peer pci/0000:05:00.0/10002
>> >
>> >where VF is enumerated,
>> >$ devlink port show
>> >pci/0000:05:10.1/0 eth netdev flavour hostport
>>
>> So this is how it looks like in VM, right?
>>
>Yep.
>Once VF is mapped to VM only two entries are seen and hostport can be still
>controlled.
>
>$ devlink port show
>pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f
>peer pci/0000:05:00.0/1
>
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer
>pci/0000:05:00.0/10002
>
>This addresses the case for Infiniband where there is no eswitch, but
>hostports exists and should be managed.
>We shouldn't be inventing new devlink APIs or create a fake sw eswitch object
>which doesn't exist in hw.
>
>>
>> >This is because unprivileged VF doesn't have visibility to eswitch and its
>> >links.
>> >
>> >> I think the terminology need to be defined clearly so that we are all
>> >> on the same page.
>> >>
>> >> >
>> >> >> Currently we have ndo_set_vf_mac_addr api that works with PF
>> >> >> netdev, but i think we are trying to move away from that API and
>> >> >> do all the configuration via the port representor netdevs.
>> >> > This is fine rep-netdev represents eswitch port.
>> >> > You normally don't go to switch to program host port params.
>> >> >
>> >> >> As the mac address cannot be configured using this netdev, i think
>> >> >> Jakub is suggesting creating a devlink opject for each port
>> >> >> representor and use that interface to set peer mac address.
>> >> >
>> >> > I understand but is convoluted interface.
>> >> > When you program host NIC mac address you talk to iLo or BIOS.
>> >> > When you program switch side mac address, you go switch/router/modem.
>> >> >
>> >> > Also programming host params on host side, also doesn't make
>> >> > assumption that its connected to eswitch.
>> >> > It also doesn't assume that same connectivity for its life.
>> >> >
>> >> > If you model around how physical devices are configured, it will
>> >> > almost never go wrong and still provides same level of flexibility.
>> >> >
>> >> >> We should be able use this to configure port vlan too.
>> >> >>
>> >> >> Also, instead of subport, can we call vport and support different
>> >> >> types of vports - sr-iov, siov, vmdq etc.
>> >> >>
>> >> > At switch level there are just ports.
>> >> > sriov, siov, mdev, vmdq are their couter part (peer) where it is
>> >> > connected.
>> >> >
>> >> >>>
>> >> >>> Also eswitch is flat. There is no need of pf/vf flavour for port.
>> >> >>> It doesn't make sense to define 'mdev' flavour which we are
>> >> >>> already working.
>> >> >>> At eswitch level it is just a port, it happen to be connected to
>> >> >>> vf or pf or other objects, it doesn't matter.
>> >> >>> Port should be flavoured as 'hostport' or 'switchport'.
>> >> >>>
>> >> >>>
>> >> >>>> (using the port ids from above)
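
For reference, the hostport/switchport pairing argued about in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not devlink code: the "flavour hostport" / "flavour switchport" listing is the format proposed in the quoted mails and is not something the devlink tool is known to print, and the names Port, parse_port_line, build_topology and peers_consistent are made up for this sketch. It parses the two-entry enumeration and checks that every peer link points back at the port it came from.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Port:
    """One entry of the proposed (hypothetical) 'devlink port show' listing."""
    handle: str
    flavour: str
    switch_id: Optional[str] = None
    peer: Optional[str] = None


def parse_port_line(line: str) -> Port:
    """Parse one listing line; bare words such as 'eth' and 'netdev' are skipped."""
    tokens = line.split()
    attrs: Dict[str, str] = {}
    i = 1  # tokens[0] is the port handle
    while i < len(tokens):
        if tokens[i] in ("flavour", "switch_id", "peer") and i + 1 < len(tokens):
            attrs[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return Port(
        handle=tokens[0],
        flavour=attrs.get("flavour", "unknown"),
        switch_id=attrs.get("switch_id"),
        peer=attrs.get("peer"),
    )


def build_topology(lines: List[str]) -> Dict[str, Port]:
    """Index the parsed ports by handle."""
    ports = [parse_port_line(line) for line in lines if line.strip()]
    return {port.handle: port for port in ports}


def peers_consistent(ports: Dict[str, Port]) -> bool:
    """True if every port that names a peer is named back by that peer."""
    for port in ports.values():
        if port.peer is None:
            continue
        other = ports.get(port.peer)
        if other is None or other.peer != port.handle:
            return False
    return True


# The two-entry enumeration from the quoted proposal, with wrapped lines joined.
SAMPLE = [
    "pci/0000:05:00.0/10002 eth netdev flavour switchport "
    "switch_id 00154d130d2f peer pci/0000:05:00.0/1",
    "pci/0000:05:00.0/1 eth netdev flavour hostport "
    "switch_id 00154d130d2f peer pci/0000:05:00.0/10002",
]

ports = build_topology(SAMPLE)
```

With the sample above, peers_consistent(ports) holds because each of the two entries names the other as its peer. A lone VF-side entry without a peer, as seen inside the VM, is also accepted since it has no link to verify.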