On 6/2/2021 1:46 PM, Ilya Maximets wrote: > On 6/1/21 4:28 PM, Ivan Malov wrote: >> Hi Ilya, >> >> Thank you for reviewing the proposal at such short notice. I'm afraid that >> prior discussions overlook the simple fact that the whole problem is not >> limited to just VF representors. Action PORT_ID is also used with respect to >> the admin PF's ethdev, which "represents itself" (and by no means it >> represents the underlying physical/network port). In this case, one cannot >> state that the application treats it as a physical port, just like one >> states that the application perceives representors as VFs themselves. > > > I don't think that it was overlooked. If device is in a switchdev mode than > there is a PF representor and VF representors. Application typically works > only with representors in this case is it doesn't make much sense to have > representor and the upstream port attached to the same application at the > same time. Configuration that is applied by application to the representor > (PF or VF, it doesn't matter) applies to the corresponding upstream port > (actual PF or VF) by default. > > Exactly same thing here with PORT_ID action. You have a packet and action > to send it to the port, but it's not specified if HW needs to send it to > the representor or the upstream port (again, VF or PF, it doesn't matter). > Since there is no extra information, HW should send it to the upstream > port by default. The same as configuration applies by default to the > upstream port. > > Let's look at some workflow examples: > > DPDK Application > | | > | | > +--PF-rep------VF-rep---+ > | | > | NIC (switchdev) | > | | > +---PF---------VF-------+ > | | > | | > External VM or whatever > Network > > a. Workflow for "DPDK Application" to set MAC to VF: > > 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac); > 2. DPDK sets MAC for "VF". > > b. Workflow for "DPDK Application" to set MAC to PF: > > 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac); > 2. DPDK sets MAC for "PF". > > c. Workflow for "DPDK Application" to send packet to the external network: > > 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet); > 2. NIC receives the packet from "PF-rep" and sends it to "PF". > 3. packet egresses to the external network from "PF". > > d. Workflow for "DPDK Application" to send packet to the "VM or whatever": > > 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet); > 2. NIC receives the packet from "VF-rep" and sends it to "VF". > 3. "VM or whatever" receives the packet from "VF". > > In two workflows above there is no rte_flow processing on step 2, i.e., > NIC does not perform any lookups/matches/actions, because it's not possible > to configure actions for packets received from "PF-rep" or > "VF-rep" as these ports doesn't own a port id and all the configuration > and rte_flow actions translated and applied for the devices that these > ports represents ("PF" and "VF") and not representors themselves ("PF-rep" > or "VF-rep"). > > e. Workflow for the packet received on PF and PORT_ID action: > > 1. "DPDK Application" configures rte_flow for all packets from "PF-rep" > to execute PORT_ID "VF-rep". > 2. NIC receives packet on "PF". > 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF". > 4. "VM or whatever" receives the packet from "VF". > > f. Workflow for the packet received on VF and PORT_ID action: > > 1. "DPDK Application" configures rte_flow for all packets from "VF-rep" > to execute 'PORT_ID "PF-rep"'. > 2. NIC receives packet on "VF". > 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF". > 4. Packet egresses from the "PF" to the external network. > > Above is what, IMHO, the logic should look like and this matches with > the overall switchdev design in kernel. >
Hi Ilya, Thanks for clearly explaining the usecase, this was useful (at least for me). But I am still not clear what is the other usecase, when port_id action is for 'VF-rep' packets sent to 'VF-ref' (instead of VF). I remember Ilya mentioned both 'VF-rep' & 'VF' can be attached to an application for debug purposes, but not any real life usage mentioned, unless I missed. And if represontor datapath works independently, instead of being a pipe/wire to represented port, won't it be a virtual partition of the port, instead of representor of the port? > I understand that this logic could seem flipped-over from the HW point > of view, but it's perfectly logical from the user's perspective, because > user should not care if the application works with representors or > some real devices. If application configures that all packets from port > A should be sent to port B, user will expect that these packets will > egress from port B once received from port A. That will be highly > inconvenient if the packet will ingress from port B back to the > application instead. > > DPDK Application > | | > | | > port A port B > | | > *****MAGIC***** > | | > External Another Network > Network or VM or whatever > > It should not matter if there is an extra layer between ports A and B > and the external network and VM. Everything should work in exactly the > same way, transparently for the application. > > The point of hardware offloading, and therefore rte_flow API, is to take > what user does in software and make this "magically" work in hardware in > the exactly same way. And this will be broken if user will have to > use different logic based on the mode the hardware works in, i.e. based on > the fact if the application works with ports or their representors. > > If some specific use case requires application to know if it's an > upstream port or the representor and demystify the internals of the switchdev > NIC, there should be a different port id for the representor itself that > could be used in all DPDK APIs including rte_flow API or a special bit for > that matter. IIRC, there was an idea to add a bit directly to the port_id > for that purpose that will flip over behavior in all the workflow scenarios > that I described above. > >> >> Given these facts, it would not be quite right to just align the >> documentation with the de-facto action meaning assumed by OvS. > > It's not a "meaning assumed by OvS", it's the original design and the > main idea of a switchdev based on a common sense. > >> >> On 01/06/2021 15:10, Ilya Maximets wrote: >>> On 6/1/21 1:14 PM, Ivan Malov wrote: >>>> By its very name, action PORT_ID means that packets hit an ethdev with the >>>> given DPDK port ID. At least the current comments don't state the opposite. >>>> That said, since port representors had been adopted, applications like OvS >>>> have been misusing the action. They misread its purpose as sending packets >>>> to the opposite end of the "wire" plugged to the given ethdev, for example, >>>> redirecting packets to the VF itself rather than to its representor ethdev. >>>> Another example: OvS relies on this action with the admin PF's ethdev port >>>> ID specified in it in order to send offloaded packets to the physical port. >>>> >>>> Since there might be applications which use this action in its valid sense, >>>> one can't just change the documentation to greenlight the opposite meaning. >>>> This patch adds an explicit bit to the action configuration which will let >>>> applications, depending on their needs, leverage the two meanings properly. >>>> Applications like OvS, as well as PMDs, will have to be corrected when the >>>> patch has been applied. But the improved clarity of the action is worth it. >>>> >>>> The proposed change is not the only option. One could avoid changes in OvS >>>> and PMDs if the new configuration field had the opposite meaning, with the >>>> action itself meaning delivery to the represented port and not to DPDK one. >>>> Alternatively, one could define a brand new action with the said behaviour. >>> >>> We had already very similar discussions regarding the understanding of what >>> the representor really is from the DPDK API's point of view, and the last >>> time, IIUC, it was concluded by a tech. board that representor should be >>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to >>> VF and not to the representor device: >>> >>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376 >>> This wasn't enforced though, IIUC, for existing code and semantics is still >>> mixed. >>> >>> I still think that configuration should be applied to VF, and the same >>> applies >>> to rte_flow API. IMHO, average application should not care if device is >>> a VF itself or its representor. Everything should work exactly the same. >>> I think this matches with the original idea/design of the switchdev >>> functionality >>> in the linux kernel and also matches with how the average user thinks about >>> representor devices. >>> >>> If some specific use-case requires to distinguish VF from the representor, >>> there should probably be a separate special API/flag for that. >>> >>> Best regards, Ilya Maximets. >>> >> >