On 6/2/2021 1:46 PM, Ilya Maximets wrote:
> On 6/1/21 4:28 PM, Ivan Malov wrote:
>> Hi Ilya,
>>
>> Thank you for reviewing the proposal at such short notice. I'm afraid that 
>> prior discussions overlook the simple fact that the whole problem is not 
>> limited to just VF representors. Action PORT_ID is also used with respect to 
>> the admin PF's ethdev, which "represents itself" (and by no means it 
>> represents the underlying physical/network port). In this case, one cannot 
>> state that the application treats it as a physical port, just like one 
>> states that the application perceives representors as VFs themselves.
> 
> 
> I don't think that it was overlooked.  If device is in a switchdev mode than
> there is a PF representor and VF representors.  Application typically works
> only with representors in this case is it doesn't make much sense to have
> representor and the upstream port attached to the same application at the
> same time.  Configuration that is applied by application to the representor
> (PF or VF, it doesn't matter) applies to the corresponding upstream port
> (actual PF or VF) by default.
> 
> Exactly same thing here with PORT_ID action.  You have a packet and action
> to send it to the port, but it's not specified if HW needs to send it to
> the representor or the upstream port (again, VF or PF, it doesn't matter).
> Since there is no extra information, HW should send it to the upstream
> port by default.  The same as configuration applies by default to the
> upstream port.
> 
> Let's look at some workflow examples:
> 
>       DPDK Application
>         |         |
>         |         |
>    +--PF-rep------VF-rep---+
>    |                       |
>    |    NIC (switchdev)    |
>    |                       |
>    +---PF---------VF-------+
>        |          |
>        |          |
>    External       VM or whatever
>    Network
> 
> a. Workflow for "DPDK Application" to set MAC to VF:
> 
> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
> 2.  DPDK sets MAC for "VF".
> 
> b. Workflow for "DPDK Application" to set MAC to PF:
> 
> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
> 2.  DPDK sets MAC for "PF".
> 
> c. Workflow for "DPDK Application" to send packet to the external network:
> 
> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
> 3. packet egresses to the external network from "PF".
> 
> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
> 
> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
> 3. "VM or whatever" receives the packet from "VF".
> 
> In two workflows above there is no rte_flow processing on step 2, i.e.,
> NIC does not perform any lookups/matches/actions, because it's not possible
> to configure actions for packets received from "PF-rep" or
> "VF-rep" as these ports doesn't own a port id and all the configuration
> and rte_flow actions translated and applied for the devices that these
> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
> or "VF-rep").
> 
> e. Workflow for the packet received on PF and PORT_ID action:
> 
> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
>    to execute PORT_ID "VF-rep".
> 2. NIC receives packet on "PF".
> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
> 4. "VM or whatever" receives the packet from "VF".
> 
> f. Workflow for the packet received on VF and PORT_ID action:
> 
> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
>    to execute 'PORT_ID "PF-rep"'.
> 2. NIC receives packet on "VF".
> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
> 4. Packet egresses from the "PF" to the external network.
> 
> Above is what, IMHO, the logic should look like and this matches with
> the overall switchdev design in kernel.
> 

Hi Ilya,

Thanks for clearly explaining the usecase, this was useful (at least for me).

But I am still not clear what is the other usecase, when port_id action is for
'VF-rep' packets sent to 'VF-ref' (instead of VF).
I remember Ilya mentioned both 'VF-rep' & 'VF' can be attached to an application
for debug purposes, but not any real life usage mentioned, unless I missed.

And if represontor datapath works independently, instead of being a pipe/wire to
represented port, won't it be a virtual partition of the port, instead of
representor of the port?


> I understand that this logic could seem flipped-over from the HW point
> of view, but it's perfectly logical from the user's perspective, because
> user should not care if the application works with representors or
> some real devices.  If application configures that all packets from port
> A should be sent to port B, user will expect that these packets will
> egress from port B once received from port A.  That will be highly
> inconvenient if the packet will ingress from port B back to the
> application instead.
> 
>       DPDK Application
>         |          |
>         |          |
>      port A     port B
>         |          |
>       *****MAGIC*****
>         |          |
>    External      Another Network
>    Network       or VM or whatever
> 
> It should not matter if there is an extra layer between ports A and B
> and the external network and VM.  Everything should work in exactly the
> same way, transparently for the application.
> 
> The point of hardware offloading, and therefore rte_flow API, is to take
> what user does in software and make this "magically" work in hardware in
> the exactly same way.  And this will be broken if user will have to
> use different logic based on the mode the hardware works in, i.e. based on
> the fact if the application works with ports or their representors.
> 
> If some specific use case requires application to know if it's an
> upstream port or the representor and demystify the internals of the switchdev
> NIC, there should be a different port id for the representor itself that
> could be used in all DPDK APIs including rte_flow API or a special bit for
> that matter.  IIRC, there was an idea to add a bit directly to the port_id
> for that purpose that will flip over behavior in all the workflow scenarios
> that I described above.
> 
>>
>> Given these facts, it would not be quite right to just align the 
>> documentation with the de-facto action meaning assumed by OvS.
> 
> It's not a "meaning assumed by OvS", it's the original design and the
> main idea of a switchdev based on a common sense.
> 
>>
>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>> That said, since port representors had been adopted, applications like OvS
>>>> have been misusing the action. They misread its purpose as sending packets
>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>
>>>> Since there might be applications which use this action in its valid sense,
>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>> This patch adds an explicit bit to the action configuration which will let
>>>> applications, depending on their needs, leverage the two meanings properly.
>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>
>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>
>>> We had already very similar discussions regarding the understanding of what
>>> the representor really is from the DPDK API's point of view, and the last
>>> time, IIUC, it was concluded by a tech. board that representor should be
>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>> VF and not to the representor device:
>>>    
>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376
>>> This wasn't enforced though, IIUC, for existing code and semantics is still 
>>> mixed.
>>>
>>> I still think that configuration should be applied to VF, and the same 
>>> applies
>>> to rte_flow API.  IMHO, average application should not care if device is
>>> a VF itself or its representor.  Everything should work exactly the same.
>>> I think this matches with the original idea/design of the switchdev 
>>> functionality
>>> in the linux kernel and also matches with how the average user thinks about
>>> representor devices.
>>>
>>> If some specific use-case requires to distinguish VF from the representor,
>>> there should probably be a separate special API/flag for that.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>
> 

Reply via email to