On 02/06/2021 14:21, Eli Britstein wrote:
On 6/2/2021 1:50 PM, Andrew Rybchenko wrote:
External email: Use caution opening links or attachments
On 6/2/21 12:57 PM, Eli Britstein wrote:
On 6/1/2021 5:53 PM, Andrew Rybchenko wrote:
External email: Use caution opening links or attachments
On 6/1/21 5:44 PM, Eli Britstein wrote:
On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
External email: Use caution opening links or attachments
On 6/1/21 4:24 PM, Eli Britstein wrote:
On 6/1/2021 3:10 PM, Ilya Maximets wrote:
External email: Use caution opening links or attachments
On 6/1/21 1:14 PM, Ivan Malov wrote:
By its very name, action PORT_ID means that packets hit an ethdev
with the
given DPDK port ID. At least the current comments don't state the
opposite.
That said, since port representors had been adopted, applications
like OvS
have been misusing the action. They misread its purpose as sending
packets
to the opposite end of the "wire" plugged to the given ethdev, for
example,
redirecting packets to the VF itself rather than to its
representor
ethdev.
Another example: OvS relies on this action with the admin PF's
ethdev
port
ID specified in it in order to send offloaded packets to the
physical
port.
Since there might be applications which use this action in its
valid
sense,
one can't just change the documentation to greenlight the opposite
meaning.
This patch adds an explicit bit to the action configuration which
will let
applications, depending on their needs, leverage the two meanings
properly.
Applications like OvS, as well as PMDs, will have to be corrected
when the
patch has been applied. But the improved clarity of the action is
worth it.
The proposed change is not the only option. One could avoid
changes
in OvS
and PMDs if the new configuration field had the opposite meaning,
with the
action itself meaning delivery to the represented port and not to
DPDK one.
Alternatively, one could define a brand new action with the said
behaviour.
It doesn't make any sense to attach the VF itself to OVS, but
only its
representor.
OvS is not the only DPDK application.
True. It is just the focus of this commit message is OVS.
For the PF, when in switchdev mode, it is the "uplink
representor", so
it is also a representor.
Strictly speaking it is not a representor from DPDK point of
view. E.g. representors have corresponding flag set which is
definitely clear in the case of PF.
This is the per-PMD responsibility. The API should not care.
That said, OVS does not care of the type of the port. It doesn't
matter
if it's an "upstream" or not, or if it's a representor or not.
Yes, it is clear, but let's put OvS aside. Let's consider a
DPDK application which has a number of ethdev port. Some may
belong to single switch domain, some may be from different
switch domains (i.e. different NICs). Can I use PORT_ID action
to redirect ingress traffic to a specified ethdev port using
PORT_ID action? It looks like no, but IMHO it is the definition
of the PORT_ID action.
Let's separate API from implementation. By API point of view, yes, the
user may request it. Nothing wrong with it.
From implementation point of view - yes, it might fail, but not for
sure, even if on different NICs. Maybe the HW of a certain vendor has
the capability to do it?
We can't know, so I think the API should allow it.
Hold on. What should it allow? It is two opposite meanings:
1. Direct traffic to DPDK ethdev port specified using ID to be
received and processed by the DPDK application.
2. Direct traffic to an upstream port represented by the
DPDK port.
The patch tries to address the ambiguity, misuse it in OvS
(from my point of view in accordance with the action
documentation), mis-implementation in a number of PMDs
(to work in OvS) and tries to sort it out with an explanation
why proposed direction is chosen. I realize that it could be
painful, but IMHO it is the best option here. Yes, it is a
point to discuss.
To start with we should agree that that problem exists.
Second, we should agree on direction how to solve it.
I agree. Suppose port 0 is the PF, and port 1 is a VF representor.
IIUC, there are two options:
1. flow create 1 ingress transfer pattern eth / end action port_id id 0
upstream 1 / end
2. flow create 1 ingress transfer pattern eth / end action port_id id 0
upstream 0 / end
[1] is the same behavior as today.
[2] is a new behavior, the packet received by port 0 as if it arrived
from the wire.
Then, let's have more:
3. flow create 0 ingress transfer pattern eth / end action port_id id 1
upstream 1 / end
4. flow create 0 ingress transfer pattern eth / end action port_id id 1
upstream 0 / end
if we have [2] and [4], the packet going from the VF will hit [2], then
hit [4] and then [2] again in an endless loop?
As I understand PORT_ID is a fate action. So, no more lookups
are done. If the packet is loop back from applications, loop is
possible.
I referred a HW loop, not SW. For example with JUMP action (also fate):
flow create 0 group 0 ingress transfer pattern eth / end action jump
group 1 / end
flow create 0 group 1 ingress transfer pattern eth / end action jump
group 0 / end
In fact, it is a good question if "flow creare 0 ingress
transfer" or "flow create 1 ingress transfer" assume any
implicit filtering. I always thought that no.
i.e. if we have two network ports rule like
flow create 0 ingress transfer pattern eth / end \
action port_id id 1 upstream 1 / end
will match packets incoming from any port into the switch
(network port 0, network port 1, VF or PF itself (???)).
The topic also requires explicit clarification.
rte_flow is port based. It implicitly filters only packets for the
provided port (0).
Maybe need to clarify documentation and have a "no filtering" API if
needed.
We've come across the following bits in the current documentation with
respect to attribute "transfer", quote:
"Instead of simply matching the properties of traffic as it would appear
on a given DPDK port ID, enabling this attribute transfers a flow rule
to the lowest possible level of any device endpoints found in the pattern.
When supported, this effectively enables an application to reroute
traffic not necessarily intended for it (e.g. coming from or addressed
to different physical ports, VFs or applications) at the device level".
(https://doc.dpdk.org/guides/prog_guide/rte_flow.html#attributes)
Since action PORT_ID hardly makes sense without attribute "transfer"
(unless it doesn't point to the same ethdev as the one used to submit
the flow), this paragraph effectively states that in this particular
case API "flow_create" is not (necessarily) port-based.
PF itself is really a hard question because of "ingress"
since traffic from PF is a traffic from DPDK application and
it is egress, not ingress.
Ingress means the direction. Hit on packets otherwise provided to the SW
by rte_eth_rx_burst().
Same goes for the PF. Packets by rte_eth_rx_burst are the ones arriving
from the wire, so ingress is that direction and egress is from the app.
I think that port ID used to created flow rule should not
apply any filtering in the case of transfer since we have
corresponding items to do it explicitly. If we do it implicitly
as well, we need some priorities and a way to avoid implicit
rules which makes things much harder to understand and
implement.
If "upstream 0" means what I thought it means (comments?) maybe a better
way to do it is expose another port for that, so there will be 2 "PF"
ports - one as the wire representor and the other one as the "PF" (or
clearer naming...).
This would be a vendor decision, and there would be no need to change
PORT_ID API.
If this is your meaning, maybe what you are looking for is an action to
change the in_port and continue processing?
Please comment on the examples I gave or clarify the use case you are
trying to do.
Thanks,
Eli
We had already very similar discussions regarding the
understanding of
what
the representor really is from the DPDK API's point of view, and
the
last
time, IIUC, it was concluded by a tech. board that representor
should be
a "ghost of a VF", i.e. DPDK APIs should apply configuration by
default to
VF and not to the representor device:
https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376
This wasn't enforced though, IIUC, for existing code and
semantics is
still mixed.
I am not sure how this is related.
I still think that configuration should be applied to VF, and the
same
applies
to rte_flow API. IMHO, average application should not care if
device is
a VF itself or its representor. Everything should work exactly the
same.
I think this matches with the original idea/design of the switchdev
functionality
in the linux kernel and also matches with how the average user
thinks
about
representor devices.
Right. This is the way representors work. It is fully aligned with
configuration of OVS-kernel.
If some specific use-case requires to distinguish VF from the
representor,
there should probably be a separate special API/flag for that.
Best regards, Ilya Maximets.
--
Ivan M