Hi Rongwei,

On Thu, 15 Sep 2022, Rongwei Liu wrote:

Hi Ivan,

BR
Rongwei

-----Original Message-----
From: Ivan Malov <ivan.ma...@oktetlabs.ru>
Sent: Wednesday, September 14, 2022 23:18
To: Rongwei Liu <rongw...@nvidia.com>
Cc: Matan Azrad <ma...@nvidia.com>; Slava Ovsiienko
<viachesl...@nvidia.com>; Ori Kam <or...@nvidia.com>; NBU-Contact-
Thomas Monjalon (EXTERNAL) <tho...@monjalon.net>; Aman Singh
<aman.deep.si...@intel.com>; Yuying Zhang <yuying.zh...@intel.com>;
Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>; dev@dpdk.org; Raslan
Darawsheh <rasl...@nvidia.com>
Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer
table

External email: Use caution opening links or attachments


Hi Rongwei,

On Wed, 14 Sep 2022, Rongwei Liu wrote:


-----Original Message-----
From: Ivan Malov <ivan.ma...@oktetlabs.ru>
Sent: Wednesday, September 14, 2022 15:32
To: Rongwei Liu <rongw...@nvidia.com>
Subject: RE: [PATCH v1] ethdev: add direction info when creating the
transfer table



Hi,

On Wed, 14 Sep 2022, Rongwei Liu wrote:


-----Original Message-----
From: Ivan Malov <ivan.ma...@oktetlabs.ru>
Sent: Tuesday, September 13, 2022 22:33
To: Rongwei Liu <rongw...@nvidia.com>
Subject: RE: [PATCH v1] ethdev: add direction info when creating
the transfer table



Hi Rongwei,

PSB

On Tue, 13 Sep 2022, Rongwei Liu wrote:


-----Original Message-----
From: Ivan Malov <ivan.ma...@oktetlabs.ru>
Sent: Tuesday, September 13, 2022 00:57
To: Rongwei Liu <rongw...@nvidia.com>
Subject: Re: [PATCH v1] ethdev: add direction info when creating
the transfer table



Hi,

On Wed, 7 Sep 2022, Rongwei Liu wrote:

The transfer domain rule is able to match traffic of wire / VF origin,
which implies underlay resources for both directions.

In point of fact, matching traffic coming from some entity like
wire / VF has long been generalised in the form of representors.
So, a flow rule with attribute "transfer" is able to match
traffic coming from either a REPRESENTED_PORT or from a
PORT_REPRESENTOR
(please find these items).


In customer deployments, they usually match traffic of only one
direction in a single flow table: either from wire or from VF.

Which customer deployments? Could you please provide detailed
examples?



We have seen many customer deployments like:
1. Match overlay traffic from wire and do decap, then send to a specific vport.
2. Match specific 5-tuples and do encap, then send to wire.
The matching criteria have an obvious direction preference.

Thank you. My questions are as follows:

In (1), when you say "from wire", do you mean the need to match
packets arriving via whatever physical ports rather than matching
packets arriving from some specific phys. port?

^^

Could you please find my question above? Based on your understanding
of templates in async flow approach, an answer to this question may
help us find the common ground.
It means traffic arriving from physical ports (the transfer_proxy role), or the
south band per your concept.

Transfer proxy has nothing to do with physical ports. And I should stress
that "south band" and the likes are NOT my concepts. Instead, I think that
direction designations like "south" or "north" aren't applicable when talking
about the embedded switch and its flow (transfer) rules.

Traffic from a vport (not the transfer_proxy), or the north band per your
concept, won't hit, even for the same packets.

Please see above. Transfer proxy is a completely different concept.
And I never used "north band" concept.


--


If, however, matching traffic "from wire" in fact means matching
packets arriving from a *specific* physical port, then for sure
item REPRESENTED_PORT should perfectly do the job, and the proposed
attribute is unneeded.

(BTW, in DPDK, it is customary to use term "physical port", not
"wire")

In (1), what are "vport"s? Please explain. Once again, I should
remind you that, in DPDK, folks prefer terms "represented entity" /
"representor" over vendor-specific terms like "vport", etc.

Vport is short for virtual port, such as a VF.

Thanks. As I say, term "vport" might be confusing to some readers, so
it'd be better to provide this explanation (about VF) in the commit
description next time.
Ack. Will add VF as an example.

As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
Could you please explain, why not just add a match item
REPRESENTED_PORT pointing to that VF via its representor? Doing so
should perfectly define the exact direction / traffic source. Isn't
that
sufficient?

In my view, there is a difference between matching fields and matching values.
Take IPv4 src_addr 1.1.1.1, 1.1.1.2, 1.1.1.3: would you treat these as the
same or as different matching criteria?
I would call them the same, since they can be summarised as 1.1.1.0/30.
REPRESENTED_PORT is just another matching item, with no essential
difference, and it can't stand for direction info.

It looks like we're starting to run into disagreement here.
There's no "direction" at all. There's an embedded switch inside the
NIC, and there're (logical) switch ports that packets enter the switch from.

When the user submits a "transfer" rule and provides neither
REPRESENTED_PORT nor PORT_REPRESENTOR in the pattern, the embedded
switch is supposed to match packets coming from ANY ports, be they VFs
or physical (wire) ports.

But when the user provides, for example, item REPRESENTED_PORT to
point to the physical (wire) port, the embedded switch knows exactly
which port the packets should enter it from.
In this case, it is supposed to match only packets coming from that
physical port. And this should be sufficient.
This in fact replaces the need to know a "direction".
It's just an exact specification of packet's origin.

There is traffic arriving at or leaving the switch, so there is always a
direction, implicit or explicit.

This does not contradict my thoughts above. "Direction" is *defined* by two
points (like in geometry): an initial point (the switch port through which a
packet enters the switch) and the terminal point (the match engine inside the
switch). If one knows these two points, no extra hints are required to specify
some "direction". Because direction is already represented by this "vector" of
sorts. That's why presence of the port match item in the pattern is absolutely
sufficient.
Good to see this. Thanks for the information.

You're very welcome.

This update leverages the concept exactly as defined by you: "an initial point
(the switch port through which a packet enters the switch)".

No, it doesn't seem so. Based on your explanations, it appears that
this update tries to refer to a "super set" of ports which have
something in common. For example, with attribute "wire_orig"
you seem to be trying to request that the rule match packets
arriving from wire through ANY of the phys.ports. So my point
is: why express an obvious match item as an attribute?

For example, nobody tries to replace match item IPv4 with
an attribute "is_ipv4". That would be strange, to say the
least. Why should the "vf_orig" case be an exception then?

If you think "direction" is not a good word, we can change to other words
like "initial port" / "origin port", etc.

As I explained multiple times, "direction" is rather obscure from the
viewpoint located inside the embedded switch. Yes, on the non-transfer
(VNIC) level, there are *exactly* two directions: ingress and egress.
But, inside the embedded switch (transfer rules), there can
be *multiple* various "directions", which are not even
directions = they're traffic PATHs in fact.

Renaming to "initial port" and "origin port" won't be helpful either
because, for users, it will be hard to figure out the difference
between the attribute and items PORT_REPRESENTOR / REPRESENTED_PORT.

If, however, you add new items instead of the attribute, the user
will likely see that the new items and the existing ones are
just alternative options = representor-based items help
to address exact ports (one rule - one port), whilst
your new items help to address super sets of ports
like "all wire ports" or "all guest ports".

So, the short of it:
1) these "wire_orig" / "vf_orig" are in fact yet another match criteria;
2) because of that, they should go to match items and not to attributes.


However, based on your later explanations, the use of precise port item is
simply inconvenient in your use case because you are trying to match traffic
from *multiple* ports that have something in common (i.e. all VFs or all wire
ports).

And, instead of adding a new item type which would serve exactly your needs,
you for some reason try to add an attribute, which has multiple drawbacks
which I described in my previous letter.

For transfer rules, there is a concept of transfer_proxy.
It takes switch ownership; all switch rules should be configured via the
transfer_proxy.

Yes, such a concept exists, but it's a don't-care with regard to the problem
that we're discussing, sorry.
Furthermore, unlike "switch domain ID" (which is the same for all ethdevs
belonging to a given physical NIC board), nobody guarantees that there is only
one transfer proxy port. Some NIC vendors allow transfer rules to be added via
any ethdev port.

Does any flow rule leverage the switch domain ID already? Is it too obscure for the end user?

No, I'm not talking about flow rules. I'm explaining the logic which the
application may use to identify which ethdevs are on which NICs.

Imagine a DPDK application which has two ethdevs instantiated:
one ethdev sits on top of the admin. PF (ethdev 0), the other
one sits on top of a low-privilege PF (ethdev 1).
In the latter case, it can also be a VF.

Both ethdev 0 and ethdev 1 belong to the same physical NIC board.

Now, what I'm trying to explain is the fact that "proxy"
behaviour may differ between various vendors:

- some vendors say that they can support managing "transfer" rules via
  any PFs / VFs. They do not require that some specific PF ethdev be
  used to do that. With such vendors, if the application makes a
  query "What's the proxy port ID for the ethdev 1?", it will
  get "The proxy port ID for ethdev 1 is 1" response.

- but other vendors cannot support the above workflow and they require
  that "transfer" rules be managed using some specific (admin) ethdev.
  If the application makes the same query here, it will get the
  following response: "The proxy port ID for ethdev 1 is 0".

So, given these explanations, it is incorrect to assume that
the proxy port ID for all ethdevs belonging to the same NIC
board will be the same. They simply may not be like this.

However, *regardless* of the two above scenarios and regardless
of vendor, for NICs which have embedded switch feature, when the
user tries to check the "switch domain ID" for ethdev 0 and
ethdev 1, they will get the same value. So, this should be
the right criterion for the application (not for flow
rules themselves) to decide which ethdev belongs to
which physical NIC board.


Imagine a logical switch with one PF and two VFs.
The PF is the transfer proxy, and the VFs belong to the PF logically.
When receiving traffic from the PF, we can say it comes into the logical switch.

That's correct.

When a packet is sent from a VF (a VF belonging to the PF), we can say the
traffic leaves the switch.

That's not correct. Traffic sent from VF (for example, a guest VM is sending
packets) also *enters* the switch. PFs and VFs are in fact *separate* logical
ports of the embedded switch.


Item REPRESENTED_PORT indicates switch to match traffic sent from which
port, comes into, or leave switch.

That is not correct either. Item REPRESENTED_PORT tells the switch to match
packets which come into the switch FROM the logical port which is
represented by the given DPDK ethdev.

For example, if ethdev "E" is the *main* PF which is bound to physical port "P",
then item REPRESENTED_PORT with its ethdev ID set to "E" tells the switch
that only packets coming to the NIC from the *wire* via physical port "P" should match.

We can regard it as one kind of packet metadata.

Kind of yes, but might be vendor-specific. No need to delve into this.

Like you said, DPDK always treats "transfer" as matching traffic from any PORTs.

Slight correction: it treats it this way until it sees an exact port item.
If the user provides REPRESENTED_PORT (or PORT_REPRESENTOR), it's no
longer *any* ports traffic, it's an exact port traffic. That's it.

When REPRESENTED_PORT is specified, the rules are limited to some
dedicated PORTs.

These rules match only packets arriving TO the embedded switch FROM the
said dedicated ports.

Other PORTs are ignored because of metadata mismatch.

Kind of yes, correct.

Rules still have the capability to match ANY PORTs if the metadata matches.

This statement is only correct for the cases when the user uses neither
item REPRESENTED_PORT nor item PORT_REPRESENTOR.


This update will allow the user to cut the other PORTs' matching capabilities.

As I explained, this is exactly what items PORT_REPRESENTOR and
REPRESENTED_PORT do. No need to have an extra attribute.

If the user adds item REPRESENTED_PORT with ethdev_id="E", like in the
above example, to match packets entering NIC via the physical port "P", then
this rule will NOT match packets entering NIC from other points. For example,
packets transmitted by a virtual machine via a VF will not match in this case.

Port id depends on the attach sequence.

Unfortunately, this is hardly a good argument because flow rules are
supposed to be inserted based on the run-time packet learning. Attach
sequence is a don't care here.

Also please mind that, although I appreciate your explanations
here, on the mailing list, they should finally be added to the
commit message, so that readers do not have to look for them elsewhere.

We have explained the high likelihood of single-direction matching, right?

Not quite. As I said, it is not correct to assume any "direction",
like in geographical sense ("north", "south", etc.). Application has
ethdevs, and they are representors of some "virtual ports" (in your
terminology) belonging to the switch, for example, VFs, SFs or physical
ports.

The user adds an appropriate item to the pattern (REPRESENTED_PORT),
and doing so specifies the path by which the packet enters the switch.

It's hard to list all the possibilities of traffic matching preferences.

And let's say more: one need never do this. That's exactly the reason
why DPDK has abandoned the concept of "direction" in *transfer* rules
and switched to the use of precise criteria (REPRESENTED_PORT, etc.).

As far as I know, DPDK changed "transfer ingress" to "transfer", so it's clearer
that transfer can match both directions (both ingress and egress).

Not quite. DPDK has abandoned the use of "ingress / egress" in "transfer"
rules because "ingress" and "egress" are only applicable on the VNIC level. For
example, there is a PF attached to DPDK application:
packets that the application receives through this ethdev, are ingress, and
packets that it transmits (tx_burst) are egress.

I can explain in other words. Imagine yourself standing *inside* a room which
only has one door. When someone enters the room, it's "ingress", when
someone leaves, it's "egress". It's relative to your viewpoint.
In this example, such a room represents a VNIC / ethdev.

And now imagine yourself standing *outside* of another room / auditorium
which has multiple doors / exits. You're standing near some particular exit "A"
(VNIC / ethdev), but people may enter this room via another door "B" and then
leave it via yet another door "C". In this case, from your viewpoint, this
traffic can be considered neither ingress nor egress, because these people
do not approach you.

Like in this example, the embedded switch is like a large auditorium with
many, many doors / exits. And there can be many, many
directions: a packet can enter the switch via phys. port "P1"
and then leave it via another phys. port "P2". Or it can enter the switch via a
phys. port and then leave it via a VF's logical port (to be delivered to a guest
machine), or a packet can travel from one VF to another one.

There's no PRE-DEFINED direction like "north to south" or "east to west".
And this explains why it's very undesirable to use term "direction".

REPRESENTED_PORT is the evolution of "port_id"; I think it's only one kind of
matching item.

Yes. But nobody prevents you from defining yet another match item which will
be able to refer to a *group* of ports which have something in common (i.e.
"all guest ports of this switch", pointing to all logical ports currently
attached to virtual machines / guests, or "all wire ports of this switch").


For large-scale deployments like 10M rules, if we can save resources
significantly by introducing direction, why not?

I do not deny the fact that you have a use case where resources can be saved
significantly if you give the PMD some extra knowledge when creating a flow
table / pattern template. That's totally OK. What I object to is the very
implementation and the use of the term "direction". If you add new item types
(like above), then, when you create the table 1 async pattern template, you will
have item ANY_WIRE_PORTS, and, for the table 2 pattern template, you'll have
item ANY_GUEST_PORTS.
As you see, the two pattern templates now differ because the match criteria
use different items.


Again, async API:
1. pattern template A
2. action template B
3. table C with pattern template A + action template B.
4. rule D, E, F...
The specified REPRESENTED_PORT is provided in rules (D, E, F...) not pattern
template A or action template B or table C.
Resources may be allocated early at step 3 thanks to the table's rule_nums property.

No, item REPRESENTED_PORT *can* be provided inside pattern template A,
but, as you pointed out earlier, the problem is that you can't distinguish
different pattern templates which have this item, because pattern templates
know nothing about *exact* port IDs and only know item MASKS. Yes, I agree
that in your case such problem exists, but, as I say above, it can be solved by
adding new item types: one for referring to all phys. ports of a given NIC and
another one for pointing to a group of current guest users (VFs).

The underlay is the one we have met so far.

Introduce one new member, transfer_mode, into rte_flow_attr to
indicate the flow table direction property: from wire, from VF,
or bidirectional (default).

AFAIK, 'rte_flow_attr' serves both traditional flow rule
insertion and the asynchronous (table) approach. The patch adds the
attributes to the generic 'rte_flow_attr' but, for some reason, ignores
non-table rules.


Sync API uses one rule to contain everything. It's hard for the PMD to
determine whether this rule has a direction preference or not.
Imagine a situation, just for example:
1. Vport 1, VxLAN, do decap, send to vport 2: 1 million scale.
2. Vport 0 (wire), VxLAN, do decap, send to vport 3: 1 hundred scale.
(1) and (2) share the same matching conditions (eth / ipv4 / udp / vxlan / ...),
so the sync API considers them to share matching determination logic.
It means (2) has 1M-scale capability too. Obviously, this wastes a lot of
resources.

Strictly speaking, they do not share the same match pattern.
Your example clearly shows that, in (1), the pattern should request
packets coming from "vport 1" and, in (2), packets coming from "vport 0".

My point is simple: the "vport" from which packets enter the
embedded switch is ALSO a match criterion. If you accept this,
you'll see: the matching conditions differ.

See above.
In this case, I think the matching fields are both "port_id + ipv4_vxlan".
They are the same.
They only differ in values, like VNI 100 or 200, vice versa.

Not quite. Look closer: you use *different* port IDs for (1) and (2).
The value of "ethdev_id" field in item REPRESENTED_PORT differs.


In the async API, there is pattern_template introduced. We can mark (1) to use
pattern_template id 1 and (2) to use pattern_template 2.
They will be separated from each other and will no longer share.

Consider an example. "Wire" is a physical port represented by PF0
which, in turn, is attached to DPDK via ethdev 0. "VF" (vport?) is
attached to guest and is represented by a representor ethdev 1 in DPDK.

So, some rules (template 1) are needed to deliver packets from "wire"
to "VF" and also decapsulate them. And some rules (template 2) are
needed to deliver packets in the opposite direction, from "VF"
to "wire" and also encapsulate them.

My question is, what prevents you from adding match item
REPRESENTED_PORT[ethdev_id=0] to the pattern template 1 and
REPRESENTED_PORT[ethdev_id=1] to the pattern template 2?

As I said previously, if you insert such item before eth / ipv4 /
etc to your match pattern, doing so defines an *exact* direction / source.

Could you check the async API guidance? I think a pattern template focuses
on the matching field (mask).
"REPRESENTED_PORT[ethdev_id=0]" and "REPRESENTED_PORT[ethdev_id=1]" are the same.
1. pattern template: REPRESENTED_PORT mask 0xffff ...
2. action template: action1 / action2 ...
3. table create with pattern_template plus action template.
REPRESENTED_PORT[ethdev_id=0] will be rule 1: rule create
REPRESENTED_PORT port_id is 0 / actions ...
REPRESENTED_PORT[ethdev_id=1] will be rule 2: rule create
REPRESENTED_PORT port_id is 1 / actions ...

OK, so, based on this explanation, it appears that you might be looking
to refer to:
a) a *set* of any physical (wire) ports
b) a *set* of any guest ports (VFs)

Great, it looks like we are getting closer and closer to agreement.

Looks so.

You chose to achieve this using an attribute, but:

1) as I explained above, the use of term "direction" is wrong;
    please hear me out: I'm not saying that your use case and
    your optimisation is wrong: I'm saying that naming for it
    is wrong: it has nothing to do with "direction";

Do you have any better naming proposal?

As I said, what you are trying to achieve using a new attribute would be way
better to achieve using new pattern items which can be easily told one from
another in the PMD when pre-allocating resources for different async flow tables.

So, I don't have any proposal for *attribute* naming.
What I propose is to consider new items instead.

2) while naming a *set* of wire ports as "wire_orig" might be OK,
    sticking with term "vf_orig" for a *set* of guest ports is
    clearly not, simply because the user may pass another PF
    to a guest instead of passing a VF; in other words,
    a better term is needed here;

Like you said, a vport may be a VF, SF, etc. "vport_origin" is from the
logical switch perspective.
Any proposal is welcome.

The problem is, "vport" can be easily confused with the slightly more generic
"lport" (the embedded switch's "logical port"), and logical ports, in turn, are
not confined to just VFs or PFs. For example, physical (wire) ports are ALSO
logical ports of the switch.

3) since it is possible to plug multiple NICs to a DPDK application,
    even from different vendors, the user may end up having multiple
    physical ports belonging to different physical NICs attached to
    the application; if this is the case, then referring to a *set*
    of wire ports using the new attribute is ambiguous in the
    sense that it's unclear whether this applies only to
    wire ports of some specific physical NIC or to the
    physical ports of *all* NICs managed by the app;

No matter how many NICs have been probed by DPDK, there are always the
switch / PF / VF / SF concepts.

Correct.

Each switch must have an owner identified by transfer_proxy(). A vport (VF/SF)
can't cross switches in the normal case.

No. That is not correct. This is tricky, but please hear me out: an individual
NIC board (that is, a given *switch*) is identified only by its switch domain ID.
As I explained above, "transfer proxy" is just a technical hint for the
application to indicate an ethdev through which "transfer" rules must be
managed. Not all vendors support this concept (and they are not obliged to
support it).

Traffic that comes from one NIC can't be offloaded by other NICs unless
forwarded by the application.

Right, but forwarding in software (inside DPDK application) is out of scope with
regard to the problem that we're discussing.

If the user uses the new attribute to cut one side's resources, I think the user
is smart enough to manage the rules on different NICs.

As I explained above, I do not deny the existence of the problem that your
patch is trying to solve. Now it looks like we're on the same page with regard
to understanding the fact that what you're trying to do is to introduce a match
criterion that would refer to a GROUP of similar ports. In my opinion, this is
not an *attribute*, it's a *match criterion*, and it should be implemented as
two new items.

Having two different item types would perfectly fit the need to know the
difference between such "directions" (as per your terminology) early enough,
when parsing templates.

No default behaviour is changed by this update.

4) adding an attribute instead of yet another pattern item type
    is not quite good because PMDs need to be updated separately
    to detect this attribute and throw an error if it's not
    supported, whilst with a new item type, the PMDs do not
    need to be updated = if a PMD sees an unsupported item
    while traversing the item with switch () { case }, it
    will anyway throw an error;

A PMD also needs to check whether it supports a new matching item or not, right?
We can't make assumptions about NIC vendors' PMD implementations, right?

No-no-no. Imagine a PMD which does not support "transfer" rules.
In such PMD, in the flow parsing function one would have:

if (!!attr->transfer) {
     print_error("Transfer is not supported");
     return EINVAL;
}

If you add a new attribute, then PMDs which are NOT going to support it need
to be updated to add a similar check.
Otherwise, they will simply ignore the presence / absence of the attribute in
the rule, and the validation result will be unreliable.

Yes, if this attribute is 0x0, then indeed the behaviour does not change. But
what if it's 0x1 or 0x2?
PMDs that do not support these values must somehow reject such rules on
parsing.

However, this problem does not manifest itself when parsing items. Typically,
in a PMD, one would have:

switch (item->type) {
     case RTE_FLOW_ITEM_TYPE_VOID:
         break;

     case RTE_FLOW_ITEM_TYPE_ETH:
         /* blah-blah-blah */
         break;

     default:
         return ENOTSUP;
}
Are you assuming all PMDs will be implemented in the above style?

One may take a look at the existing PMDs. It's open source after all.

When one has an array of items of unknown count which is
END-terminated, then, obviously, the PMD has to traverse
it one way or another. If it stumbles upon an unknown
item, it will have nothing to do but to throw an error.

This new field targets the async API, which was added recently. No impact on
the sync API.

Rongwei, I see your point. The problem with it, however, is that even
if you describe it in comments, the code won't prevent non-sync API
from seeing this attribute in "struct rte_flow_attr".

As I say, "struct rte_flow_attr" has been here for ages.
When one adds a flow rule in a sync way, they fill out
the very same structure. And the user may set this new
argument to non-zero by mistake. Yes, you may argue
that the app developer should be smart enough to
read your comment before the struct member, which
says that this field is for async only. Right.
But that's not the only scenario. The field may
become non-zero because of some other mistake in
the program which, for example, leads to the
struct memory being corrupted in one way or
another. That's why the PMD has to validate flow rules...

So, the PMD must detect this inconsistency somehow and throw an error.
With your approach (attribute), the PMDs have to be updated to have
these checks. With the item approach that I suggest, updating the
PMDs is obviously not needed. Am I missing something? Let's discuss.

I don't foresee any effort needed on the existing PMD behaviour.

I see your point. But how is this expressed in code?
As I explain above, consistency checks are what
flow validate API is for. New argument means
new checks. That's it.

But I agree with you: we should emphasise it's only for async mode.

It's better to express this in code. So that the problem (if any)
can be detected programmatically and not just from reading comments.
From my point of view, the easiest way to have this done is to
add items instead of attributes, = no need to update PMDs.



So, if you introduce two new item types to solve your problem, then you won't
have to update existing PMDs. If the vendor wants to support the new items
(say, MLX or SFC), they'll update their code to accept the items. But other
vendors will not do anything. If the user tries to pass such an item to a vendor
which doesn't support the feature, the "default" case will just throw an error.

This is what I mean when pointing out such difference between adding an
attribute VS adding new item types.

5) as in (4), a new attribute is not good from a documentation
    standpoint; please search for "represented_port = Y" in the
    documentation = this way, all supported items are
    easily defined for various NIC vendors, but the
    same isn't true for attributes = there is no
    way to indicate supported attributes in docs.

If points (1 - 5) make sense to you, then, if I may be so bold, I'd
like to suggest that the idea of adding a new attribute be abandoned.
Instead, I'd like to suggest adding new items:

(the names are just sketch, for sure, it should be discussed)

ANY_PHY_PORTS { switch_domain_id }
  = match packets entering the embedded switch from *whatever*
    physical ports belonging to the given switch domain

How many PHY_PORTS can one switch have, in your view? Can I treat
the PHY_PORTS as the { switch_domain_id } owner, like transfer_proxy()?

A single physical NIC board is supposed to have a single embedded switch
engine. Hence, if the NIC board has, for example, two or four physical ports,
these will be the physical ports of the switch. That's it.

As for the transfer proxy, please see my explanations above.
It's not *always* reliable to tell whether two given ethdevs belong to the same
physical NIC board or not.

Switch domain ID is the right criterion (for applications).

ANY_GUEST_PORTS { switch_domain_id }
  = match packets entering the embedded switch from *whatever*
    guest ports (VFs, PFs, etc.) belonging to the given
    switch domain

The field "switch_domain_id" is required to tell one physical board /
vendor from another (as I explained in point (3)).
The application can query this parameter from ethdev's switch info:
please see "struct rte_eth_switch_info".

What's your opinion?

How can we handle the relationship of ANY_PHY_PORTS / ANY_GUEST_PORTS
with REPRESENTED_PORT if they conflict?
This needs future tuning.

And if you carry on with the "vf_orig" / "wire_orig" approach, you will inevitably
have the very same problem: a possible conflict with items like
REPRESENTED_PORT. So does it matter? Yes, checks need to be done by PMDs
when parsing patterns.

Like I said before, offloaded rules can't cross different NIC vendors'
"switch_domain_id".
If the user probes multiple NICs in one application, the application should
take care of packet forwarding.
Also, the application should be aware of which ports belong to which NICs.

Yes, perhaps, the domain ID is not needed in the new items.
But the application still must keep track of switch domain IDs itself
so it knows which rules to manage via which ethdevs.

Any other opinions?
ANY_PHY_PORTS / ANY_GUEST_PORTS look like a superset of ports.

So does the new attribute, doesn't it?

This will bring another challenge: "why can't we use REPRESENTED_PORT with a mask" or
"combine several REPRESENTED_PORT items together"?

This problem has existed for many other items, including the now deprecated
items PF, VF and PHY_PORT. Yes, theoretically, when the PMD looks through
the pattern, it has to check that its items do not overlap / contradict.
That's kind of OK, isn't it? The PMD has to check things after all...

For example, no one prevents user from submitting a pattern
with several adjacent items ETH in it. The PMD is supposed
to turn such request down.
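The kind of PMD-side validation being described could look like the sketch below. The item enum is a simplified stand-in for illustration, not the real `rte_flow` item enum:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for a few flow item types; not the real
 * rte_flow item enum. */
enum item_type { ITEM_END, ITEM_ETH, ITEM_IPV4, ITEM_REPRESENTED_PORT };

/* A PMD-style sanity check: reject a pattern carrying two adjacent
 * ETH items, the way a driver is expected to turn down duplicated or
 * contradictory match criteria. Returns 0 if valid, -1 otherwise. */
static int
validate_pattern(const enum item_type *items, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (items[i] == ITEM_ETH && items[i - 1] == ITEM_ETH)
			return -1;
	}
	return 0;
}
```

The same loop structure would host any other pairwise conflict check, e.g. an "any port" item combined with REPRESENTED_PORT.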




For example, the diff below adds the attributes to "table"
commands in testpmd but does not add them to regular (non-table)
commands like "flow create". Why?



The "table" command limits the pattern_template to a single direction or
to bidirectional, per the user-specified attribute.

As I say above, the same effect can be achieved by adding item
REPRESENTED_PORT to the corresponding pattern template.
See above.

The "rule" command must be tied to one "table_id", so the rule will
inherit the "table" direction property; no need to specify it again.

You might've misunderstood. I do not talk about the "rule" command
coupled with some "table". What I talk about is regular, NON-async
flow insertion commands.

Please take a look at section "/* Validate/create attributes. */"
in file "app/test-pmd/cmdline_flow.c". When one adds a new flow
attribute, they should reflect it the same way as VC_INGRESS,
VC_TRANSFER, etc.

That's it.
We don't intend to pass this to the sync API. The above code example
is for the sync API.

So I understand. But there's one slight problem: in your patch, you
add the new attributes to the structure which is *shared* between
sync and async use case scenarios. If one adds an attribute to this
structure, they have to provide accessors for it in all sync-related
commands in testpmd, but your patch does not do that.

Like the title said, "creating transfer table" is an ASYNC operation.
We have limited the scope of this patch. The sync API will be another story.
Maybe we can add one more sentence to emphasize the async API again.

No-no-no. There might be a slight misunderstanding. I understand that you are
limiting the scope of your patch by saying this and this.
That's OK. What I'm trying to point out is the fact that your patch nevertheless
touches the COMMON part of the flow API, which is shared between the two
approaches (sync and async).
Yeah, you are right, we should emphasize that it is for the async API, not sync,
in the code and comments.

Imagine a reader that does not know anything about the async approach.
He just opens the file in vim and goes directly to struct rte_flow_attr.
And, over there, he sees the new attribute "wire_orig". He then immediately
assumes that these attributes can be used in testpmd. Now the reader opens
testpmd and tries to insert a flow rule using the sync approach:

flow create priority 0 transfer vf_orig pattern / ... / end actions drop


This is a wrong statement.
If the user has no idea about cmdline usage, he should rely on "tab indication",
not on guessing.

The command prefix "flow" is bifurcated into sync and async now; the user may use
any keyword combinations.
He will get an "argument error" if the combination is not good, unless he knows what he is doing.
Again: we should emphasize it's for the async API only.

OK, even if this example is not good enough, I still believe that
it is not right to introduce new match criteria in the form of
rule attributes. Match criteria belong in the pattern.


And doing so will be a failure, because your patch does not add the new
attribute keyword to sync flow rule syntax parser. That's it.

Once again, I should emphasize: the reader MAY know nothing about the async
approach. But if the attribute is present in "struct rte_flow_attr", it
immediately means that it is available everywhere. Both sync and async.

So, with this in mind, your attempt to limit the scope of the patch to
async-only rules looks a little bit artificial. It's not correct from the
*formal* standpoint.


In other words, it is wrong to assume that "struct rte_flow_attr"
only applies to async approach. It had been introduced long before
the async flow design was added to DPDK. That's it.


But, as I say, I still believe that the new attributes aren't needed.
I think we are not on the same page for now. Can we reach agreement
on the same matching criteria first?

It also helps to save underlying memory and to improve the insertion rate.

Which memory? Host memory? NIC memory? The term "underlayer" is vague.
I suggest that the commit message be revised to first explain how such
memory is spent currently, then explain why this is not optimal and,
finally, in which way the patch is supposed to improve that. I.e. be
more specific.



For large scalable rules, HW (depending on the implementation) always needs
memory to hold the rules' patterns and actions, either from the NIC or from
the host.
The memory footprint highly depends on the user rules' complexity and also
differs between NICs.
~50% memory saving is expected if one direction is cut.
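The claimed saving is simple arithmetic; the sketch below is purely illustrative, and the per-rule byte count is an invented assumption, not a vendor figure:

```c
#include <assert.h>
#include <stdint.h>

/* Purely illustrative: if a bidirectional transfer table must keep
 * per-rule state for both wire-origin and VF-origin traffic,
 * restricting it to a single direction roughly halves the footprint.
 * The bytes_per_rule argument is an assumption for the example. */
static uint64_t
table_footprint(uint64_t n_rules, uint64_t bytes_per_rule, int bidirectional)
{
	return n_rules * bytes_per_rule * (bidirectional ? 2 : 1);
}
```

E.g. with 1000 rules at an assumed 64 bytes each, the single-direction table needs half of what the bidirectional one does.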

Regardless of this talk, this explanation should probably be
present in the commit description.

This number may differ with different NICs or implementations. We can't
say it for sure.

Not an exact number, of course, but a brief explanation of:
a) what is wrong / not optimal in the current design;
Please check the commit log: transfer rules have the capability to match
bidirectional traffic no matter the ports.
b) how it is observed in customer deployments;
Customers have the requirement to save resources, and their offloaded rules
are direction-aware.
c) why the proposed patch is a good solution.
The new attributes provide a way to remove one direction and save underlying
resources.
All of the above can be found in the commit log.

I understand all of that, but my point is, the existing commit message is way
too brief. Yes, it mentions that SOME customers have SOME deployments, but
it does not shed light on which specifics these deployments have. For example,
back in the day, when items PORT_REPRESENTOR and REPRESENTED_PORT
were added, the cover letter for that patch series provided details of
deployment specifics (application: OvS, scenario: full offload rules).

So, it's always better to expand on such specifics so that the reader has
the full picture in their head and doesn't need to look elsewhere.
Not all readers of the commit message will be happy to delve into our
discussions on the mailing list to get the gist.

It's a divergence of approaches. The pattern item approach will attract
another discussion thread, right?

As I said, match criteria belong in flow pattern. I recognise the
importance of the problem that you're looking to solve. It's very
good that you care to address it, but what this patch tries to do
is to add more match criteria in the form of new attributes with
rather questionable names... There's room for improvement.

When I say that new features should not confuse readers, I mean
a very basic thing: readers know that match criteria all sit
in the pattern. And they refer to the pattern item enum in
the code and in documentation to learn about criteria,
while "struct rte_flow_attr" is an unusual place from
which to learn about match criteria.

We should reach a conclusion and reflect it in the commit changes & logs,
making it easy for others to absorb.

Yes, but before we get to that, perhaps it pays to hear
more feedback from other reviewers. Thomas? Ori? Andrew?





By default, the transfer domain is bidirectional, and there are no behavior
changes.

1. Match wire origin only
 flow template_table 0 create group 0 priority 0 transfer wire_orig...
2. Match vf origin only
 flow template_table 0 create group 0 priority 0 transfer vf_orig...

Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
---
app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
lib/ethdev/rte_flow.h                       |  9 ++++++-
3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 7f50028eb7..b25b595e82 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -177,6 +177,8 @@ enum index {
      TABLE_INGRESS,
      TABLE_EGRESS,
      TABLE_TRANSFER,
+     TABLE_TRANSFER_WIRE_ORIG,
+     TABLE_TRANSFER_VF_ORIG,
      TABLE_RULES_NUMBER,
      TABLE_PATTERN_TEMPLATE,
      TABLE_ACTIONS_TEMPLATE,
@@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
      TABLE_INGRESS,
      TABLE_EGRESS,
      TABLE_TRANSFER,
+     TABLE_TRANSFER_WIRE_ORIG,
+     TABLE_TRANSFER_VF_ORIG,
      TABLE_RULES_NUMBER,
      TABLE_PATTERN_TEMPLATE,
      TABLE_ACTIONS_TEMPLATE,
@@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
              .next = NEXT(next_table_attr),
              .call = parse_table,
      },
+     [TABLE_TRANSFER_WIRE_ORIG] = {
+             .name = "wire_orig",
+             .help = "affect rule direction to transfer",

This does not explain the "wire" aspect. It's too broad.

+             .next = NEXT(next_table_attr),
+             .call = parse_table,
+     },
+     [TABLE_TRANSFER_VF_ORIG] = {
+             .name = "vf_orig",
+             .help = "affect rule direction to transfer",

This explanation simply duplicates such of the "wire_orig".
It does not explain the "vf" part. Should be more specific.

+             .next = NEXT(next_table_attr),
+             .call = parse_table,
+     },
      [TABLE_RULES_NUMBER] = {
              .name = "rules_number",
              .help = "number of rules in table",
@@ -8894,6 +8910,16 @@ parse_table(struct context *ctx, const struct token *token,
      case TABLE_TRANSFER:
              out->args.table.attr.flow_attr.transfer = 1;
              return len;
+     case TABLE_TRANSFER_WIRE_ORIG:
+             if (!out->args.table.attr.flow_attr.transfer)
+                     return -1;
+             out->args.table.attr.flow_attr.transfer_mode = 1;
+             return len;
+     case TABLE_TRANSFER_VF_ORIG:
+             if (!out->args.table.attr.flow_attr.transfer)
+                     return -1;
+             out->args.table.attr.flow_attr.transfer_mode = 2;
+             return len;
      default:
              return -1;
      }
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 330e34427d..603b7988dd 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3332,7 +3332,8 @@ It is bound to ``rte_flow_template_table_create()``::

  flow template_table {port_id} create
      [table_id {id}] [group {group_id}]
-       [priority {level}] [ingress] [egress] [transfer]
+       [priority {level}] [ingress] [egress]
+       [transfer [vf_orig] [wire_orig]]

Is it correct? Shouldn't it rather be [transfer] [vf_orig]
[wire_orig] ?

      rules_number {number}
      pattern_template {pattern_template_id}
      actions_template {actions_template_id}

diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index a79f1e7ef0..512b08d817 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -130,7 +130,14 @@ struct rte_flow_attr {
       * through a suitable port. @see rte_flow_pick_transfer_proxy().
       */
      uint32_t transfer:1;
-     uint32_t reserved:29; /**< Reserved, must be zero. */
+     /**
+      * 0 means bidirection,
+      * 0x1 origin uplink,

What does "uplink" mean? It's too vague. Hardly a good term.

I believe this comment should be reworked, in case the idea of having
an extra attribute persists.


+      * 0x2 origin vport,

What does "origin vport" mean? Hardly a good term as well.

I still believe this explanation is way too brief and needs to be
reworked to provide more details, to define the use case for the attribute
more specifically.


+      * N/A both set.

What's this?

The question stands.


+      */
+     uint32_t transfer_mode:2;
+     uint32_t reserved:27; /**< Reserved, must be zero. */
};

/**
--
2.27.0
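To illustrate the problem with the undefined "N/A both set" encoding flagged above: any consumer of the proposed bit-field would need an explicit validity check. The struct below is a simplified mirror of the proposed layout, not the full `struct rte_flow_attr`:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified mirror of the proposed bit-field layout (not the full
 * struct rte_flow_attr): transfer_mode 0 = bidirectional,
 * 1 = wire origin, 2 = VF origin, 3 = undefined ("N/A both set"). */
struct xfer_attr {
	uint32_t transfer:1;
	uint32_t transfer_mode:2;
	uint32_t reserved:29;
};

/* A check a PMD (or the API layer) would need: value 3 has no defined
 * semantics, and the direction bits are meaningless without the
 * transfer bit. Returns nonzero if the combination is valid. */
static int
transfer_mode_valid(const struct xfer_attr *a)
{
	if (a->transfer_mode != 0 && !a->transfer)
		return 0;
	return a->transfer_mode != 3;
}
```

The existence of an unrepresentable-but-encodable state is exactly why the two-bit attribute needs more precise documentation than the patch provides.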


Since the attributes are added to the generic 'struct rte_flow_attr',
non-table (synchronous) flow rules are supposed to support them, too.
If that is indeed the case, then I'm afraid such a proposal does not
agree with the existing items PORT_REPRESENTOR and REPRESENTED_PORT.
They do exactly the same thing, but they are designed to be way more
generic. Why not use them?

The question stands.


Ivan


Ivan



Thank you.


Thanks,
Ivan
