Hi Ori,

On 04/10/2021 09:56, Ori Kam wrote:
Hi Ivan,

-----Original Message-----
From: Ivan Malov <ivan.ma...@oktetlabs.ru>
Sent: Monday, October 4, 2021 2:50 AM
Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
data

Hi Ori,

On 04/10/2021 00:04, Ori Kam wrote:
Hi Ivan,

Sorry for the long review.

-----Original Message-----
From: Ivan Malov <ivan.ma...@oktetlabs.ru>
Sent: Sunday, October 3, 2021 8:30 PM
Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
Rx meta data

Hi Ori,

On 03/10/2021 14:01, Ori Kam wrote:
Hi Ivan,

-----Original Message-----
From: Ivan Malov <ivan.ma...@oktetlabs.ru>
Sent: Sunday, October 3, 2021 12:30 PM
Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta data

Hi Ori,

Thanks for reviewing this.


No problem.

On 03/10/2021 10:42, Ori Kam wrote:
Hi Andrew and Ivan,


-----Original Message-----
From: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
Sent: Friday, October 1, 2021 9:50 AM
Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery
of Rx meta data

On 9/30/21 10:07 PM, Ivan Malov wrote:
Hi Ori,

On 30/09/2021 17:59, Ori Kam wrote:
Hi Ivan,

[Snip]

Good point, so why not use the same logic as the metadata and register it?
Since, in any case, this is something in the mbuf, maybe that should be the
answer?

I didn't catch your thought. Could you please elaborate on it?

The metadata action, just like the mark or flag, is used to give the
application data that was set by a flow rule.
To enable the metadata, the application must register the metadata field.
Since this happens during the creation of the mbuf, it must be done before
the device start.

I understand that the mark and flag don't need to be registered in the
mbuf since they have reserved space, but from the application's point of
view there is no difference between the metadata and the mark, so why
doesn't the negotiate function handle the metadata?

I hope this is clearer.

Thank you. That's a lot clearer.

I inspected struct rte_flow_action_set_meta as well as
rte_flow_dynf_metadata_register(). The latter API doesn't require that
applications invoke it precisely before adapter start. It says "must be called
prior to use SET_META action", and the comment before the structure says
just "in advance". So, at a bare minimum, the API contract could have been
made stricter in this respect. However, far more important points are
as follows:


Agree, the doc should be updated, but by definition this must be set before
mbuf creation, which means before device start.

1) This API enables delivery of this "custom" metadata between the PMD
and the application, whilst the API under review, as I noted before,
negotiates delivery of various kinds of metadata between the NIC and the
PMD. These are two completely different (albeit adjacent) stages of the
packet delivery process.

They are exactly alike: in the metadata case, too, the registration does two
things. It saves a place for the info in the mbuf and tells the PMD that it
should configure the NIC to supply this information upon request.

Looking at the rte_flow_dynf_metadata_register() implementation, it doesn't seem to notify the PMD of the new field directly. Yes, the PMD will eventually find out, but at that point it won't be able to reject the field. It's one-sided communication, in fact.
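For illustration, a minimal application-side sketch (assuming the standard
rte_flow.h / rte_errno.h helpers; error handling trimmed). Note that the call
only touches the mbuf dynamic field / flag registry and never calls back into
the PMD:

#include <rte_errno.h>
#include <rte_flow.h>

static int
app_enable_dyn_metadata(void)
{
        /* Only reserves the dynamic field / flag in the mbuf registry;
         * no PMD callback is involved, so the PMD cannot reject it here. */
        if (rte_flow_dynf_metadata_register() != 0)
                return -rte_errno;

        /* RTE_FLOW_DYNF_METADATA(mbuf) / PKT_RX_DYNF_METADATA can be used
         * from now on, but whether the NIC actually delivers the data to
         * the PMD is a separate question. */
        return 0;
}

Compare that with the API under review, which goes through the PMD and thus
gives it a chance to accept or trim the request.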

Even in your PMD, assuming that it can support the metadata, you will need
to configure it; otherwise, when the application requests this data using a
rule, you will be at the same spot you are now with the mark.

Right, but as I said, the primary concern is to configure delivery of metadata from the NIC HW to the PMD. It's not about mbuf dynfields.


2) This API doesn't negotiate anything with the PMD. It doesn't interact with
the PMD at all. It just reserves extra room in mbufs for the metadata field
and exits.

3) As a consequence of (2), the PMD can't immediately learn about this field
being enabled. It's forced to face this fact at some deferred point. If the
PMD, for instance, learns about that during adapter start and if it for some
reason decides to deny the use of this field, it won't be able to convey its
decision to the application. As a result, the application will live in the wrong
assumption that it has successfully enabled the feature.

4) Even if we add similar APIs to "register" more kinds of metadata (flag,
mark, tunnel ID, etc) and re-define the meaning of all these APIs to say that
not only they enable delivery of the metadata between the PMD and the
application but also enable the HW transport to get the metadata delivered
from the NIC to the PMD itself, we won't be able to use this set of APIs to
actually *negotiate* something. The order of invocations will be confusing to
the application. If the PMD can't combine some of these features, it won't be
able to communicate this clearly to the application. It will have to silently
disregard some of the "registered" features. And this is something that we
probably want to avoid. Right?

But I tend to agree that the API under review could have one more (4th) flag
to negotiate delivery of this "custom" metadata from the NIC to the PMD. At
the same time, enabling delivery of this data from the PMD to the application
will remain in the responsibility domain of
rte_flow_dynf_metadata_register().
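A purely illustrative sketch of how that split might look from the application
side. The negotiation call and flag prefix follow what I understand the series
proposes (rte_eth_rx_metadata_negotiate() and RTE_ETH_RX_METADATA_*; treat the
names as placeholders), and the 4th flag, RTE_ETH_RX_METADATA_USER_META, is
purely hypothetical:

#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_flow.h>

static void
enable_custom_metadata(uint16_t port_id)
{
        /* Step 1: negotiate NIC -> PMD delivery; must run before
         * rte_eth_dev_start(). RTE_ETH_RX_METADATA_USER_META is the
         * hypothetical 4th flag discussed here. */
        uint64_t features = RTE_ETH_RX_METADATA_USER_MARK |
                            RTE_ETH_RX_METADATA_USER_META;

        if (rte_eth_rx_metadata_negotiate(port_id, &features) != 0 ||
            (features & RTE_ETH_RX_METADATA_USER_META) == 0)
                printf("NIC will not deliver custom metadata to the PMD\n");

        /* Step 2: PMD -> application delivery stays with the existing
         * mbuf dynfield registration. */
        if (rte_flow_dynf_metadata_register() != 0)
                printf("cannot reserve mbuf room for the metadata field\n");
}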


I agree and think this is the best solution.

Thank you.





Say, you insert a flow rule to mark some packets. The NIC, internally (in
the e-switch), adds the mark to matching packets. Yes, within the boundaries
of the NIC HW, the packets bear the mark on them. It has been set, yes. But
when the time comes to *deliver* the packets to the host, the NIC (at least,
in the net/sfc case) has two options: either provide only a small chunk of
the metadata for each packet *to the host*, which doesn't include mark ID,
flag and RSS hash, OR, alternatively, provide the full set of metadata. In
the former option, the mark is simply not delivered. Once again: it *has
been set*, but simply will not be *delivered to the host*.

So, this API is about negotiating *delivery* of metadata, in a purely
technical sense. And the set of flags that this API returns indicates
which kinds of metadata the NIC will be able to deliver simultaneously.

For example, as I understand, in the case of tunnel offload, MLX5 claims
the Rx mark entirely for tunnel ID metadata, so, if an application requests
"MARK | TUNNEL_ID" with this API, this PMD would probably respond with just
"TUNNEL_ID". The application will see the response and realise that, even
if it adds its *own* (user) action MARK to a flow and the flow is not
rejected by the PMD, it won't be able to see the mark in the received mbufs
(or the mark will be incorrect).

So what should the application do if on some flows it wants MARK and on
others FLAG?

You mentioned flows, so I'd like to stress this one more time: what this
API cares about is solely the possibility to deliver metadata between the
NIC and the host. The host == the PMD (*not* the application).


I understand that you are only talking about enabling the action, meaning
letting the PMD know that at some point there will be a rule that will use,
for example, the mark action.
Is my understanding correct?

Not really. The causal relationships are as follows. The application comes to
realise that it will need to use, say, action MARK in flows.
This, in turn, means that, in order to be able to actually see the mark in
received packets, the application needs to ensure that a) the NIC will be able
to deliver the mark to the PMD and b) the PMD will be able to deliver
the mark to the application. In particular, in the case of Rx mark, (b) doesn't
need to be negotiated: the "mark" field is provisioned in the mbuf structure
anyway, so there is no need to enable it. But (a) needs to be negotiated.
Hence this API.
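As a hedged end-to-end sketch for MARK (the negotiation function and flag
names follow the series under review and may still change; the mbuf
definitions are the standard ones):

#include <rte_ethdev.h>

static int
prepare_for_mark(uint16_t port_id)
{
        /* (a) NIC -> PMD: ask for mark delivery to the host;
         *     must be done before rte_eth_dev_start(). */
        uint64_t features = RTE_ETH_RX_METADATA_USER_MARK;

        if (rte_eth_rx_metadata_negotiate(port_id, &features) != 0 ||
            (features & RTE_ETH_RX_METADATA_USER_MARK) == 0)
                return -1;

        /* (b) PMD -> application: nothing to register; the mark is a
         *     static mbuf field (hash.fdir.hi, valid when the
         *     PKT_RX_FDIR_ID flag is set). */
        return 0;
}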

Please see my above comment I think we both agree.

Agree to have the 4th flag in the new API to cover this "custom / raw metadata" delivery? Personally, I tend to agree, but maybe Andrew can express his opinion, too.


I don't understand your last comment about host == PMD, since in the end
this value should be given to the application.

Two different steps, Ori, two different steps. The first one is to deliver the
mark from the NIC to the PMD. And the second one is to deliver the mark
from the PMD to the application. As you might understand, mbufs get filled
out on the second step. That's it.


From the DPDK viewpoint, both of them can't be used in the same rule (they
use the same space in the mbuf), so the application will never ask for both
of them in the same rule, but it can ask for MARK on some rules while
requesting FLAG on others; even in your code you added both of them.

So what should the PMD return if it can support both of them, just not in
the same rule?

Please see above. This is not about rules. This is not about the way flag
and mark are presented *by* the PMD *to* the application in mbufs.
Simultaneous use of actions FLAG and MARK in flows must be ruled out by
rte_flow_validate() / rte_flow_create(). The way flag and mark are
*represented* in mbufs belongs in the mbuf library's responsibility domain.

Consider the following operational sequence:

1) The NIC has a packet, which has metadata associated with it;
2) The NIC transfers this packet to the host;
3) The PMD sees the packet and its metadata;
4) The PMD represents whatever available metadata in mbuf format.

Features negotiated by virtue of this API (for instance, FLAG and MARK)
enable delivery of these kinds of metadata between points (2) and (3).

And the problem of flag / mark co-existence in mbufs sits at point (4).

-> Completely different problems, in fact.


Agree.


One option is to document that the supported values are not per rule but
for the entire port. For example, in the example you gave, MLX5 will
support mark + flag but will not support mark + tunnel.

Yes, for the port. Flow rules are a don't care to this API.


Also, considering your example, the negotiation may result in a subpar
outcome. Taking your example, the PMD returned TUNNEL_ID; maybe the
application would prefer to have the MARK and not the TUNNEL_ID. I
understand that the application can check and try again with just the MARK.

Exactly. The application can repeat the negotiation with just MARK. Is there
any problem with that?


I understand that the application can negotiate again and again.
I just don't like that the PMD has logic and selects what it thinks will be
best.

I wanted to suggest that the PMD just report the conflicts and the
application negotiate again based on its own logic.

Well, I'm not saying that letting the PMD decide on the optimal feature
subset is the only reasonable MO. But it simplifies the negotiation
procedure a lot. Conveying conflicts and feature inter-dependencies to
the application might be rather complex and prone to errors.

At this point I believe it's important to clarify: the original intent
is to assume that the PMD will first consider enabling all requested
features. Only in the case when it fails to do so should it come up with
the optimal subset.


I understand; my issue is with the latter case: how can the PMD know what
the optimal subset is?


You are inserting logic into the PMD; maybe the function should just fail,
perhaps returning the conflicting items?

Why return conflicting items? The optimal subset (from the PMD's
perspective) should be returned. It's a silver lining. In the end, the
application
can learn which features can be enabled and in what combinations. And it
can rely on the outcome of the negotiation process.

That is my point: this is the PMD's perspective, not the application's.
How can a PMD define an optimal subset? How can it know what is more
important to the application?

How does "ls" command know the optimal sort mode? Why does it prefer
sorting by name over sorting by date? Thanks to its options, it allows
the user to express their own preference. So does the API in question.
If the application knows that tunnel offload is more important to it
(compared to MARK, for instance), it can request just TUNNEL_ID. Why
complicate this?

I don't agree with your example: "ls" is clearly defined, and each time you
run it you get the same order. It doesn't change between versions, while in
this case there will be changes between versions.

Maybe not that good an example, indeed. But the fact that it's clearly defined is true in this particular case. There are tons of programs which don't document their defaults clearly and never cease to surprise their users when new versions get released... It's quite customary.

Think about it this way: let's assume that the PMD doesn't support TUNNEL_ID,
so the application requests at startup both TUNNEL_ID and MARK.
The PMD returns only MARK; the application checks and sees that the PMD
didn't return TUNNEL_ID, so it negotiates again, only to find that nothing
is supported; then the application tries only the MARK, and to this the PMD
agrees.

So what's the problem? The key phrase here is "the application checks". Yes, it does check the output. And it has the right to disagree, to re-negotiate.
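As a sketch of that "check and re-negotiate" pattern (same caveat as above:
the function and flag names follow the series under review and may change):

static void
negotiate_with_fallback(uint16_t port_id)
{
        uint64_t request = RTE_ETH_RX_METADATA_TUNNEL_ID |
                           RTE_ETH_RX_METADATA_USER_MARK;
        uint64_t features = request;

        if (rte_eth_rx_metadata_negotiate(port_id, &features) == 0 &&
            features != request) {
                /* The PMD trimmed the set; fall back to what the
                 * application actually cares about most, e.g. MARK only. */
                features = RTE_ETH_RX_METADATA_USER_MARK;
                (void)rte_eth_rx_metadata_negotiate(port_id, &features);
        }
}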


Again this is not critical to me. But keep it in mind.

We never lost sight of this.

Frankly, we had internal discussions and of course we did realise that letting the PMD choose the optimal subset would raise concerns. But we should also keep in mind the fact that communicating conflicts might be difficult. I'll refrain from ranting about possible algorithms, though.

It's a trade-off between preventing PMDs from pushing their vision of the optimal feature set and keeping the API simple and concise, and thus user-friendly.


Also, the PMD logic is internal, so if for some reason the PMD selected, by
chance, what is best for the application, the application learns that this
is a good value for it. A release later, the internal PMD logic changes (for
example, a new feature was added, or another customer requested something).
Since this is inside the PMD, the original app is not aware of this change
and may fail.

The same argumentation can equally apply to the default RSS table, for
example. What if an application gets accustomed to the round-robin table
being the default in some specific PMD (and the PMD maintainers change the
default RSS table all of a sudden)? Oops!

Yes, but this is why the user has the option to select the mode.
In the case of RSS, if the requested mode isn't supported, the PMD fails and
doesn't just select a different algorithm, right?

I don't refer to the MQ mode or hash algorithm. I refer to the default fill-out of the RETA. The application author may test their product once with some PMD and watch the RETA work in a round-robin manner by default. They may then mistakenly assume that it's guaranteed behaviour while it's not. Hence the existence of an API to let the application explicitly set RETA entries. And applications are encouraged to use this API.

The same might apply to the API in question. Yes, it allows the PMD to suggest the optimal feature subset *if* it can't enable the full / originally requested set of features simultaneously. But nobody prevents the application from re-negotiating this. The application can narrow down the requested set of features or check them one by one.

And *this* effectively enables the application to have its own logic and fully control it. It can do multiple invocations of the API and join the dots itself; see the sketch below. Conflicts between some features can become very clear to the application this way.
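A sketch of that one-by-one probing, again with the names from the series
under review used as placeholders (assumes <rte_ethdev.h> for RTE_DIM and the
negotiation prototype):

static uint64_t
probe_rx_metadata(uint16_t port_id)
{
        static const uint64_t candidates[] = {
                RTE_ETH_RX_METADATA_TUNNEL_ID,
                RTE_ETH_RX_METADATA_USER_MARK,
                RTE_ETH_RX_METADATA_USER_FLAG,
        };
        uint64_t granted = 0;
        unsigned int i;

        /* The application's own priority order decides what is kept. */
        for (i = 0; i < RTE_DIM(candidates); i++) {
                uint64_t f = granted | candidates[i];

                if (rte_eth_rx_metadata_negotiate(port_id, &f) == 0 &&
                    (f & candidates[i]) != 0)
                        granted |= candidates[i];
        }

        /* The last call defines what is actually enabled. */
        (void)rte_eth_rx_metadata_negotiate(port_id, &granted);
        return granted;
}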


The truth is that the application shouldn't bind itself to some specific
vendor / PMD. In any case. Hence the negotiation process. It's just a
building block for some automation in the application.


We both agree that the application should check the result and renegotiate
if needed.
I only suggested that the PMD just return an error and not assume it knows
best.

I believe we should give this more thought. Maybe Andrew can join this
conversation.

I fully agree, let's sleep on it.
This will not be a blocker.






But some other PMDs (net/sfc, for instance) claim only a small fraction of
bits in the Rx mark to deliver tunnel ID information. The remaining bits are
still available for delivery of the *user* mark ID. Please see an example at
https://patches.dpdk.org/project/dpdk/patch/20210929205730.775-2-
ivan.ma...@oktetlabs.ru/
. In this case, the PMD may want to return both flags in the response:
"MARK | TUNNEL_ID". This way, the application knows that both features
are enabled and available for use.

Now, I anticipate more questions asking why we wouldn't prefer flow API
terminology or why we wouldn't add an API for negotiating support for
metadata *actions* and not just metadata *delivery*. There's an answer.
Always has been.

The thing is, the use of *actions* is very complicated. For example, the PMD
may support action MARK for "transfer" flows but not for non-"transfer"
ones. Also, simultaneous use of multiple different metadata actions may not
be possible. And, last but not least, if we force the application to check
support for *actions* on an action-after-action basis, the order of checks
will be very confusing to applications.

Previously, in this thread, Thomas suggested going for exactly this type of
API, to check support for actions one by one, without any context
("transfer" / non-"transfer"). I'm afraid this won't be OK.

+1 to keeping it as a separate API. (I agree the action limitations are a
very complex matrix.)


In any case, I think this is a good idea, and I will see how we can add a
more generic approach from this API to the new API that I'm going to
present.


So no breakages with this API.


Please see more comments inline.


[Snip]

Yes, like I said above, I don't see a difference between metadata and mark,
at least not from the application usage point of view.
I assume you need this info at device start, and by definition the
registration should happen before that (the mbuf should be configured
before start).

Please see my thoughts about dynamic fields above.


We should make sure that we all reach an agreement.


+1. I think we can agree that there is a need for letting the PMD
know before the start that some action will be used.

And I'm sorry if I sound picky and harsh; this is not my intention.
I'm also doing my best to review this as fast as I can.
My open issues and priorities:
1. The API breakage. The possible solution: adding support for the rest of
the PMDs / updating the doc to say that if the function is not supported,
the application should assume that it can still use the mark / flag.
-- blocker, this must be resolved.

I see.
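For what it's worth, the application-side fallback implied by that doc update
could look roughly like this (a sketch only; it assumes the usual negative
errno return convention of ethdev APIs and the same placeholder names as in
the snippets above):

bool mark_delivered;
uint64_t features = RTE_ETH_RX_METADATA_USER_MARK;
int rc = rte_eth_rx_metadata_negotiate(port_id, &features);

if (rc == -ENOTSUP) {
        /* Old PMD without negotiation support: per the proposed doc
         * wording, assume mark / flag keep working as before. */
        mark_delivered = true;
} else {
        mark_delivered = (rc == 0) &&
                         (features & RTE_ETH_RX_METADATA_USER_MARK) != 0;
}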

2. Function name. My main issue is that metadata should be treated just like
mark; maybe the solution can be adding a metadata flag to this function.
The drawback is that the application needs to call two functions to
configure metadata.
-- high priority, but if you can give me good reasoning, not just that we
don't need to register the mark, I guess I will be O.K.

Please see my thoughts above. This API negotiates metadata delivery on
the path between the NIC and the PMD. The metadata mbuf register API
does this on the path between the PMD and the application. So no
contradiction here.


See my comments above; I think we have an agreed solution.

3. Whether the PMD has internal logic in case of conflict or not.
Please think about it.
-- low prio, I will agree to what you decide, but if you decide that the PMD
will have internal logic, then this must be documented so the application
will know not to rely on the results.

Please see my reply above. The application can rely on the outcome of
the negotiation (the last negotiated subset of features), but it should
know that if it disagrees with the suggested feature subset, it can
re-negotiate. All fair and square.


Like I said above, think about it some more; I will also think about it. In
any case, this will not be a blocker.

Best,
Ori



--
Ivan M

