Hi Thomas,
On 01/10/2021 11:11, Thomas Monjalon wrote:
01/10/2021 08:47, Andrew Rybchenko:
On 9/30/21 10:30 PM, Ivan Malov wrote:
Hi Thomas,
On 30/09/2021 19:18, Thomas Monjalon wrote:
23/09/2021 13:20, Ivan Malov:
In 2019, commit [1] announced changes in the DEV_RX_OFFLOAD namespace
intending to add new flags, RSS_HASH and FLOW_MARK. Since then,
only the former has been added. The problem hasn't been solved.
Applications still assume that no effort is needed to enable
flow mark and similar meta data delivery.
The team behind the net/sfc driver has to take over the effort since
the problem has started impacting us. Riverhead, a cutting-edge
Xilinx smart NIC family, has two Rx prefix types. Rx meta data
is available only from the long Rx prefix. Switching between the
prefix formats can't happen in the started state. Hence, we run
into the same problem which [1] was aiming to solve.
Sorry, I don't understand: what is an Rx prefix?
A small chunk of per-packet metadata in the Rx packet buffer preceding the
actual packet data. In terms of mbuf, this could be something lying
before m->data_off.
I've never seen the word "Rx prefix".
In general we talk about mbuf headroom and mbuf metadata,
the rest being the mbuf payload and mbuf tailroom.
I guess you mean mbuf metadata in the space of the struct rte_mbuf?
In this paragraph I describe the two ways in which the NIC itself can
provide metadata buffers of different sizes. Hence the term "Rx prefix".
As you understand, the NIC HW is unaware of DPDK, mbufs and any other
SW concepts. To the NIC, this is an "Rx prefix", that is, a chunk of
per-packet metadata *preceding* the actual packet data. It's the
responsibility of the PMD to treat this the right way and take care of
the headroom, payload and tailroom. I describe the two Rx prefix
formats in NIC terminology just to provide the gist of the problem.
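Just to make the term concrete, below is a rough sketch (not the actual
net/sfc code; the prefix layout, field names and flag are made up) of
how an Rx burst routine could translate such a prefix into mbuf fields:

#include <stdint.h>
#include <rte_mbuf.h>

/* Hypothetical long Rx prefix layout; the real one is defined by HW. */
struct hw_rx_prefix {
	uint16_t pkt_len;    /* length of the packet data that follows */
	uint16_t flags;      /* e.g. "user mark is valid" */
	uint32_t user_mark;  /* value set by a MARK flow action, if any */
};

#define HW_RX_PREFIX_F_MARK_VALID 0x1

/* The prefix sits in the same buffer, right before the packet data. */
static inline void
parse_rx_prefix(struct rte_mbuf *m)
{
	const struct hw_rx_prefix *pfx =
		(const struct hw_rx_prefix *)((char *)m->buf_addr +
					      m->data_off) - 1;

	if (pfx->flags & HW_RX_PREFIX_F_MARK_VALID) {
		/* Deliver the mark the usual mbuf way. */
		m->hash.fdir.hi = pfx->user_mark;
		m->ol_flags |= PKT_RX_FDIR | PKT_RX_FDIR_ID;
	}
}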
Rx meta data (mark, flag, tunnel ID) delivery is not an offload
on its own since the corresponding flows must be active to set
the data in the first place. Hence, adding offload flags
similar to RSS_HASH is not a good idea.
What does "active" mean here?
Active = inserted and functional. What this paragraph is trying to say
is that when you enable, say, RSS_HASH, that implies both computation of
the hash and the driver's ability to extract it from packets
("delivery"). But when it comes to MARK, it's just "delivery". No
"offload" here: the NIC won't set any mark in packets unless you create
a flow rule to make it do so. That's the gist of it.
OK
Yes I agree RTE_FLOW_ACTION_TYPE_MARK doesn't need any offload flag.
Same for RTE_FLOW_ACTION_TYPE_SET_META.
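For reference, the mark in question is requested with a flow rule
roughly like the one below (port ID, pattern and mark value are
arbitrary); the series is only about whether the value can then be
*delivered* with received packets:

#include <stdint.h>
#include <rte_flow.h>

/* Ask the NIC to mark matching ingress packets with the value 42. */
static struct rte_flow *
insert_mark_rule(uint16_t port_id, struct rte_flow_error *error)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_mark mark = { .id = 42 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	/* Without such a rule the NIC sets no mark at all, which is why
	 * MARK is not an Rx offload in the RSS_HASH sense.
	 */
	return rte_flow_create(port_id, &attr, pattern, actions, error);
}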
Patch [1/5] of this series adds a generic API to let applications
negotiate delivery of Rx meta data during the initialisation period.
What is the metadata here?
Do you mean RTE_FLOW_ITEM_TYPE_META and RTE_FLOW_ITEM_TYPE_MARK?
The word "metadata" could cover any field in the mbuf struct, so it is vague.
Metadata here is *any* additional information provided by the NIC for
each received packet. For example, Rx flag, Rx mark, RSS hash, packet
classification info, you name it. I'd like to stress that the
suggested API comes with flags, each of which is crystal clear about
what concrete kind of metadata it covers, e.g. Rx mark.
This way, an application knows right from the start which parts
of Rx meta data won't be delivered. Hence, there is no need to try
inserting flows requesting such data and to handle the failures.
Sorry I don't understand the problem you want to solve.
And sorry for not noticing earlier.
No worries. *Some* PMDs do not enable delivery of, say, Rx mark with the
packets by default (for performance reasons). If the application tries
to insert a flow with action MARK, the PMD may not be able to enable
delivery of Rx mark without restarting the Rx subsystem. And that is
fraught with traffic disruption and similar bad consequences. In
order to address it, we need to let the application express its interest
in receiving mark with packets as early as possible. This way, the PMD
can enable Rx mark delivery in advance. And, as an additional benefit,
the application can learn *from the very beginning* whether it will be
possible to use the feature or not. If this API tells the application
that no mark delivery will be enabled, then the application can just
skip many unnecessary attempts to insert knowingly unsupported flows
during runtime.
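In other words, the intended usage is roughly the sketch below. The
function and flag names are only an illustration of what patch [1/5]
proposes (treat the exact spelling as an assumption, not the final API):

#include <errno.h>
#include <stdint.h>
#include <rte_ethdev.h>

/* Negotiate delivery of Rx mark before rte_eth_dev_configure() so the
 * PMD can pick the right Rx prefix format from the very beginning.
 */
static int
negotiate_rx_mark(uint16_t port_id)
{
	uint64_t features = RTE_ETH_RX_METADATA_USER_MARK;
	int ret;

	ret = rte_eth_rx_metadata_negotiate(port_id, &features);
	if (ret != 0)
		return ret;

	if ((features & RTE_ETH_RX_METADATA_USER_MARK) == 0) {
		/* Mark delivery won't work on this port: the application
		 * can skip inserting MARK rules at runtime altogether.
		 */
		return -ENOTSUP;
	}

	return 0;
}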
I'm puzzled, because we could have the same reasoning for any offload.
We're not discussing *offloads*. An offload is when the NIC *computes
something* and *delivers* it. We are discussing precisely *delivery*.
I don't understand why we are focusing on mark only
We are not focusing on mark on purpose. It's just how our discussion
goes. I chose mark (could've chosen flag or anything else) just to show
you an example.
I would prefer we find a generic solution using the rte_flow API.
Can we make rte_flow_validate() work before port start?
If validating a fake rule doesn't make sense,
why not have a new function accepting a single action as a parameter?
A noble idea, but if we feed the entire flow rule to the driver for
validation, the driver can't just look for the FLAG or MARK actions in
it (to decide whether to enable metadata delivery); it is obliged to
also validate the match criteria, attributes, etc. And if something is
unsupported (say, some specific item), the driver will have to reject
the rule as a whole, thus leaving the application to join the dots
itself.
Say, you ask the driver to validate the following rule:
pattern blah-blah-1 / blah-blah-2 / end action flag / end
intending to check support for FLAG delivery. Suppose the driver
doesn't support the pattern item "blah-blah-1". It will throw an error right
after seeing this unsupported item and won't even go further to see the
action FLAG. How can the application know whether its request for FLAG was
heard or not?
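In code, such a probe would look roughly like the sketch below (the
ETH / IPV4 / UDP / VXLAN items merely stand in for "blah-blah-1 /
blah-blah-2"). A failure of rte_flow_validate() here is ambiguous: the
caller cannot tell an unsupported item from an unsupported action:

#include <stdint.h>
#include <rte_flow.h>

/* Try to probe FLAG support by validating a dummy rule. */
static int
probe_flag_delivery(uint16_t port_id)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
		{ .type = RTE_FLOW_ITEM_TYPE_UDP },
		{ .type = RTE_FLOW_ITEM_TYPE_VXLAN },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_FLAG },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;

	/* A negative return value may mean "VXLAN item unsupported" just
	 * as well as "FLAG action unsupported".
	 */
	return rte_flow_validate(port_id, &attr, pattern, actions, &error);
}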
And I wouldn't bind delivery of metadata to the flow API. Consider the
following example. We have a DPDK application sitting at the *host* and
we have a *guest* with its *own* DPDK instance. The guest DPDK has asked
the NIC (by virtue of the flow API) to mark all outgoing packets.
These packets reach the *host* DPDK. Say, the host application just
wants to see the marked packets from the guest. Its own (the host's)
use of the flow API is irrelevant here. The host doesn't want to mark
packets itself; it wants to see packets marked by the guest.
Thomas, if I'm not mistaken, the net/mlx5 dv_xmeta_en driver option
is a vendor-specific way to address the same problem.
Not exactly, it is configuring the capabilities:
+------+-----------+-----------+-------------+-------------+
| Mode | ``MARK`` | ``META`` | ``META`` Tx | FDB/Through |
+======+===========+===========+=============+=============+
| 0 | 24 bits | 32 bits | 32 bits | no |
+------+-----------+-----------+-------------+-------------+
| 1 | 24 bits | vary 0-32 | 32 bits | yes |
+------+-----------+-----------+-------------+-------------+
| 2 | vary 0-24 | 32 bits | 32 bits | yes |
+------+-----------+-----------+-------------+-------------+
--
Ivan M