> -----Original Message-----
> From: Slava Ovsiienko
> Sent: Wednesday, October 30, 2019 11:00
> To: Andrew Rybchenko <arybche...@solarflare.com>; Thomas Monjalon
> <tho...@monjalon.net>; olivier.m...@6wind.com
> Cc: dev@dpdk.org; Matan Azrad <ma...@mellanox.com>; Ori Kam
> <or...@mellanox.com>; Yongseok Koh <ys...@mellanox.com>
> Subject: RE: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
>
> > -----Original Message-----
> > From: Andrew Rybchenko <arybche...@solarflare.com>
> > Sent: Wednesday, October 30, 2019 9:35
> > To: Slava Ovsiienko <viachesl...@mellanox.com>; Thomas Monjalon
> > <tho...@monjalon.net>; olivier.m...@6wind.com
> > Cc: dev@dpdk.org; Matan Azrad <ma...@mellanox.com>; Ori Kam
> > <or...@mellanox.com>; Yongseok Koh <ys...@mellanox.com>
> > Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> >
> > @Olivier, please take a look at the end of the mail.
> >
> > On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
> > > Hi, Andrew
> > >
> > > Thank you for the review.
> > >
> > >> -----Original Message-----
> > >> From: Andrew Rybchenko <arybche...@solarflare.com>
> > >> Sent: Tuesday, October 29, 2019 18:22
> > >> To: Slava Ovsiienko <viachesl...@mellanox.com>; dev@dpdk.org
> > >> Cc: Thomas Monjalon <tho...@monjalon.net>; Matan Azrad
> > >> <ma...@mellanox.com>; olivier.m...@6wind.com; Ori Kam
> > >> <or...@mellanox.com>; Yongseok Koh <ys...@mellanox.com>
> > >> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> > >>
> > >> On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
> > >>> Currently, metadata can be set on the egress path via the mbuf
> > >>> tx_metadata field with the PKT_TX_METADATA flag, and
> > >>> RTE_FLOW_ITEM_TYPE_META matches metadata.
> > >>> This patch extends the usability of the metadata feature.
> > >>>
> > >>> 1) RTE_FLOW_ACTION_TYPE_SET_META
> > >>>
> > >>> When supporting multiple tables, Tx metadata can also be set by a
> > >>> rule and matched by another rule. This new action allows metadata
> > >>> to be set as a result of a flow match.
> > >>>
> > >>> 2) Metadata on ingress
> > >>>
> > >>> There is also a need to support metadata on ingress. Metadata can
> > >>> be set by the SET_META action and matched by the META item, as on
> > >>> Tx. The final value set by the action will be delivered to the
> > >>> application via the metadata dynamic field of the mbuf, which can
> > >>> be accessed by RTE_FLOW_DYNF_METADATA().
> > >>> The PKT_RX_DYNF_METADATA flag will be set along with the data.
> > >>>
> > >>> The mbuf dynamic field must be registered by calling
> > >>> rte_flow_dynf_metadata_register() prior to using the SET_META
> > >>> action.
> > >>>
> > >>> The availability of the dynamic mbuf metadata field can be
> > >>> checked with the rte_flow_dynf_metadata_avail() routine.
> > >>>
> > >>> For loopback/hairpin packets, metadata set on Rx/Tx may or may
> > >>> not be propagated to the other path depending on hardware
> > >>> capability.
> > >>>
> > >>> Signed-off-by: Yongseok Koh <ys...@mellanox.com>
> > >>> Signed-off-by: Viacheslav Ovsiienko <viachesl...@mellanox.com>
> > >> The above explanations lack information about "meta" vs "mark",
> > >> which may be set on Rx as well and delivered in another mbuf
> > >> field. It should be explained why one more field is required and
> > >> the rules defined.
> > > There is some story about metadata features.
> > > Initially, there were proposed two metadata related actions:
> > >
> > > - RTE_FLOW_ACTION_TYPE_FLAG
> > > - RTE_FLOW_ACTION_TYPE_MARK
> > >
> > > These actions set a special flag in the packet metadata, the MARK
> > > action stores some specified value in the metadata storage, and, on
> > > packet reception, the PMD puts the flag and value into the mbuf, so
> > > applications can see that the packet was treated inside the flow
> > > engine according to the appropriate RTE flow(s). MARK and FLAG are
> > > a kind of gateway to transfer some per-packet information from the
> > > flow engine to the application via the receiving datapath.
> > >
> > > From the datapath point of view, MARK and FLAG are related to the
> > > receiving side only.
> > > It would be useful to have the same gateway on the transmitting
> > > side, and the item of type RTE_FLOW_ITEM_TYPE_META was proposed.
> > > The application can fill the field in the mbuf, and this value will
> > > be transferred to some field in the packet metadata inside the flow
> > > engine.
> > > It did not matter whether these metadata fields were shared,
> > > because the MARK and META items belonged to different domains
> > > (receiving and transmitting) and could be vendor-specific.
> > >
> > > So far, so good: DPDK proposes some entities to control metadata
> > > inside the flow engine and gateways to exchange these values on a
> > > per-packet basis via the datapaths.
> > >
> > > As we can see, the MARK and META means are not symmetric; there is
> > > no action which would allow us to set the META value on the
> > > transmitting path. So, the action of type
> > > RTE_FLOW_ACTION_TYPE_SET_META is proposed.
> > >
> > > Next, applications raised new requirements for packet metadata.
> > > The flow engines are getting more complex, internal switches are
> > > introduced, and multiple ports might be supported within the same
> > > flow engine namespace. From the DPDK point of view, it means the
> > > packets might be sent on one eth_dev port and received on another
> > > one, and the packet path inside the flow engine entirely belongs to
> > > the same hardware device. The simplest example is SR-IOV with PF,
> > > VFs and the representors.
> > > And there is a brilliant opportunity to provide some out-of-band
> > > channel to transfer some extra data from one port to another one,
> > > besides the packet data itself.
> > >
> > >> The above explanations lack information about "meta" vs "mark",
> > >> which may be set on Rx as well and delivered in another mbuf
> > >> field. It should be explained why one more field is required and
> > >> the rules defined.
> > >> Otherwise we can end up with half of the PMDs supporting mark
> > >> only, half of the PMDs supporting meta only, and applications in
> > >> an interesting situation, having to choose which one to use.
> > > There is no "mark" vs "meta". The MARK and META means are kept for
> > > compatibility reasons and the legacy part works exactly as before.
> > > The trials (with flow_validate) are supposed to check whether the
> > > PMD supports the MARK or META feature on the appropriate domain.
> > > It depends on the PMD implementation, configuration and underlying
> > > HW/FW/kernel capabilities and should be resolved at runtime.
> >
> > The trials are a way, but a very tricky way. My imagination draws me
> > pictures of how application code could look in an attempt to use
> > either mark or meta for Rx only, and these pictures are not nice.
> Agree, trials are not the best way.
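To illustrate the trials mentioned above: probing boils down to one
dry-run rte_flow_validate() call per feature. A rough sketch against
the v4 definitions follows - error handling is trimmed, and the
probe_set_meta() helper name is mine, just for illustration:

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_flow.h>

/* Dry-run validation: returns 0 if SET_META is usable on ingress. */
static int
probe_set_meta(uint16_t port_id)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_set_meta set_meta = {
		.data = RTE_BE32(1),
		.mask = RTE_BE32(UINT32_MAX),
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_SET_META, .conf = &set_meta },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;

	/* The dynamic field must be registered before SET_META is used. */
	if (rte_flow_dynf_metadata_register() < 0)
		return -1;
	return rte_flow_validate(port_id, &attr, pattern, actions, &error);
}

The same validate call with RTE_FLOW_ACTION_TYPE_MARK and struct
rte_flow_action_mark would cover the mark case.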
> For now there is no application trying to choose between mark and
> meta, because they belonged to different domains, and the extension is
> newly introduced.
> So, the applications using mark will continue to use mark, and the
> same goes for meta.
> New applications will definitely ask for both mark and meta; we have
> requirements from customers - "give us as many pass-through bits as
> you can".
> Such a new application may simply refuse to work if the metadata
> features are not detected, because it relies on them strongly.
> BTW, the trials are not so complex: rte_flow_validate(mark),
> rte_flow_validate_metadata(), and that's it.
>
> > Maybe it will look acceptable when mark becomes dynamic, since usage
> > of either one or another dynamic field is definitely easier than
> > usage of either a fixed or a dynamic field.
> At least in the PMD datapath it does not look very ugly.
>
> > Maybe a dynamic field for mark at a fixed offset should be
> > introduced in this release or in the nearest future? It would allow
> > preserving the ABI up to 20.11 and provide a future-proof API.
> > The trick is to register the dynamic meta field at a fixed offset at
> > the start of the day, to be sure that it is guaranteed to succeed.
> > It sounds like a transition mechanism from fixed to dynamic fields.
> >
> > [snip]
> >
> > >>> diff --git a/lib/librte_ethdev/rte_flow.h
> > >>> b/lib/librte_ethdev/rte_flow.h
> > >>> index 4fee105..b821557 100644
> > >>> --- a/lib/librte_ethdev/rte_flow.h
> > >>> +++ b/lib/librte_ethdev/rte_flow.h
> > >>> @@ -28,6 +28,8 @@
> > >>>  #include <rte_byteorder.h>
> > >>>  #include <rte_esp.h>
> > >>>  #include <rte_higig.h>
> > >>> +#include <rte_mbuf.h>
> > >>> +#include <rte_mbuf_dyn.h>
> > >>>
> > >>>  #ifdef __cplusplus
> > >>>  extern "C" {
> > >>> @@ -418,7 +420,8 @@ enum rte_flow_item_type {
> > >>>  	/**
> > >>>  	 * [META]
> > >>>  	 *
> > >>> -	 * Matches a metadata value specified in mbuf metadata field.
> > >>> +	 * Matches a metadata value.
> > >>> +	 *
> > >>>  	 * See struct rte_flow_item_meta.
> > >>>  	 */
> > >>>  	RTE_FLOW_ITEM_TYPE_META,
> > >>> @@ -1263,9 +1266,17 @@ struct rte_flow_item_icmp6_nd_opt_tla_eth {
> > >>>  #endif
> > >>>
> > >>>  /**
> > >>> - * RTE_FLOW_ITEM_TYPE_META.
> > >>> + * @warning
> > >>> + * @b EXPERIMENTAL: this structure may change without prior notice
> > >> Is it allowed to make it experimental again?
> > > I think we should remove EXPERIMENTAL here. We do not introduce a
> > > new feature, but just extend the area where it applies.
> >
> > Agreed.
> >
> > >>>   *
> > >>> - * Matches a specified metadata value.
> > >>> + * RTE_FLOW_ITEM_TYPE_META
> > >>> + *
> > >>> + * Matches a specified metadata value. On egress, metadata can be set
> > >>> + * either by the mbuf tx_metadata field with the PKT_TX_METADATA flag
> > >>> + * or by RTE_FLOW_ACTION_TYPE_SET_META. On ingress,
> > >>> + * RTE_FLOW_ACTION_TYPE_SET_META sets metadata for a packet and the
> > >>> + * metadata will be reported via the mbuf metadata dynamic field with
> > >>> + * the PKT_RX_DYNF_METADATA flag. The dynamic mbuf field must be
> > >>> + * registered in advance by rte_flow_dynf_metadata_register().
> > >>>   */
> > >>>  struct rte_flow_item_meta {
> > >>>  	rte_be32_t data;
> > >>
> > >> [snip]
> > >>
> > >>> @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
> > >>>  	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
> > >>>  };
> > >>>
> > >>> +/**
> > >>> + * @warning
> > >>> + * @b EXPERIMENTAL: this structure may change without prior notice
> > >>> + *
> > >>> + * RTE_FLOW_ACTION_TYPE_SET_META
> > >>> + *
> > >>> + * Set metadata.
> > >>> + * Metadata set by the mbuf tx_metadata field with the
> > >>> + * PKT_TX_METADATA flag on egress will be overridden by this action.
> > >>> + * On ingress, the metadata will be carried by the mbuf metadata
> > >>> + * dynamic field with the PKT_RX_DYNF_METADATA flag if set. The
> > >>> + * dynamic mbuf field must be registered in advance by
> > >>> + * rte_flow_dynf_metadata_register().
> > >>> + *
> > >>> + * Altering partial bits is supported with mask. For bits which have
> > >>> + * never been set, an unpredictable value will be seen depending on
> > >>> + * the driver implementation. For loopback/hairpin packets, metadata
> > >>> + * set on Rx/Tx may or may not be propagated to the other path
> > >>> + * depending on HW capability.
> > >>> + *
> > >>> + * RTE_FLOW_ITEM_TYPE_META matches metadata.
> > >>> + */
> > >>> +struct rte_flow_action_set_meta {
> > >>> +	rte_be32_t data;
> > >>> +	rte_be32_t mask;
> > >> As I understand, tx_metadata is host-endian. Just double-checking.
> > >> Is the new dynamic field host-endian or big-endian?
> > >> I definitely would like to see the motivation in comments for why
> > >> data/mask are big-endian here.
> > > Metadata is an opaque value, endianness does not matter; there was
> > > no special motivation for choosing this endianness.
> > > The rte_flow_item_meta structure provides data with the rte_be32_t
> > > type, so the meta related action does the same.
> >
> > Endianness of meta in the mbuf and in the flow API should match, and
> > it must be
> Yes, and they match (for meta), both are rte_be32_t. OK, I will
> emphasize it in the docs.
>
> > documented. Endianness is important if the HW supports fewer bits,
> > since it gives the application a hint to use the LSBs first if the
> > bit space is sufficient.
> > mark is defined as host-endian (uint32_t) and I think meta should be
> > the same. Otherwise it complicates the usage of either mark or meta
> > even more, as discussed above.
> mark is mark, meta is meta; they are not interrelated (despite
> becoming similar). And there is no choice between mark and meta.
> It is not obvious that we should have the same endianness for both.
> There are some applications using these legacy features, so there
> might be compatibility issues as well.
>
> >
> > Yes, I think that rte_flow_item_meta should be fixed, since both
> > mark and tx_metadata are host-endian.
> >
> > (it says nothing about the HW interface, which is vendor-specific
> > and which vendor PMDs should care about)
> Handling this in the PMD might introduce an extra endianness
> conversion in the datapath, impacting performance. Not nice, IMO.
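For reference, the Rx-side consumption with the dynamic field is
expected to look roughly like this (a sketch against the v4 patch, not
part of the patch itself; whether the 32-bit value is host- or
big-endian is exactly the open question above):

#include <rte_mbuf.h>
#include <rte_flow.h>

/* Sketch: consume flow-engine metadata delivered on Rx. Assumes
 * rte_flow_dynf_metadata_register() succeeded during startup. */
static inline void
handle_rx_metadata(struct rte_mbuf *m)
{
	if (m->ol_flags & PKT_RX_DYNF_METADATA) {
		/* Opaque value set by RTE_FLOW_ACTION_TYPE_SET_META. */
		uint32_t meta = *RTE_FLOW_DYNF_METADATA(m);
		(void)meta; /* application-specific handling */
	}
}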
Update: I have reviewed the other [META] items/actions; all of them are
in host endianness. OK, since we are moving tx_metadata to a dynamic
field anyway, it seems to be a good point to alter the metadata
endianness, and I see a way to avoid conversions in the datapath.

With best regards, Slava

> > > I could assume the origin of selecting the big-endian type was the
> > > endianness of the metadata field in the Tx descriptor of ConnectX
> > > NICs.
> > >
> > >>> +};
> > >>> +
> > >>> +/* Mbuf dynamic field offset for metadata. */
> > >>> +extern int rte_flow_dynf_metadata_offs;
> > >>> +
> > >>> +/* Mbuf dynamic field flag mask for metadata. */
> > >>> +extern uint64_t rte_flow_dynf_metadata_mask;
> > >> These two global variables look frightening to me.
> > >> It does not look good to me.
> > > For me too. But we need the performance; these are intended for
> > > use in the datapath, and any overhead is painful.
> >
> > @Olivier, could you share your thoughts, please.
> >
> > Andrew.
>
> With best regards, Slava
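P.S. Regarding the frightening globals: they are what keeps the
datapath access overhead-free. Roughly, the accessor is built on the
mbuf dynamic field helpers like this (a sketch mirroring the patch;
RTE_MBUF_DYNFIELD comes from rte_mbuf_dyn.h):

#include <rte_mbuf_dyn.h>

/* The offset is resolved once at registration time; afterwards the
 * accessor is a single pointer addition on the mbuf, with no
 * per-packet lookup. */
#define RTE_FLOW_DYNF_METADATA(m) \
	RTE_MBUF_DYNFIELD((m), rte_flow_dynf_metadata_offs, uint32_t *)

/* The Rx flag is simply the mask obtained at flag registration. */
#define PKT_RX_DYNF_METADATA (rte_flow_dynf_metadata_mask)

Hiding the offset and mask behind accessor functions would add a call
per packet, which is why plain externs are used despite looking ugly.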