Hi Gregory, I'm sorry for a review with so many questions and no suggestions on how to answer them. Please see below.
On 6/25/20 7:03 PM, Gregory Etelson wrote: > From: Eli Britstein <el...@mellanox.com> > > Hardware vendors implement tunneled traffic offload techniques > differently. Although RTE flow API provides tools capable to offload > all sorts of network stacks, software application must reference this > hardware differences in flow rules compilation. As the result tunneled > traffic flow rules that utilize hardware capabilities can be different > for the same traffic. > > Tunnel port offload proposed in [1] provides software application with > unified rules model for tunneled traffic regardless underlying > hardware. > - The model introduces a concept of a virtual tunnel port (VTP). > - The model uses VTP to offload ingress tunneled network traffic > with RTE flow rules. > - The model is implemented as set of helper functions. Each PMD > implements VTP offload according to underlying hardware offload > capabilities. Applications must query PMD for VTP flow > items / actions before using in creation of a VTP flow rule. > > The model components: > - Virtual Tunnel Port (VTP) is a stateless software object that > describes tunneled network traffic. VTP object usually contains > descriptions of outer headers, tunnel headers and inner headers. > - Tunnel Steering flow Rule (TSR) detects tunneled packets and > delegates them to tunnel processing infrastructure, implemented > in PMD for optimal hardware utilization, for further processing. > - Tunnel Matching flow Rule (TMR) verifies packet configuration and > runs offload actions in case of a match. > > Application actions: > 1 Initialize VTP object according to tunnel > network parameters. 
> 2 Create TSR flow rule: > 2.1 Query PMD for VTP actions: application can query for VTP actions > more than once > int > rte_flow_tunnel_decap_set(uint16_t port_id, > struct rte_flow_tunnel *tunnel, > struct rte_flow_action **pmd_actions, > uint32_t *num_of_pmd_actions, > struct rte_flow_error *error); > > 2.2 Integrate PMD actions into TSR actions list. > 2.3 Create TSR flow rule: > flow create <port> group 0 > match {tunnel items} / end > actions {PMD actions} / {App actions} / end > > 3 Create TMR flow rule: > 3.1 Query PMD for VTP items: application can query for VTP items > more than once > int > rte_flow_tunnel_match(uint16_t port_id, > struct rte_flow_tunnel *tunnel, > struct rte_flow_item **pmd_items, > uint32_t *num_of_pmd_items, > struct rte_flow_error *error); > > 3.2 Integrate PMD items into TMR items list: > 3.3 Create TMR flow rule > flow create <port> group 0 > match {PMD items} / {APP items} / end > actions {offload actions} / end > > The model provides helper function call to restore packets that miss > tunnel TMR rules to its original state: > int > rte_flow_get_restore_info(uint16_t port_id, > struct rte_mbuf *mbuf, > struct rte_flow_restore_info *info, > struct rte_flow_error *error); > > rte_tunnel object filled by the call inside > rte_flow_restore_info *info parameter can be used by the application > to create new TMR rule for that tunnel. > > The model requirements: > Software application must initialize > rte_tunnel object with tunnel parameters before calling > rte_flow_tunnel_decap_set() & rte_flow_tunnel_match(). > > PMD actions array obtained in rte_flow_tunnel_decap_set() must be > released by application with rte_flow_action_release() call. > Application can release the actionsfter TSR rule was created. > > PMD items array obtained with rte_flow_tunnel_match() must be released > by application with rte_flow_item_release() call. Application can > release the items after rule was created. 
However, if the application > needs to create additional TMR rule for the same tunnel it will need > to obtain PMD items again. > > Application cannot destroy rte_tunnel object before it releases all > PMD actions & PMD items referencing that tunnel. > > [1] https://mails.dpdk.org/archives/dev/2020-June/169656.html > > Signed-off-by: Eli Britstein <el...@mellanox.com> > Acked-by: Ori Kam <or...@mellanox.com> > --- > doc/guides/prog_guide/rte_flow.rst | 105 ++++++++++++ > lib/librte_ethdev/rte_ethdev_version.map | 5 + > lib/librte_ethdev/rte_flow.c | 112 +++++++++++++ > lib/librte_ethdev/rte_flow.h | 196 +++++++++++++++++++++++ > lib/librte_ethdev/rte_flow_driver.h | 32 ++++ > 5 files changed, 450 insertions(+) > > diff --git a/doc/guides/prog_guide/rte_flow.rst > b/doc/guides/prog_guide/rte_flow.rst > index d5dd18ce99..cfd98c2e7d 100644 > --- a/doc/guides/prog_guide/rte_flow.rst > +++ b/doc/guides/prog_guide/rte_flow.rst > @@ -3010,6 +3010,111 @@ operations include: > - Duplication of a complete flow rule description. > - Pattern item or action name retrieval. > > +Tunneled traffic offload > +~~~~~~~~~~~~~~~~~~~~~~~~ > + > +Provide software application with unified rules model for tunneled traffic > +regardless underlying hardware. > + > + - The model introduces a concept of a virtual tunnel port (VTP). It looks like a purely abstract concept for now, since it is not mentioned or referenced in the header file. That makes it hard to put the description and the API together. > + - The model uses VTP to offload ingress tunneled network traffic > + with RTE flow rules. > + - The model is implemented as set of helper functions. Each PMD > + implements VTP offload according to underlying hardware offload > + capabilities. Applications must query PMD for VTP flow > + items / actions before using in creation of a VTP flow rule. To me it looks like "creation of a VTP flow rule" is not covered yet.
The flow rule examples mention it in the pattern and actions, but there are no corresponding pattern items and actions. Maybe I simply misunderstand the idea. > + > +The model components: > + > +- Virtual Tunnel Port (VTP) is a stateless software object that > + describes tunneled network traffic. VTP object usually contains > + descriptions of outer headers, tunnel headers and inner headers. Are inner headers really a part of the tunnel description? > +- Tunnel Steering flow Rule (TSR) detects tunneled packets and > + delegates them to tunnel processing infrastructure, implemented > + in PMD for optimal hardware utilization, for further processing. > +- Tunnel Matching flow Rule (TMR) verifies packet configuration and > + runs offload actions in case of a match. Is it for fully offloaded tunnels with encap/decap or for all tunnels (detected, but partially offloaded, e.g. checksumming)? > + > +Application actions: > + > +1 Initialize VTP object according to tunnel network parameters. I.e. fill in 'struct rte_flow_tunnel'. Is that correct? > + > +2 Create TSR flow rule. > + > +2.1 Query PMD for VTP actions. Application can query for VTP actions more > than once. > + > + .. code-block:: c > + > + int > + rte_flow_tunnel_decap_set(uint16_t port_id, > + struct rte_flow_tunnel *tunnel, > + struct rte_flow_action **pmd_actions, > + uint32_t *num_of_pmd_actions, > + struct rte_flow_error *error); > + > +2.2 Integrate PMD actions into TSR actions list. > + > +2.3 Create TSR flow rule. > + > + .. code-block:: console > + > + flow create <port> group 0 match {tunnel items} / end actions {PMD > actions} / {App actions} / end Are application actions strictly required? If not, it would be better to make that clear. Do the tunnel items here correlate somehow with the tunnel specification in 'struct rte_flow_tunnel'? Are they obtained using rte_flow_tunnel_match()? > + > +3 Create TMR flow rule. > + > +3.1 Query PMD for VTP items. Application can query for VTP items more than > once. > + > + .. 
code-block:: c > + > + int > + rte_flow_tunnel_match(uint16_t port_id, > + struct rte_flow_tunnel *tunnel, > + struct rte_flow_item **pmd_items, > + uint32_t *num_of_pmd_items, > + struct rte_flow_error *error); > + > +3.2 Integrate PMD items into TMR items list. > + > +3.3 Create TMR flow rule. > + > + .. code-block:: console > + > + flow create <port> group 0 match {PMD items} / {APP items} / end > actions {offload actions} / end > + > +The model provides helper function call to restore packets that miss > +tunnel TMR rules to its original state: > + > +.. code-block:: c > + > + int > + rte_flow_get_restore_info(uint16_t port_id, > + struct rte_mbuf *mbuf, > + struct rte_flow_restore_info *info, > + struct rte_flow_error *error); > + > +rte_tunnel object filled by the call inside > +``rte_flow_restore_info *info parameter`` can be used by the application > +to create new TMR rule for that tunnel. I think an example, e.g. for the VXLAN over IPv4 tunnel case with some concrete parameters, would be very useful here for understanding. Could it be annotated with a description of the transformations applied to a packet at the different stages of processing (including a restore example)? > + > +The model requirements: > + > +Software application must initialize > +rte_tunnel object with tunnel parameters before calling > +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match(). > + > +PMD actions array obtained in rte_flow_tunnel_decap_set() must be > +released by application with rte_flow_action_release() call. > +Application can release the actionsfter TSR rule was created. actionsfter ? > + > +PMD items array obtained with rte_flow_tunnel_match() must be released > +by application with rte_flow_item_release() call. Application can > +release the items after rule was created. However, if the application > +needs to create additional TMR rule for the same tunnel it will need > +to obtain PMD items again. 
> + > +Application cannot destroy rte_tunnel object before it releases all > +PMD actions & PMD items referencing that tunnel. > + > Caveats > ------- > [snip] > diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h > index b0e4199192..1374b6e5a7 100644 > --- a/lib/librte_ethdev/rte_flow.h > +++ b/lib/librte_ethdev/rte_flow.h > @@ -3324,6 +3324,202 @@ int > rte_flow_get_aged_flows(uint16_t port_id, void **contexts, > uint32_t nb_contexts, struct rte_flow_error *error); > > +/* Tunnel information. */ > +__rte_experimental > +struct rte_flow_ip_tunnel_key { > + rte_be64_t tun_id; /**< Tunnel identification. */ What is it? Why is it big-endian? Why is it in the IP tunnel key, i.e. why is it not in a generic structure? > + union { > + struct { > + rte_be32_t src_addr; /**< IPv4 source address. */ > + rte_be32_t dst_addr; /**< IPv4 destination address. */ > + } ipv4; > + struct { > + uint8_t src_addr[16]; /**< IPv6 source address. */ > + uint8_t dst_addr[16]; /**< IPv6 destination address. */ > + } ipv6; > + } u; > + bool is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */ > + rte_be16_t tun_flags; /**< Tunnel flags. */ Which flags? Where are these flags defined? Why is it big-endian? > + uint8_t tos; /**< TOS for IPv4, TC for IPv6. */ > + uint8_t ttl; /**< TTL for IPv4, HL for IPv6. */ If these are combined, I'd stick to IPv6 terminology since it is a bit better (well thought out, especially given current tendencies in (re)naming in software). > + rte_be32_t label; /**< Flow Label for IPv6. */ What about IPv6 tunnels with extension headers? How would this be extended? > + rte_be16_t tp_src; /**< Tunnel port source. */ > + rte_be16_t tp_dst; /**< Tunnel port destination. */ What about IP-in-IP tunnels? Is it applicable? > +}; > + > + > +/* Tunnel has a type and the key information. */ > +__rte_experimental > +struct rte_flow_tunnel { > + /** > + * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN, > + * RTE_FLOW_ITEM_TYPE_NVGRE etc. 
> + */ > + enum rte_flow_item_type type; > + struct rte_flow_ip_tunnel_key tun_info; /**< Tunnel key info. */ How can this be extended for non-IP tunnels, e.g. MPLS? Or for tunnels with more protocols, e.g. MPLS-over-UDP? > +}; > + > +/** > + * Indicate that the packet has a tunnel. > + */ > +#define RTE_FLOW_RESTORE_INFO_TUNNEL (1ULL << 0) > + > +/** > + * Indicate that the packet has a non decapsulated tunnel header. > + */ > +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED (1ULL << 1) > + > +/** > + * Indicate that the packet has a group_id. > + */ > +#define RTE_FLOW_RESTORE_INFO_GROUP_ID (1ULL << 2) > + > +/** > + * Restore information structure to communicate the current packet processing > + * state when some of the processing pipeline is done in hardware and should > + * continue in software. > + */ > +__rte_experimental > +struct rte_flow_restore_info { > + /** > + * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of > + * other fields in struct rte_flow_restore_info. > + */ > + uint64_t flags; > + uint32_t group_id; /**< Group ID. */ What is the group ID here? > + struct rte_flow_tunnel tunnel; /**< Tunnel information. */ > +}; > + > +/** > + * Allocate an array of actions to be used in rte_flow_create, to implement > + * tunnel-decap-set for the given tunnel. > + * Sample usage: > + * actions vxlan_decap / tunnel-decap-set(tunnel properties) / > + * jump group 0 / end Why is jump to group used in the example above? Is it mandatory? > + * > + * @param port_id > + * Port identifier of Ethernet device. > + * @param[in] tunnel > + * Tunnel properties. > + * @param[out] actions > + * Array of actions to be allocated by the PMD. This array should be > + * concatenated with the actions array provided to rte_flow_create. Please specify the concatenation order explicitly. > + * @param[out] num_of_actions > + * Number of actions allocated. > + * @param[out] error > + * Perform verbose error reporting if not NULL. PMDs initialize this > + * structure in case of error only. 
> + * > + * @return > + * 0 on success, a negative errno value otherwise and rte_errno is set. > + */ > +__rte_experimental > +int > +rte_flow_tunnel_decap_set(uint16_t port_id, > + struct rte_flow_tunnel *tunnel, > + struct rte_flow_action **actions, > + uint32_t *num_of_actions, Why does the approach to specifying actions differ here, i.e. an array of specified size vs an END-terminated array? Must the actions array be END-terminated here? There must be a strong reason to do it this way, and it should be explained. > + struct rte_flow_error *error); > + > +/** > + * Allocate an array of items to be used in rte_flow_create, to implement > + * tunnel-match for the given tunnel. > + * Sample usage: > + * pattern tunnel-match(tunnel properties) / outer-header-matches / > + * inner-header-matches / end > + * > + * @param port_id > + * Port identifier of Ethernet device. > + * @param[in] tunnel > + * Tunnel properties. > + * @param[out] items > + * Array of items to be allocated by the PMD. This array should be > + * concatenated with the items array provided to rte_flow_create. Concatenation order/rules should be described. Since it is an output, which entity does the concatenation? Is it allowed to refine PMD rules in the application rule specification? > + * @param[out] num_of_items > + * Number of items allocated. > + * @param[out] error > + * Perform verbose error reporting if not NULL. PMDs initialize this > + * structure in case of error only. > + * > + * @return > + * 0 on success, a negative errno value otherwise and rte_errno is set. > + */ > +__rte_experimental > +int > +rte_flow_tunnel_match(uint16_t port_id, > + struct rte_flow_tunnel *tunnel, > + struct rte_flow_item **items, > + uint32_t *num_of_items, Same as above for actions. > + struct rte_flow_error *error); > + > +/** > + * Populate the current packet processing state, if exists, for the given > mbuf. > + * > + * @param port_id > + * Port identifier of Ethernet device. > + * @param[in] m > + * Mbuf struct. 
> + * @param[out] info > + * Restore information. Upon success contains the HW state. > + * @param[out] error > + * Perform verbose error reporting if not NULL. PMDs initialize this > + * structure in case of error only. > + * > + * @return > + * 0 on success, a negative errno value otherwise and rte_errno is set. > + */ > +__rte_experimental > +int > +rte_flow_tunnel_get_restore_info(uint16_t port_id, > + struct rte_mbuf *m, > + struct rte_flow_restore_info *info, Does this suggest making a copy of the restore info for each mbuf? That sounds very expensive. Could you share your thoughts about it? > + struct rte_flow_error *error); > + > +/** > + * Release the action array as allocated by rte_flow_tunnel_decap_set. > + * > + * @param port_id > + * Port identifier of Ethernet device. > + * @param[in] actions > + * Array of actions to be released. > + * @param[in] num_of_actions > + * Number of elements in actions array. > + * @param[out] error > + * Perform verbose error reporting if not NULL. PMDs initialize this > + * structure in case of error only. > + * > + * @return > + * 0 on success, a negative errno value otherwise and rte_errno is set. > + */ > +__rte_experimental > +int > +rte_flow_tunnel_action_decap_release(uint16_t port_id, > + struct rte_flow_action *actions, > + uint32_t num_of_actions, Same question as above for actions and items specification approach. > + struct rte_flow_error *error); > + > +/** > + * Release the item array as allocated by rte_flow_tunnel_match. > + * > + * @param port_id > + * Port identifier of Ethernet device. > + * @param[in] items > + * Array of items to be released. > + * @param[in] num_of_items > + * Number of elements in item array. > + * @param[out] error > + * Perform verbose error reporting if not NULL. PMDs initialize this > + * structure in case of error only. > + * > + * @return > + * 0 on success, a negative errno value otherwise and rte_errno is set. 
> + */ > +__rte_experimental > +int > +rte_flow_tunnel_item_release(uint16_t port_id, > + struct rte_flow_item *items, > + uint32_t num_of_items, Same question as above for actions and items specification approach. > + struct rte_flow_error *error); > #ifdef __cplusplus > } > #endif [snip] Andrew. (Right now it is hard to fully imagine how to deal with it, and it looks like a shim to a vendor-specific API. Maybe I'm wrong. Hopefully the next version will include a PMD implementation example, which will shed a bit more light on it.)
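
P.S. To make the concatenation question concrete, here is a minimal sketch of what the application would seemingly have to do in step 2.2. It assumes the order shown in the commit message (`actions {PMD actions} / {App actions} / end`), that the PMD array is counted and not END-terminated, and that the application list is END-terminated as usual for rte_flow. Whether these assumptions hold is exactly what the documentation should state. The types and names below (`sketch_action`, `sketch_concat_actions`) are hypothetical stand-ins so that the sketch builds without DPDK headers; they are not part of the proposed API.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for enum rte_flow_action_type and
 * struct rte_flow_action; not the real DPDK definitions. */
enum sketch_action_type {
	SKETCH_ACTION_END = 0,
	SKETCH_ACTION_PMD_PRIVATE,
	SKETCH_ACTION_JUMP,
	SKETCH_ACTION_COUNT,
};

struct sketch_action {
	enum sketch_action_type type;
	const void *conf;
};

/* Number of entries in an END-terminated list, END itself excluded. */
static uint32_t
sketch_actions_len(const struct sketch_action *actions)
{
	uint32_t n = 0;

	while (actions[n].type != SKETCH_ACTION_END)
		n++;
	return n;
}

/*
 * Build a new END-terminated list: counted PMD actions first (assumed
 * order), then the END-terminated application actions.
 * Returns a heap array the caller must free(), or NULL on allocation failure.
 */
static struct sketch_action *
sketch_concat_actions(const struct sketch_action *pmd_actions,
		      uint32_t num_of_pmd_actions,
		      const struct sketch_action *app_actions)
{
	uint32_t num_app = sketch_actions_len(app_actions);
	size_t total = (size_t)num_of_pmd_actions + num_app + 1;
	struct sketch_action *out = calloc(total, sizeof(*out));

	if (out == NULL)
		return NULL;
	if (num_of_pmd_actions != 0)
		memcpy(out, pmd_actions, num_of_pmd_actions * sizeof(*out));
	if (num_app != 0)
		memcpy(out + num_of_pmd_actions, app_actions,
		       num_app * sizeof(*out));
	/* Explicit terminator, as rte_flow_create() expects for action lists. */
	out[total - 1].type = SKETCH_ACTION_END;
	out[total - 1].conf = NULL;
	return out;
}
```

If the PMD array were END-terminated like every other rte_flow list, the extra count parameter and the asymmetry in this helper would go away, which is why I am asking for the reasoning behind the counted form.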