Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model

Andrew Rybchenko Sun, 05 Jul 2020 07:52:12 -0700

Hi Gregory,

I'm sorry for the review with toooo many questions without any
suggestions on how to answer it. Please, see below.


On 6/25/20 7:03 PM, Gregory Etelson wrote:
> From: Eli Britstein <el...@mellanox.com>
> 
> Hardware vendors implement tunneled traffic offload techniques
> differently. Although RTE flow API provides tools capable to offload
> all sorts of network stacks, software application must reference this
> hardware differences in flow rules compilation. As the result tunneled
> traffic flow rules that utilize hardware capabilities can be different
> for the same traffic.
> 
> Tunnel port offload proposed in [1] provides software application with
> unified rules model for tunneled traffic regardless underlying
> hardware.
>  - The model introduces a concept of a virtual tunnel port (VTP).
>  - The model uses VTP to offload ingress tunneled network traffic 
>    with RTE flow rules.
>  - The model is implemented as set of helper functions. Each PMD
>    implements VTP offload according to underlying hardware offload
>    capabilities.  Applications must query PMD for VTP flow
>    items / actions before using in creation of a VTP flow rule.
> 
> The model components:
> - Virtual Tunnel Port (VTP) is a stateless software object that
>   describes tunneled network traffic.  VTP object usually contains
>   descriptions of outer headers, tunnel headers and inner headers.
> - Tunnel Steering flow Rule (TSR) detects tunneled packets and
>   delegates them to tunnel processing infrastructure, implemented
>   in PMD for optimal hardware utilization, for further processing.
> - Tunnel Matching flow Rule (TMR) verifies packet configuration and
>   runs offload actions in case of a match.
> 
> Application actions:
> 1 Initialize VTP object according to tunnel
>   network parameters.
> 2 Create TSR flow rule:
> 2.1 Query PMD for VTP actions: application can query for VTP actions
>     more than once
>     int
>     rte_flow_tunnel_decap_set(uint16_t port_id,
>                               struct rte_flow_tunnel *tunnel,
>                               struct rte_flow_action **pmd_actions,
>                               uint32_t *num_of_pmd_actions,
>                               struct rte_flow_error *error);
> 
> 2.2 Integrate PMD actions into TSR actions list.
> 2.3 Create TSR flow rule:
>     flow create <port> group 0
>           match {tunnel items} / end
>           actions {PMD actions} / {App actions} / end
> 
> 3 Create TMR flow rule:
> 3.1 Query PMD for VTP items: application can query for VTP items
>     more than once
>     int
>     rte_flow_tunnel_match(uint16_t port_id,
>                           struct rte_flow_tunnel *tunnel,
>                           struct rte_flow_item **pmd_items,
>                           uint32_t *num_of_pmd_items,
>                           struct rte_flow_error *error);
> 
> 3.2 Integrate PMD items into TMR items list:
> 3.3 Create TMR flow rule
>     flow create <port> group 0
>           match {PMD items} / {APP items} / end
>           actions {offload actions} / end
> 
> The model provides helper function call to restore packets that miss
> tunnel TMR rules to its original state:
> int
> rte_flow_get_restore_info(uint16_t port_id,
>                           struct rte_mbuf *mbuf,
>                           struct rte_flow_restore_info *info,
>                           struct rte_flow_error *error);
> 
> rte_tunnel object filled by the call inside
> rte_flow_restore_info *info parameter can be used by the application
> to create new TMR rule for that tunnel.
> 
> The model requirements:
> Software application must initialize
> rte_tunnel object with tunnel parameters before calling
> rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> 
> PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> released by application with rte_flow_action_release() call.
> Application can release the actionsfter TSR rule was created.
> 
> PMD items array obtained with rte_flow_tunnel_match() must be released
> by application with rte_flow_item_release() call.  Application can
> release the items after rule was created. However, if the application
> needs to create additional TMR rule for the same tunnel it will need
> to obtain PMD items again.
> 
> Application cannot destroy rte_tunnel object before it releases all
> PMD actions & PMD items referencing that tunnel.
> 
> [1] https://mails.dpdk.org/archives/dev/2020-June/169656.html
> 
> Signed-off-by: Eli Britstein <el...@mellanox.com>
> Acked-by: Ori Kam <or...@mellanox.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
>  lib/librte_ethdev/rte_ethdev_version.map |   5 +
>  lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
>  lib/librte_ethdev/rte_flow.h             | 196 +++++++++++++++++++++++
>  lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
>  5 files changed, 450 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst 
> b/doc/guides/prog_guide/rte_flow.rst
> index d5dd18ce99..cfd98c2e7d 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3010,6 +3010,111 @@ operations include:
>  - Duplication of a complete flow rule description.
>  - Pattern item or action name retrieval.
>  
> +Tunneled traffic offload
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Provide software application with unified rules model for tunneled traffic
> +regardless underlying hardware.
> +
> + - The model introduces a concept of a virtual tunnel port (VTP).

It looks like it is absolutely abstract concept now, since it
is not mentioned/referenced in the header file. It makes it
hard to put the description and API together.

> + - The model uses VTP to offload ingress tunneled network traffic 
> +   with RTE flow rules.
> + - The model is implemented as set of helper functions. Each PMD
> +   implements VTP offload according to underlying hardware offload
> +   capabilities.  Applications must query PMD for VTP flow
> +   items / actions before using in creation of a VTP flow rule.

For me it looks like "creation of a VTP flow rule" is not
covered yet. Flow rules examples mention it in pattern and
actions, but there is no corresponding pattern items and
actions. May be I simply misunderstand the idea.

> +
> +The model components:
> +
> +- Virtual Tunnel Port (VTP) is a stateless software object that
> +  describes tunneled network traffic.  VTP object usually contains
> +  descriptions of outer headers, tunnel headers and inner headers.

Are inner headers really a part of the tunnel description?

> +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> +  delegates them to tunnel processing infrastructure, implemented
> +  in PMD for optimal hardware utilization, for further processing.
> +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> +  runs offload actions in case of a match.

Is it for fully offloaded tunnels with encap/decap or all
tunnels (detected, but partially offloaded, e.g. checksumming)?

> +
> +Application actions:
> +
> +1 Initialize VTP object according to tunnel network parameters.

I.e. fill in 'struct rte_flow_tunnel'. Is it correct?

> +
> +2 Create TSR flow rule.
> +
> +2.1 Query PMD for VTP actions. Application can query for VTP actions more 
> than once.
> +
> +  .. code-block:: c
> +
> +    int
> +    rte_flow_tunnel_decap_set(uint16_t port_id,
> +                              struct rte_flow_tunnel *tunnel,
> +                              struct rte_flow_action **pmd_actions,
> +                              uint32_t *num_of_pmd_actions,
> +                              struct rte_flow_error *error);
> +
> +2.2 Integrate PMD actions into TSR actions list.
> +
> +2.3 Create TSR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {tunnel items} / end actions {PMD 
> actions} / {App actions} / end

Are application actions strictly required?
If no, it is better to make it clear.
Do tunnel items correlate here somehow with tunnel
specification in 'struct rte_flow_tunnel'?
Is it obtained using rte_flow_tunnel_match()?

> +
> +3 Create TMR flow rule.
> +
> +3.1 Query PMD for VTP items. Application can query for VTP items more than 
> once.
> +
> +    .. code-block:: c
> +
> +      int
> +      rte_flow_tunnel_match(uint16_t port_id,
> +                            struct rte_flow_tunnel *tunnel,
> +                            struct rte_flow_item **pmd_items,
> +                            uint32_t *num_of_pmd_items,
> +                            struct rte_flow_error *error);
> +
> +3.2 Integrate PMD items into TMR items list.
> +
> +3.3 Create TMR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {PMD items} / {APP items} / end 
> actions {offload actions} / end
> +
> +The model provides helper function call to restore packets that miss
> +tunnel TMR rules to its original state:
> +
> +.. code-block:: c
> +
> +  int
> +  rte_flow_get_restore_info(uint16_t port_id,
> +                            struct rte_mbuf *mbuf,
> +                            struct rte_flow_restore_info *info,
> +                            struct rte_flow_error *error);
> +
> +rte_tunnel object filled by the call inside
> +``rte_flow_restore_info *info parameter`` can be used by the application
> +to create new TMR rule for that tunnel.

I think an example, for example, for VXLAN over IPv4 tunnel
case with some concrete parameters would be very useful here
for understanding. Could it be annotated with a description
of the transformations happening with a packet on different
stages of the processing (including restore example).

> +
> +The model requirements:
> +
> +Software application must initialize
> +rte_tunnel object with tunnel parameters before calling
> +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> +
> +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> +released by application with rte_flow_action_release() call.
> +Application can release the actionsfter TSR rule was created.

actionsfter ?

> +
> +PMD items array obtained with rte_flow_tunnel_match() must be released
> +by application with rte_flow_item_release() call.  Application can
> +release the items after rule was created. However, if the application
> +needs to create additional TMR rule for the same tunnel it will need
> +to obtain PMD items again.
> +
> +Application cannot destroy rte_tunnel object before it releases all
> +PMD actions & PMD items referencing that tunnel.
> +
>  Caveats
>  -------
>  

[snip]

> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index b0e4199192..1374b6e5a7 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -3324,6 +3324,202 @@ int
>  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
>                       uint32_t nb_contexts, struct rte_flow_error *error);
>  
> +/* Tunnel information. */
> +__rte_experimental
> +struct rte_flow_ip_tunnel_key {
> +     rte_be64_t tun_id; /**< Tunnel identification. */

What is it? Why is it big-endian? Why is it in IP tunnel key?
I.e. why is it not in a generic structure?

> +     union {
> +             struct {
> +                     rte_be32_t src_addr; /**< IPv4 source address. */
> +                     rte_be32_t dst_addr; /**< IPv4 destination address. */
> +             } ipv4;
> +             struct {
> +                     uint8_t src_addr[16]; /**< IPv6 source address. */
> +                     uint8_t dst_addr[16]; /**< IPv6 destination address. */
> +             } ipv6;
> +     } u;
> +     bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> +     rte_be16_t tun_flags; /**< Tunnel flags. */

Which flags? Where are these flags defined?
Why is it big-endian?

> +     uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
> +     uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */

If combine, I'd stick to IPv6 terminology since it is a bit
better (well-thought, especially current tendencies in
(re)naming in software).

> +     rte_be32_t label; /**< Flow Label for IPv6. */

What about IPv6 tunnels with extension headers? How to extend?

> +     rte_be16_t tp_src; /**< Tunnel port source. */
> +     rte_be16_t tp_dst; /**< Tunnel port destination. */

What about IP-in-IP tunnels? Is it applicable?

> +};
> +
> +
> +/* Tunnel has a type and the key information. */
> +__rte_experimental
> +struct rte_flow_tunnel {
> +     /**
> +      * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> +      * RTE_FLOW_ITEM_TYPE_NVGRE etc.
> +      */
> +     enum rte_flow_item_type         type;
> +     struct rte_flow_ip_tunnel_key   tun_info; /**< Tunnel key info. */

How to extended for non-IP tunnels? MPLS?
Or tunnels with more protocols? E.g. MPLS-over-UDP?

> +};
> +
> +/**
> + * Indicate that the packet has a tunnel.
> + */
> +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
> +
> +/**
> + * Indicate that the packet has a non decapsulated tunnel header.
> + */
> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
> +
> +/**
> + * Indicate that the packet has a group_id.
> + */
> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
> +
> +/**
> + * Restore information structure to communicate the current packet processing
> + * state when some of the processing pipeline is done in hardware and should
> + * continue in software.
> + */
> +__rte_experimental
> +struct rte_flow_restore_info {
> +     /**
> +      * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
> +      * other fields in struct rte_flow_restore_info.
> +      */
> +     uint64_t flags;
> +     uint32_t group_id; /**< Group ID. */

What is the group ID here?

> +     struct rte_flow_tunnel tunnel; /**< Tunnel information. */
> +};
> +
> +/**
> + * Allocate an array of actions to be used in rte_flow_create, to implement
> + * tunnel-decap-set for the given tunnel.
> + * Sample usage:
> + *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
> + *            jump group 0 / end

Why is jump to group used in example above? Is it mandatory?

> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] actions
> + *   Array of actions to be allocated by the PMD. This array should be
> + *   concatenated with the actions array provided to rte_flow_create.

Please, specify concatenation order explicitly.

> + * @param[out] num_of_actions
> + *   Number of actions allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_decap_set(uint16_t port_id,
> +                       struct rte_flow_tunnel *tunnel,
> +                       struct rte_flow_action **actions,
> +                       uint32_t *num_of_actions,

Why does approach to specify actions differ here?
I.e. array of specified size vs END-terminated array?
Must the actions array be END-terminated here?
It must be a strong reason to do it and it should be
explained.

> +                       struct rte_flow_error *error);
> +
> +/**
> + * Allocate an array of items to be used in rte_flow_create, to implement
> + * tunnel-match for the given tunnel.
> + * Sample usage:
> + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> + *           inner-header-matches / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] items
> + *   Array of items to be allocated by the PMD. This array should be
> + *   concatenated with the items array provided to rte_flow_create.

Concatenation order/rules should be described.
Since it is an output which entity does the concatenation.
Is it allowed to refine PMD rules in application
rule specification?

> + * @param[out] num_of_items
> + *   Number of items allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> +                   struct rte_flow_tunnel *tunnel,
> +                   struct rte_flow_item **items,
> +                   uint32_t *num_of_items,

Same as above for actions.

> +                   struct rte_flow_error *error);
> +
> +/**
> + * Populate the current packet processing state, if exists, for the given 
> mbuf.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] m
> + *   Mbuf struct.
> + * @param[out] info
> + *   Restore information. Upon success contains the HW state.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_get_restore_info(uint16_t port_id,
> +                              struct rte_mbuf *m,
> +                              struct rte_flow_restore_info *info,

Is it suggesting to make a copy of the restore info for each
mbuf? It sounds very expensive. Could you share your thoughts
about it.

> +                              struct rte_flow_error *error);
> +
> +/**
> + * Release the action array as allocated by rte_flow_tunnel_decap_set.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] actions
> + *   Array of actions to be released.
> + * @param[in] num_of_actions
> + *   Number of elements in actions array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> +                                  struct rte_flow_action *actions,
> +                                  uint32_t num_of_actions,

Same question as above for actions and items specification
approach.

> +                                  struct rte_flow_error *error);
> +
> +/**
> + * Release the item array as allocated by rte_flow_tunnel_match.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] items
> + *   Array of items to be released.
> + * @param[in] num_of_items
> + *   Number of elements in item array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_item_release(uint16_t port_id,
> +                          struct rte_flow_item *items,
> +                          uint32_t num_of_items,

Same question as above for actions and items specification
approach.

> +                          struct rte_flow_error *error);
>  #ifdef __cplusplus
>  }
>  #endif

[snip]

Andrew.

(Right now it is hard to fully imagine how to deal with it.
And it looks like a shim to vendor-specific API. May be I'm
wrong. Hopefully the next version will have PMD implementation
example and it will shed a bit more light on it.)

Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model

Reply via email to