Adding discussion back to list.

On 24/12/2017 2:18 PM, Boris Pismenny wrote:
Hi Declan

On 12/22/2017 12:21 AM, Doherty, Declan wrote:
This RFC contains a proposal to add a new tunnel endpoint API to DPDK. Used in conjunction with rte_flow, it enables the configuration of inline data path encapsulation and decapsulation of tunnel endpoint network overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs to allow the
acceleration features to be discovered by applications.

/** Tunnel Endpoint context, opaque structure */
struct rte_tep;

enum rte_tep_type {
                RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
                RTE_TEP_TYPE_NVGRE,     /**< NVGRE Protocol */
                ...
};

/** Tunnel Endpoint Attributes */
struct rte_tep_attr {
                enum rte_tep_type type;

                /* other endpoint attributes here */
};

/**
* Create a tunnel end-point context as specified by the flow attribute and pattern
*
* @param   port_id     Port identifier of Ethernet device.
* @param   attr        Flow rule attributes.
* @param   pattern     Pattern specification by list of rte_flow_items.
* @return
*  - On success returns pointer to TEP context
*  - On failure returns NULL
*/
struct rte_tep *rte_tep_create(uint16_t port_id,
                               struct rte_tep_attr *attr, struct rte_flow_item pattern[]);

/**
* Destroy an existing tunnel end-point context. The end-point context will
* be destroyed, so all active flows using the tep should be freed before
* destroying the context.
* @param   port_id    Port identifier of Ethernet device.
* @param   tep        Tunnel endpoint context
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep);

/**
* Get tunnel endpoint statistics
*
* @param   port_id    Port identifier of Ethernet device.
* @param   tep        Tunnel endpoint context
* @param   stats      Tunnel endpoint statistics
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
                               struct rte_tep_stats *stats);

/**
* Get ports tunnel endpoint capabilities
*
* @param   port_id    Port identifier of Ethernet device.
* @param   capabilities        Tunnel endpoint capabilities
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_capabilities_get(uint16_t port_id,
                               struct rte_tep_capabilities *capabilities);
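
A minimal usage sketch (hedged: the contents of struct rte_tep_capabilities
are not defined by this RFC, so only the call pattern is shown):

struct rte_tep_capabilities caps;

/* query TEP offload capabilities of the port; caps contents are TBD */
if (rte_tep_capabilities_get(port_id, &caps) != 0) {
                /* port does not support TEP offload or the query failed */
}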


To direct traffic flows to a hw terminated tunnel endpoint, the rte_flow API
is enhanced to add a new flow item type. This contains a pointer to the
TEP context as well as the overlay flow id with which the traffic flow is
associated.

struct rte_flow_item_tep {
                struct rte_tep *tep;
                uint32_t flow_id;
};

Also, 2 new generic action types are added: encapsulation and decapsulation.

RTE_FLOW_ACTION_TYPE_ENCAP
RTE_FLOW_ACTION_TYPE_DECAP

struct rte_flow_action_encap {
                struct rte_flow_item *item;
};

struct rte_flow_action_decap {
                struct rte_flow_item *item;
};

The following section outlines the intended usage of the new APIs and then how
they are combined with the existing rte_flow APIs.

Tunnel endpoints are created with rte_tep_create() on logical ports which
support the capability, using a combination of TEP attributes and
rte_flow_items. In the example below a new IPv4 VxLAN endpoint is being
defined. The attrs parameter sets the TEP type, and could be used for other
possible attributes.

struct rte_tep_attr attrs = { .type = RTE_TEP_TYPE_VXLAN };

The values for the headers which make up the tunnel endpoint are then
defined using the spec parameter of the rte_flow_items (IPv4, UDP and
VxLAN in this case):

struct rte_flow_item_ipv4 ipv4_item = {
                .hdr = { .src_addr = saddr, .dst_addr = daddr }
};

struct rte_flow_item_udp udp_item = {
                .hdr = { .src_port = sport, .dst_port = dport }
};

struct rte_flow_item_vxlan vxlan_item = { .flags = vxlan_flags };

struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
                { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_item },
                { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_item },
                { .type = RTE_FLOW_ITEM_TYPE_END }
};

The tunnel endpoint can then be created on the port. Whether or not any hw
configuration is required at this point would be hw dependent, but if not,
the context for the TEP is available for use in programming flows, so the
application is not forced to redefine the TEP parameters on each flow
addition.

struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);

Once the tep context is created, flows can then be directed to that endpoint
for processing. The following sections outline how the author envisages flow
programming will work and how TEP acceleration can be combined with other
accelerations.


Ingress TEP decapsulation, mark and forward to queue:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The flow definition for TEP decapsulation actions should specify the full
outer packet to be matched at a minimum. The outer packet definition should
match the tunnel definition in the tep context and the tep flow id. This
example describes matching on the outer packet, marking the packet with the
VXLAN VNI and directing it to a specified queue of the port.

Source Packet

         Decapsulate Outer Hdr
        /                     \                            decap outer crc
       /                       \                           /             \
+-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
| ETH | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC | OUTER CRC |
+-----+------+-----+-------+-----+------+-----+---------+-----+-----------+

/* Flow Attributes/Items Definitions */

struct rte_flow_attr attr = { .ingress = 1 };

struct rte_flow_item_eth eth_item = { .src = s_addr, .dst = d_addr, .type = ether_type };
struct rte_flow_item_tep tep_item = { .tep = tep, .flow_id = vni };

struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_item },
                { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &tep_item  },
                { .type = RTE_FLOW_ITEM_TYPE_END }
};

/* Flow Actions Definitions */

struct rte_flow_action_decap decap_eth = {
                .type = RTE_FLOW_ITEM_TYPE_ETH,
                .item = { .src = s_addr, .dst = d_addr, .type = ether_type }
};

struct rte_flow_action_decap decap_tep = {
                .type = RTE_FLOW_ITEM_TYPE_TEP,
                .item = &tep_item
};

struct rte_flow_action_queue queue_action = { .index = qid };

struct rte_flow_action_mark mark_action = { .id = vni };

struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_eth },
                { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_tep },
                { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark_action },
                { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue_action },
                { .type = RTE_FLOW_ACTION_TYPE_END }
};

I guess the Ethernet header is kept separate so that it would be possible to update it separately?
But, I don't know of any way to update a specific rte_flow pattern.
Maybe it would be best to combine it with the rest of the TEP and add an update TEP command?

The main reason I had for proposing the Ethernet header as a separate entity from the TEP was to minimize replication of fields when multiple encapsulation actions are chained together. For example, if a tunnel IPsec action is chained with a TEP action, which of the actions should contain the Ethernet header information? Both could possibly define it, and it would depend on which is the last encapsulation. But it may make sense to have an update function and just live with the small replication.
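
As a hedged sketch, such an update entry point might look like the following
(name and signature are purely illustrative, not part of the RFC as posted):

/**
 * Hypothetical: redefine the headers of an existing TEP so that active
 * flows referencing it pick up the new values, e.g. a changed dst MAC.
 */
int rte_tep_update(uint16_t port_id, struct rte_tep *tep,
                               struct rte_flow_item pattern[]);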



/** VERY IMPORTANT NOTE **/
One of the core concepts of this proposal is that actions which modify the
packet are defined in the order in which they are to be processed. So first
decap the outer Ethernet header, then the outer TEP headers.
I think this is not only logical from a usability point of view, it should also
simplify the logic required in PMDs to parse the desired actions.

This makes a lot of sense when dealing with encap/decap.
Maybe it would be best to add a new bit from the reserved field in rte_flow_attr to express this. Something like this:

struct rte_flow_attr {
         uint32_t group; /**< Priority group. */
         uint32_t priority; /**< Priority level within group. */
         uint32_t ingress:1; /**< Rule applies to ingress traffic. */
         uint32_t egress:1; /**< Rule applies to egress traffic. */
         uint32_t inorder:1; /**< Actions are applied in order. */
         uint32_t reserved:29; /**< Reserved, must be zero. */
};


That makes sense to me.
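
For example, assuming the proposed bit were adopted as sketched above, a flow
with order-dependent actions would simply set it in the attributes:

/* hypothetical use of the proposed inorder bit */
struct rte_flow_attr attr = { .ingress = 1, .inorder = 1 };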


struct rte_flow *flow =
                               rte_flow_create(port_id, &attr, pattern, actions, &err);

The processed packets are delivered to the specified queue with mbuf metadata
denoting the marked flow id and with the mbuf ol_flags PKT_RX_TEP_OFFLOAD set.

     +-----+------+-----+---------+-----+
     | ETH | IPv4 | TCP | PAYLOAD | CRC |
     +-----+------+-----+---------+-----+
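
A hedged sketch of the receive side, assuming the proposed PKT_RX_TEP_OFFLOAD
flag and that the marked flow id is delivered as with the existing MARK
action, i.e. in mbuf->hash.fdir.hi:

struct rte_mbuf *mbufs[32];
uint16_t i, nb_rx;

nb_rx = rte_eth_rx_burst(port_id, qid, mbufs, 32);
for (i = 0; i < nb_rx; i++) {
                if (mbufs[i]->ol_flags & PKT_RX_TEP_OFFLOAD) {
                                /* outer headers already stripped by hw */
                                uint32_t flow_id = mbufs[i]->hash.fdir.hi;
                }
}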


Ingress TEP decapsulation switch to port:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is intended to represent how a TEP decapsulation could be configured
in a switching offload case. It makes the assumption that there is a logical
port representation for all ports on the hw switch in the DPDK application,
but similar functionality could be achieved by specifying something like a
VF ID of the device.

Like the previous scenario, the flow definition for TEP decapsulation actions
should specify the full outer packet to be matched at a minimum, but should
also define the elements of the inner match to match against, including masks
if required.

Why is the inner specification necessary?

This example is for an OvS-like use case where you want to match on a specific flow, so you are matching against both outer and inner, and only decapsulating the outer for that particular flow.

What if I'd like to decapsulate all VXLAN traffic of some specification?

I think the previous example to this shows that case.



struct rte_flow_attr attr = { .ingress = 1 };

struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &outer_eth_item },
                { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &outer_tep_item, .mask = &tep_mask },
                { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &inner_eth_item, .mask = &eth_mask },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &inner_ipv4_item, .mask = &ipv4_mask },
                { .type = RTE_FLOW_ITEM_TYPE_TCP, .spec = &inner_tcp_item, .mask = &tcp_mask },
                { .type = RTE_FLOW_ITEM_TYPE_END }
};

/* Flow Actions Definitions */

struct rte_flow_action_decap decap_eth = {
                .type = RTE_FLOW_ITEM_TYPE_ETH,
                .item = { .src = s_addr, .dst = d_addr, .type = ether_type }
};

struct rte_flow_action_decap decap_tep = {
                .type = RTE_FLOW_ITEM_TYPE_TEP,
                .item = &outer_tep_item
};

struct rte_flow_action_port port_action = { .index = port_id };

struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_eth },
                { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_tep },
                { .type = RTE_FLOW_ACTION_TYPE_PORT, .conf = &port_action },
                { .type = RTE_FLOW_ACTION_TYPE_END }
};
struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &err);

This action will forward the decapsulated packets to another port of the
switch fabric, but no information on the tunnel or the fact that the packet
was decapsulated will be passed with it, thereby enabling segregation of the
infrastructure.


Egress TEP encapsulation:
~~~~~~~~~~~~~~~~~~~~~~~~~

Encapsulation TEP actions require the flow definition for the source packet
and then the actions to perform on it; this example shows an IPv4/TCP packet
flow.

Source Packet

     +-----+------+-----+---------+-----+
     | ETH | IPv4 | TCP | PAYLOAD | CRC |
     +-----+------+-----+---------+-----+

struct rte_flow_attr attr = { .egress = 1 };

struct rte_flow_item_eth eth_item = { .src = s_addr, .dst = d_addr, .type = ether_type };

struct rte_flow_item_ipv4 ipv4_item = { .hdr = { .src_addr = src_addr, .dst_addr = dst_addr } };

struct rte_flow_item_tcp tcp_item = { .hdr = { .src_port = src_port, .dst_port = dst_port } };

struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_item },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
                { .type = RTE_FLOW_ITEM_TYPE_TCP, .spec = &tcp_item },
                { .type = RTE_FLOW_ITEM_TYPE_END }
};

/* Flow Actions Definitions */

struct rte_flow_action_encap encap_eth = {
                .type = RTE_FLOW_ITEM_TYPE_ETH,
                .item = { .src = s_addr, .dst = d_addr, .type = ether_type }
};

struct rte_flow_action_encap encap_tep = {
                .type = RTE_FLOW_ITEM_TYPE_TEP,
                .item = { .tep = tep, .flow_id = vni }
};
struct rte_flow_action_mark port_action = { .index = port_id };

This is the source port_id, where previously it was the destination port_id, right?

Apologies, there is a typo in the above action definition; it should be:

struct rte_flow_action_port port_action = { .index = port_id };

So it should be read as the destination port id also.


SW       +-+ +-+ +-+ +-+ +-+
ETHDEV   |0| |1| |2| |3| |4|
         +++ +-+ +++ +++ +++
          ^       ^   ^   ^
----------|-------|---|---|-
          |       v   v   v
HW        |      +-+ +-+ +-+
          |      |A| |B| |C|  Host Ports
          |      +++ +++ +++
          |       |   |   |
          |      ++---+---++
          |      |   PPP   |  Packet Processing Pipeline
          |      +-+-----+-+    (including switching)
          |        |     |
          |       +++   +++
          +------>|D|   |E|   Physical Ports
                  +-+   +-+

So for the above example, if the traffic is originating on port B of the switch from the hw perspective and after encapsulation will be transmitted on port D, the flow rule would be created as an egress rule on ethdev port_id=3, and the port_action would be to ethdev port_id=0.
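
Expressed as a minimal sketch with those example port ids (values are
specific to the diagram above):

/* egress rule on ethdev port 3 (host port B), forwarding to
 * ethdev port 0 (physical port D) after encapsulation */
struct rte_flow_attr attr = { .egress = 1 };
struct rte_flow_action_port port_action = { .index = 0 };

struct rte_flow *flow = rte_flow_create(3, &attr, pattern, actions, &err);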



struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_tep },
                { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_eth },
                { .type = RTE_FLOW_ACTION_TYPE_PORT, .conf = &port_action },
                { .type = RTE_FLOW_ACTION_TYPE_END }
};
struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &err);


         Encapsulating Outer Hdr
        /                       \                          outer crc
       /                         \                         /       \
+-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
| ETH | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC | OUTER CRC |
+-----+------+-----+-------+-----+------+-----+---------+-----+-----------+



Chaining multiple modification actions eg IPsec and TEP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For example the definition for full hw acceleration for an IPsec ESP/Transport
SA encapsulated in a vxlan tunnel would look something like:

struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_tep },
                { .type = RTE_FLOW_ACTION_TYPE_SECURITY, .conf = &sec_session },
                { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_eth },
                { .type = RTE_FLOW_ACTION_TYPE_END }
};

Assuming the actions are ordered..
The order here suggests that the packet looks like:
[ETH | IP | UDP | VXLAN | ETH | IP | ESP | payload | ESP TRAILER | CRC]

But, the packet below has the ESP header as the outer header.
Also, shouldn't the encap_eth action come before the encap_tep action?

Maybe un-intuitively :) I have the actions ordered as performed on the packet, first to last, so I think you are reading them in the opposite order.

So the first thing to do is the encapsulation of the tep, which is adding the IP|UDP|VxLAN headers. I don't have an Ethernet encapsulation after this, as in this example we are doing a security action which could possibly change the outer IP. Then comes the security action, which is an ESP/Transport mode modification, and lastly the Ethernet encapsulation.

[ETH | IP | ESP | ** UDP | VXLAN | ETH | IP | TCP | PAYLOAD | CRC | ESP TRAILER ** | CRC ]

Packet encrypted between ** **


1. Source Packet
                            +-----+------+-----+---------+-----+
                            | ETH | IPv4 | TCP | PAYLOAD | CRC |
                            +-----+------+-----+---------+-----+

2. First Action - Tunnel Endpoint Encapsulation

       +------+-----+-------+-----+------+-----+---------+-----+
       | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC |
       +------+-----+-------+-----+------+-----+---------+-----+

3. Second Action - IPsec ESP/Transport Security Processing

       +------+-----+------------------------------------------------+-------------+
       | IPv4 | ESP |                ENCRYPTED PAYLOAD               | ESP TRAILER |
       +------+-----+------------------------------------------------+-------------+

4. Third Action - Outer Ethernet Encapsulation

+-----+------+-----+------------------------------------------------+-------------+-----------+
| ETH | IPv4 | ESP |                ENCRYPTED PAYLOAD               | ESP TRAILER | OUTER CRC |
+-----+------+-----+------------------------------------------------+-------------+-----------+

This example demonstrates the importance of making the interoperation of
actions ordered: in the above example, a security action can be defined on
both the inner and outer packet by simply placing another security action
at the beginning of the action list, as sketched below.
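
A hedged sketch of such a chain (inner_sec_session and outer_sec_session are
illustrative names for two hypothetical rte_security sessions):

struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_SECURITY, .conf = &inner_sec_session },
                { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_tep },
                { .type = RTE_FLOW_ACTION_TYPE_SECURITY, .conf = &outer_sec_session },
                { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_eth },
                { .type = RTE_FLOW_ACTION_TYPE_END }
};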

It also demonstrates the rationale for not collapsing the Ethernet header
into the TEP definition: when you have multiple encapsulating actions, any
of them could potentially be the place where the Ethernet header needs to
be defined.



With rte_security full protocol offload as presented here, we still need some way to provide and update the Ethernet header. Maybe there should be two encap_eth actions in this case, one for the outer and another for the inner?

Yes, it's the same issue that you raised above. I think it possibly makes sense that both rte_security and rte_tep support update functions and allow definition of an Ethernet header, with a mechanism for defining whether the Ethernet header is required for the rte_tep/rte_security action.
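
As a sketch of what such update functions might look like (purely
illustrative prototypes, nothing here is defined yet):

/* hypothetical: update the Ethernet header associated with a TEP or a
 * security session without recreating the flows that reference them */
int rte_tep_ether_hdr_update(uint16_t port_id, struct rte_tep *tep,
                               struct rte_flow_item_eth *eth);
int rte_security_session_ether_hdr_update(struct rte_security_session *sess,
                               struct rte_flow_item_eth *eth);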
