Hi Adrien Mazarguil, In your v2 version of rte_flow.txt, there is an action type RTE_FLOW_ACTION_TYPE_MARK, but there is no definition of struct rte_flow_action_mark. Instead, there is a definition of struct rte_flow_action_id. Is this a typo, or is it intended for some other use?
Thank you. struct rte_flow_action_id { uint32_t id; /**< 32 bit value to return with packets. */ }; > -----Original Message----- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil > Sent: Saturday, August 20, 2016 3:33 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [RFC v2] Generic flow director/filtering/classification > API > > Hi All, > > Thanks to many for the positive and constructive feedback I've received so > far. Here is the updated specification (v0.7) at last. > > I've attempted to address as many comments as possible but could not > process them all just yet. A new section "Future evolutions" has been > added for the remaining topics. > > This series adds rte_flow.h to the DPDK tree. Next time I will attempt to > convert the specification as a documentation commit part of the patchset > and actually implement API functions. > > I think including the entire document here makes it easier to annotate on > the ML, apologies in advance for the resulting traffic. > > Finally I'm off for the next two weeks, do not expect replies from me in > the meantime. > > Updates are also available online: > > HTML version: > https://rawgit.com/6WIND/rte_flow/master/rte_flow.html > > PDF version: > https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf > > Related draft header file (also in the next patch): > https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h > > Git tree: > https://github.com/6WIND/rte_flow > > Changes from v1: > > Specification: > > - Settled on [generic] "flow interface" / "flow API" as the name of this > framework, matches the rte_flow prefix better. > - Minor wording changes in several places. > - Partially added egress (TX) support. > - Added "unrecoverable errors" as another consequence of overlapping > rules. > - Described flow rules groups and their interaction with flow rule > priorities. > - Fully described PF and VF meta pattern items so they are not open to > interpretation anymore. 
> - Removed the SIGNATURE meta pattern item as its description was too > vague, may be re-added later if necessary. > - Added the PORT pattern item to apply rules to non-default physical > ports. > - Entirely redefined the RAW pattern item. > - Fixed tag error in the ETH item definition. > - Updated protocol definitions (IPV4, IPV6, ICMP, UDP). > - Added missing protocols (SCTP, VXLAN). > - Converted ID action to MARK and FLAG actions, described interaction > with the RSS hash result in mbufs. > - Updated COUNT query structure to retrieve the number of bytes. > - Updated VF action. > - Documented negative item and action types, those will be used for > dynamic types generated at run-time. > - Added blurb about IPv4 options and IPv6 extension headers matching. > - Updated function definitions. > - Documented a flush method to remove all rules on a given port at once. > - Documented the verbose error reporting interface. > - Documented how the private interface for PMD use will work. > - Documented expected behavior between successive port initializations. > - Documented expected behavior for ports not under DPDK control. > - Updated API migration section. > - Added future evolutions section. > > Header file: > > - Not a draft anymore and can be used as-is for preliminary > implementations. > - Flow rule attributes (group, priority, etc) now have their own > structure provided separately to API functions (struct rte_flow_attr). > - Group and priority interactions have been documented. > - Added PORT item. > - Removed SIGNATURE item. > - Defined ICMP, SCTP and VXLAN items. > - Redefined PF, VF, RAW, IPV4, IPV6, UDP and TCP items. > - Fixed tag error in the ETH item definition. > - Converted ID action to MARK and FLAG actions. > hash result in mbufs. > - Updated COUNT query structure. > - Updated VF action. > - Added verbose errors interface. > - Updated function prototypes according to the above. > - Defined rte_flow_flush(). 
> > -------- > > ====================== > Generic flow interface > ====================== > > .. footer:: > > v0.7 > > .. contents:: > .. sectnum:: > .. raw:: pdf > > PageBreak > > Overview > ======== > > DPDK provides several competing interfaces added over time to perform > packet > matching and related actions such as filtering and classification. > > They must be extended to implement the features supported by newer > devices > in order to expose them to applications, however the current design has > several drawbacks: > > - Complicated filter combinations which have not been hard-coded cannot be > expressed. > - Prone to API/ABI breakage when new features must be added to an > existing > filter type, which frequently happens. > > From an application point of view: > > - Having disparate interfaces, all optional and lacking in features does not > make this API easy to use. > - Seemingly arbitrary built-in limitations of filter types based on the > device they were initially designed for. > - Undefined relationship between different filter types. > - High complexity, considerable undocumented and/or undefined behavior. > > Considering the growing number of devices supported by DPDK, adding a > new > filter type each time a new feature must be implemented is not sustainable > in the long term. Applications not written to target a specific device > cannot really benefit from such an API. > > For these reasons, this document defines an extensible unified API that > encompasses and supersedes these legacy filter types. > > .. raw:: pdf > > PageBreak > > Current API > =========== > > Rationale > --------- > > The reason several competing (and mostly overlapping) filtering APIs are > present in DPDK is due to its nature as a thin layer between hardware and > software. > > Each subsequent interface has been added to better match the capabilities > and limitations of the latest supported device, which usually happened to > need an incompatible configuration approach. 
Because of this, many ended > up > device-centric and not usable by applications that were not written for that > particular device. > > This document is not the first attempt to address this proliferation issue, > in fact a lot of work has already been done both to create a more generic > interface while somewhat keeping compatibility with legacy ones through a > common call interface (``rte_eth_dev_filter_ctrl()`` with the > ``.filter_ctrl`` PMD callback in ``rte_ethdev.h``). > > Today, these previously incompatible interfaces are known as filter types > (``RTE_ETH_FILTER_*`` from ``enum rte_filter_type`` in ``rte_eth_ctrl.h``). > > However while trivial to extend with new types, it only shifted the > underlying problem as applications still need to be written for one kind of > filter type, which, as described in the following sections, is not > necessarily implemented by all PMDs that support filtering. > > .. raw:: pdf > > PageBreak > > Filter types > ------------ > > This section summarizes the capabilities of each filter type. > > Although the following list is exhaustive, the description of individual > types may contain inaccuracies due to the lack of documentation or usage > examples. > > Note: names are prefixed with ``RTE_ETH_FILTER_``. > > ``MACVLAN`` > ~~~~~~~~~~~ > > Matching: > > - L2 source/destination addresses. > - Optional 802.1Q VLAN ID. > - Masking individual fields on a rule basis is not supported. > > Action: > > - Packets are redirected either to a given VF device using its ID or to the > PF. > > ``ETHERTYPE`` > ~~~~~~~~~~~~~ > > Matching: > > - L2 source/destination addresses (optional). > - Ethertype (no VLAN ID?). > - Masking individual fields on a rule basis is not supported. > > Action: > > - Receive packets on a given queue. > - Drop packets. > > ``FLEXIBLE`` > ~~~~~~~~~~~~ > > Matching: > > - At most 128 consecutive bytes anywhere in packets. > - Masking is supported with byte granularity. 
> - Priorities are supported (relative to this filter type, undefined > otherwise). > > Action: > > - Receive packets on a given queue. > > ``SYN`` > ~~~~~~~ > > Matching: > > - TCP SYN packets only. > - One high priority bit can be set to give the highest possible priority to > this type when other filters with different types are configured. > > Action: > > - Receive packets on a given queue. > > ``NTUPLE`` > ~~~~~~~~~~ > > Matching: > > - Source/destination IPv4 addresses (optional in 2-tuple mode). > - Source/destination TCP/UDP port (mandatory in 2 and 5-tuple modes). > - L4 protocol (2 and 5-tuple modes). > - Masking individual fields is supported. > - TCP flags. > - Up to 7 levels of priority relative to this filter type, undefined > otherwise. > - No IPv6. > > Action: > > - Receive packets on a given queue. > > ``TUNNEL`` > ~~~~~~~~~~ > > Matching: > > - Outer L2 source/destination addresses. > - Inner L2 source/destination addresses. > - Inner VLAN ID. > - IPv4/IPv6 source (destination?) address. > - Tunnel type to match (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE, > 802.1BR > E-Tag). > - Tenant ID for tunneling protocols that have one. > - Any combination of the above can be specified. > - Masking individual fields on a rule basis is not supported. > > Action: > > - Receive packets on a given queue. > > .. raw:: pdf > > PageBreak > > ``FDIR`` > ~~~~~~~~ > > Queries: > > - Device capabilities and limitations. > - Device statistics about configured filters (resource usage, collisions). > - Device configuration (matching input set and masks) > > Matching: > > - Device mode of operation: none (to disable filtering), signature > (hash-based dispatching from masked fields) or perfect (either MAC VLAN > or > tunnel). > - L2 Ethertype. > - Outer L2 destination address (MAC VLAN mode). > - Inner L2 destination address, tunnel type (NVGRE, VXLAN) and tunnel ID > (tunnel mode). > - IPv4 source/destination addresses, ToS, TTL and protocol fields. 
> - IPv6 source/destination addresses, TC, protocol and hop limits fields. > - UDP source/destination IPv4/IPv6 and ports. > - TCP source/destination IPv4/IPv6 and ports. > - SCTP source/destination IPv4/IPv6, ports and verification tag field. > - Note, only one protocol type at once (either only L2 Ethertype, basic > IPv6, IPv4+UDP, IPv4+TCP and so on). > - VLAN TCI (extended API). > - At most 16 bytes to match in payload (extended API). A global device > look-up table specifies for each possible protocol layer (unknown, raw, > L2, L3, L4) the offset to use for each byte (they do not need to be > contiguous) and the related bit-mask. > - Whether packet is addressed to PF or VF, in that case its ID can be > matched as well (extended API). > - Masking most of the above fields is supported, but simultaneously affects > all filters configured on a device. > - Input set can be modified in a similar fashion for a given device to > ignore individual fields of filters (i.e. do not match the destination > address in a IPv4 filter, refer to **RTE_ETH_INPUT_SET_** > macros). Configuring this also affects RSS processing on **i40e**. > - Filters can also provide 32 bits of arbitrary data to return as part of > matched packets. > > Action: > > - **RTE_ETH_FDIR_ACCEPT**: receive (accept) packet on a given queue. > - **RTE_ETH_FDIR_REJECT**: drop packet immediately. > - **RTE_ETH_FDIR_PASSTHRU**: similar to accept for the last filter in list, > otherwise process it with subsequent filters. > - For accepted packets and if requested by filter, either 32 bits of > arbitrary data and four bytes of matched payload (only in case of flex > bytes matching), or eight bytes of matched payload (flex also) are added > to meta data. > > .. raw:: pdf > > PageBreak > > ``HASH`` > ~~~~~~~~ > > Not an actual filter type. Provides and retrieves the global device > configuration (per port or entire NIC) for hash functions and their > properties. 
> > Hash function selection: "default" (keep current), XOR or Toeplitz. > > This function can be configured per flow type (**RTE_ETH_FLOW_** > definitions), supported types are: > > - Unknown. > - Raw. > - Fragmented or non-fragmented IPv4. > - Non-fragmented IPv4 with L4 (TCP, UDP, SCTP or other). > - Fragmented or non-fragmented IPv6. > - Non-fragmented IPv6 with L4 (TCP, UDP, SCTP or other). > - L2 payload. > - IPv6 with extensions. > - IPv6 with L4 (TCP, UDP) and extensions. > > ``L2_TUNNEL`` > ~~~~~~~~~~~~~ > > Matching: > > - All packets received on a given port. > > Action: > > - Add tunnel encapsulation (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE, > 802.1BR E-Tag) using the provided Ethertype and tunnel ID (only E-Tag > is implemented at the moment). > - VF ID to use for tag insertion (currently unused). > - Destination pool for tag based forwarding (pools are IDs that can be > affected to ports, duplication occurs if the same ID is shared by several > ports of the same NIC). > > .. raw:: pdf > > PageBreak > > Driver support > -------------- > > ======== ======= ========= ======== === ====== ====== ==== ==== > ========= > Driver MACVLAN ETHERTYPE FLEXIBLE SYN NTUPLE TUNNEL FDIR HASH > L2_TUNNEL > ======== ======= ========= ======== === ====== ====== ==== ==== > ========= > bnx2x > cxgbe > e1000 yes yes yes yes > ena > enic yes > fm10k > i40e yes yes yes yes yes > ixgbe yes yes yes yes yes > mlx4 > mlx5 yes > szedata2 > ======== ======= ========= ======== === ====== ====== ==== ==== > ========= > > Flow director > ------------- > > Flow director (FDIR) is the name of the most capable filter type, which > covers most features offered by others. As such, it is the most widespread > in PMDs that support filtering (i.e. all of them besides **e1000**). 
> > It is also the only type that allows an arbitrary 32 bits value provided by > applications to be attached to a filter and returned with matching packets > instead of relying on the destination queue to recognize flows. > > Unfortunately, even FDIR requires applications to be aware of low-level > capabilities and limitations (most of which come directly from **ixgbe** and > **i40e**): > > - Bit-masks are set globally per device (port?), not per filter. > - Configuration state is not expected to be saved by the driver, and > stopping/restarting a port requires the application to perform it again > (API documentation is also unclear about this). > - Monolithic approach with ABI issues as soon as a new kind of flow or > combination needs to be supported. > - Cryptic global statistics/counters. > - Unclear about how priorities are managed; filters seem to be arranged as a > linked list in hardware (possibly related to configuration order). > > Packet alteration > ----------------- > > One interesting feature is that the L2 tunnel filter type implements the > ability to alter incoming packets through a filter (in this case to > encapsulate them), thus the **mlx5** flow encap/decap features are not a > foreign concept. > > .. raw:: pdf > > PageBreak > > Proposed API > ============ > > Terminology > ----------- > > - **Flow API**: overall framework affecting the fate of selected packets, > covers everything described in this document. > - **Filtering API**: an alias for *Flow API*. > - **Matching pattern**: properties to look for in packets, a combination of > any number of items. > - **Pattern item**: part of a pattern that either matches packet data > (protocol header, payload or derived information), or specifies properties > of the pattern itself. > - **Actions**: what needs to be done when a packet is matched by a > pattern. > - **Flow rule**: this is the result of combining a *matching pattern* with > *actions*. 
> - **Filter rule**: a less generic term than *flow rule*, can otherwise be > used interchangeably. > - **Hit**: a flow rule is said to be *hit* when processing a matching > packet. > > Requirements > ------------ > > As described in the previous section, there is a growing need for a common > method to configure filtering and related actions in a hardware independent > fashion. > > The flow API should not disallow any filter combination by design and must > remain as simple as possible to use. It can simply be defined as a method to > perform one or several actions on selected packets. > > PMDs are aware of the capabilities of the device they manage and should be > responsible for preventing unsupported or conflicting combinations. > > This approach is fundamentally different as it places most of the burden on > the software side of the PMD instead of having device capabilities directly > mapped to API functions, then expecting applications to work around > ensuing > compatibility issues. > > Requirements for a new API: > > - Flexible and extensible without causing API/ABI problems for existing > applications. > - Should be unambiguous and easy to use. > - Support existing filtering features and actions listed in `Filter types`_. > - Support packet alteration. > - In case of overlapping filters, their priority should be well documented. > - Support filter queries (for example to retrieve counters). > - Support egress (TX) matching and specific actions. > > .. raw:: pdf > > PageBreak > > High level design > ----------------- > > The chosen approach to make filtering as generic as possible is by > expressing matching patterns through lists of items instead of the flat > structures used in DPDK today, enabling combinations that are not > predefined > and thus being more versatile. 
> > Flow rules can have several distinct actions (such as counting, > encapsulating, decapsulating before redirecting packets to a particular > queue, etc.), instead of relying on several rules to achieve this and having > applications deal with hardware implementation details regarding their > order. > > Support for different priority levels on a rule basis is provided, for > example in order to force a more specific rule to come before a more generic > one for packets matched by both, however hardware support for more than a > single priority level cannot be guaranteed. When supported, the number of > available priority levels is usually low, which is why they can also be > implemented in software by PMDs (e.g. missing priority levels may be > emulated by reordering rules). > > In order to remain as hardware agnostic as possible, by default all rules > are considered to have the same priority, which means that the order between > overlapping rules (when a packet is matched by several filters) is > undefined, packet duplication or unrecoverable errors may even occur as a > result. > > PMDs may refuse to create overlapping rules at a given priority level when > they can be detected (e.g. if a pattern matches an existing filter). > > Thus predictable results for a given priority level can only be achieved > with non-overlapping rules, using perfect matching on all protocol layers. > > Flow rules can also be grouped, the flow rule priority is specific to the > group they belong to. All flow rules in a given group are thus processed > either before or after another group. > > Support for multiple actions per rule may be implemented internally on top > of non-default hardware priorities, as a result both features may not be > simultaneously available to applications.
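Inline note: the ordering described above (groups first, then priorities inside a group) can be sketched as a plain sort, which is also one way a PMD might emulate missing priority levels by reordering rules. This is only a sketch under assumptions: `struct rule_attr`, `rule_attr_cmp()` and `sort_rules()` are hypothetical names, not the layout of `struct rte_flow_attr`, and lower values are assumed to come first, as the Groups and Priorities sections of the specification state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified rule attributes (not struct rte_flow_attr). */
struct rule_attr {
	uint32_t group;    /* lower value is processed first */
	uint32_t priority; /* lower value is processed first within a group */
};

/* qsort() comparator: order by group, then by priority inside a group. */
static int rule_attr_cmp(const void *a, const void *b)
{
	const struct rule_attr *x = a;
	const struct rule_attr *y = b;

	if (x->group != y->group)
		return x->group < y->group ? -1 : 1;
	if (x->priority != y->priority)
		return x->priority < y->priority ? -1 : 1;
	/* Same group and priority: the specification leaves the order undefined. */
	return 0;
}

/* Reorder rules in place so that they can be programmed in matching order. */
static void sort_rules(struct rule_attr *rules, size_t n)
{
	assert(rules != NULL);
	qsort(rules, n, sizeof(*rules), rule_attr_cmp);
}
```

With this ordering, a rule in group 8 with priority 0 always ends up after a rule in group 0 with priority 8.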
> > Considering that allowed pattern/actions combinations cannot be known in > advance and would result in an impractically large number of capabilities to > expose, a method is provided to validate a given rule from the current > device configuration state without actually adding it (akin to a "dry run" > mode). > > This enables applications to check if the rule types they need are supported > at initialization time, before starting their data path. This method can be > used anytime, its only requirement being that the resources needed by a rule > must exist (e.g. a target RX queue must be configured first). > > Each defined rule is associated with an opaque handle managed by the PMD, > applications are responsible for keeping it. These can be used for queries > and rules management, such as retrieving counters or other data and > destroying them. > > To avoid resource leaks on the PMD side, handles must be explicitly > destroyed by the application before releasing associated resources such as > queues and ports. > > Integration > ----------- > > To avoid ABI breakage, this new interface will be implemented through the > existing filtering control framework (``rte_eth_dev_filter_ctrl()``) using > **RTE_ETH_FILTER_GENERIC** as a new filter type. > > However a public front-end API described in `Rules management`_ will > be added as the preferred method to use it. > > Once discussions with the community have converged to a definite API, legacy > filter types should be deprecated and a deadline defined to remove their > support entirely. > > PMDs will have to be gradually converted to **RTE_ETH_FILTER_GENERIC** or > drop filtering support entirely. Less maintained PMDs for older hardware may > lose support at this point. > > The notion of filter type will then be deprecated and subsequently dropped > to avoid confusion between both frameworks.
> > Implementation details > ====================== > > Flow rule > --------- > > A flow rule is the combination of a matching pattern with a list of actions, > and is the basis of this API. > > They also have several other attributes described in the following sections. > > Groups > ~~~~~~ > > Flow rules can be grouped by assigning them a common group number. Lower > values have higher priority. Group 0 has the highest priority. > > Although optional, applications are encouraged to group similar rules as > much as possible to fully take advantage of hardware capabilities > (e.g. optimized matching) and work around limitations (e.g. a single pattern > type possibly allowed in a given group). > > Note that support for more than a single group is not guaranteed. > > Priorities > ~~~~~~~~~~ > > A priority level can be assigned to a flow rule. Like groups, lower values > denote higher priority, with 0 as the maximum. > > A rule with priority 0 in group 8 is always matched after a rule with > priority 8 in group 0. > > Group and priority levels are arbitrary and up to the application, they do > not need to be contiguous nor start from 0, however the maximum number > varies between devices and may be affected by existing flow rules. > > If a packet is matched by several rules of a given group for a given > priority level, the outcome is undefined. It can take any path, may be > duplicated or even cause unrecoverable errors. > > Note that support for more than a single priority level is not guaranteed. > > Traffic direction > ~~~~~~~~~~~~~~~~~ > > Flow rules can apply to inbound and/or outbound traffic (ingress/egress). > > Several pattern items and actions are valid and can be used in both > directions. Those valid for only one direction are described as such. > > Specifying both directions at once is not recommended but may be valid in > some cases, such as incrementing the same counter twice. > > Not specifying any direction is currently an error. > > .. 
raw:: pdf > > PageBreak > > Matching pattern > ~~~~~~~~~~~~~~~~ > > A matching pattern comprises any number of items of various types. > > Items are arranged in a list to form a matching pattern for packets. They > fall in two categories: > > - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, SCTP, > VXLAN > and so on), usually associated with a specification structure. These must > be stacked in the same order as the protocol layers to match, starting > from L2. > > - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF, PORT > and so on), often without a specification structure. Since they are meta > data that does not match packet contents, these can be specified anywhere > within item lists without affecting the protocol matching items. > > Most item specifications can be optionally paired with a mask to narrow the > specific fields or bits to be matched. > > - Items are defined with ``struct rte_flow_item``. > - Patterns are defined with ``struct rte_flow_pattern``. > > Example of an item specification matching an Ethernet header: > > +-----------------------------------------+ > | Ethernet | > +==========+=========+====================+ > | ``spec`` | ``src`` | ``00:01:02:03:04`` | > | +---------+--------------------+ > | | ``dst`` | ``00:2a:66:00:01`` | > +----------+---------+--------------------+ > | ``mask`` | ``src`` | ``00:ff:ff:ff:00`` | > | +---------+--------------------+ > | | ``dst`` | ``00:00:00:00:ff`` | > +----------+---------+--------------------+ > > Non-masked bits stand for any value, Ethernet headers with the following > properties are thus matched: > > - ``src``: ``??:01:02:03:??`` > - ``dst``: ``??:??:??:??:01`` > > Except for meta types that do not need one, ``spec`` must be a valid pointer > to a structure of the related item type. A ``mask`` of the same type can be > provided to tell which bits in ``spec`` are to be matched. 
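Inline note: the ``spec``/``mask`` semantics in the Ethernet example above can be sketched as a byte-wise comparison where only the bits set in the mask are checked. This is a sketch under assumptions: `masked_match()` is a hypothetical helper, not part of the proposed API, and the five-byte values used below are copied verbatim from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 when data equals spec on every bit that is
 * set in mask, 0 otherwise. Bits cleared in mask stand for "any value".
 */
static int masked_match(const uint8_t *data, const uint8_t *spec,
			const uint8_t *mask, size_t len)
{
	size_t i;

	assert(data != NULL && spec != NULL && mask != NULL);
	for (i = 0; i < len; i++)
		if ((data[i] ^ spec[i]) & mask[i])
			return 0; /* a masked bit differs */
	return 1;
}
```

For instance, with ``spec`` 00:01:02:03:04 and ``mask`` 00:ff:ff:ff:00, the bytes aa:01:02:03:bb match (``??:01:02:03:??``), while aa:01:02:04:bb do not.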
> > A mask is normally only needed for ``spec`` fields matching packet data, > ignored otherwise. See individual item types for more information. > > A ``NULL`` mask pointer is allowed and is similar to matching with a full > mask (all ones) ``spec`` fields supported by hardware, the remaining fields > are ignored (all zeroes), there is thus no error checking for unsupported > fields. > > .. raw:: pdf > > PageBreak > > Matching pattern items for packet data must be naturally stacked (ordered > from lowest to highest protocol layer), as in the following examples: > > +--------------+ > | TCPv4 as L4 | > +===+==========+ > | 0 | Ethernet | > +---+----------+ > | 1 | IPv4 | > +---+----------+ > | 2 | TCP | > +---+----------+ > > +----------------+ > | TCPv6 in VXLAN | > +===+============+ > | 0 | Ethernet | > +---+------------+ > | 1 | IPv4 | > +---+------------+ > | 2 | UDP | > +---+------------+ > | 3 | VXLAN | > +---+------------+ > | 4 | Ethernet | > +---+------------+ > | 5 | IPv6 | > +---+------------+ > | 6 | TCP | > +---+------------+ > > +-----------------------------+ > | TCPv4 as L4 with meta items | > +===+=========================+ > | 0 | VOID | > +---+-------------------------+ > | 1 | Ethernet | > +---+-------------------------+ > | 2 | VOID | > +---+-------------------------+ > | 3 | IPv4 | > +---+-------------------------+ > | 4 | TCP | > +---+-------------------------+ > | 5 | VOID | > +---+-------------------------+ > | 6 | VOID | > +---+-------------------------+ > > The above example shows how meta items do not affect packet data > matching > items, as long as those remain stacked properly. The resulting matching > pattern is identical to "TCPv4 as L4". 
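Inline note: the equivalence between "TCPv4 as L4 with meta items" and "TCPv4 as L4" can be sketched by dropping VOID items from an item list. The enum values and `strip_void()` below are hypothetical and only for illustration; the sketch assumes lists are terminated by an END item whose value is 0, as described under "Meta item types".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item types; only END being 0 comes from the specification. */
enum item_type {
	ITEM_END = 0,
	ITEM_VOID,
	ITEM_ETH,
	ITEM_IPV4,
	ITEM_TCP,
};

/*
 * Copy an END-terminated item list from in to out, dropping VOID items.
 * out must have room for as many entries as in, END included.
 * Returns the number of items written, END excluded.
 */
static size_t strip_void(const enum item_type *in, enum item_type *out)
{
	size_t n = 0;

	assert(in != NULL && out != NULL);
	for (; *in != ITEM_END; in++)
		if (*in != ITEM_VOID)
			out[n++] = *in;
	out[n] = ITEM_END;
	return n;
}
```

Applied to the seven-item list above, this leaves Ethernet, IPv4 and TCP in that order, i.e. the same three data matching items as the first example.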
> > +----------------+ > | UDPv6 anywhere | > +===+============+ > | 0 | IPv6 | > +---+------------+ > | 1 | UDP | > +---+------------+ > > If supported by the PMD, omitting one or several protocol layers at the > bottom of the stack as in the above example (missing an Ethernet > specification) enables hardware to look anywhere in packets. > > This is an alias for specifying `ANY`_ with ``min = 0`` and ``max = 0`` > properties as the first item. > > It is unspecified whether the payload of supported encapsulations > (e.g. VXLAN inner packet) is matched by such a pattern, which may apply to > inner, outer or both packets. > > +---------------------+ > | Invalid, missing L3 | > +===+=================+ > | 0 | Ethernet | > +---+-----------------+ > | 1 | UDP | > +---+-----------------+ > > The above pattern is invalid due to a missing L3 specification between L2 > and L4. It is only allowed at the bottom and at the top of the stack. > > Meta item types > ~~~~~~~~~~~~~~~ > > These do not match packet data but affect how the pattern is processed, > most > of them do not need a specification structure. This particularity allows > them to be specified anywhere without affecting other item types. > > ``END`` > ^^^^^^^ > > End marker for item lists. Prevents further processing of items, thereby > ending the pattern. > > - Its numeric value is **0** for convenience. > - PMD support is mandatory. > - Both ``spec`` and ``mask`` are ignored. > > +--------------------+ > | END | > +==========+=========+ > | ``spec`` | ignored | > +----------+---------+ > | ``mask`` | ignored | > +----------+---------+ > > ``VOID`` > ^^^^^^^^ > > Used as a placeholder for convenience. It is ignored and simply discarded by > PMDs. > > - PMD support is mandatory. > - Both ``spec`` and ``mask`` are ignored. 
> > +--------------------+ > | VOID | > +==========+=========+ > | ``spec`` | ignored | > +----------+---------+ > | ``mask`` | ignored | > +----------+---------+ > > One usage example for this type is generating rules that share a common > prefix quickly without reallocating memory, only by updating item types: > > +------------------------+ > | TCP, UDP or ICMP as L4 | > +===+====================+ > | 0 | Ethernet | > +---+--------------------+ > | 1 | IPv4 | > +---+------+------+------+ > | 2 | UDP | VOID | VOID | > +---+------+------+------+ > | 3 | VOID | TCP | VOID | > +---+------+------+------+ > | 4 | VOID | VOID | ICMP | > +---+------+------+------+ > > .. raw:: pdf > > PageBreak > > ``INVERT`` > ^^^^^^^^^^ > > Inverted matching, i.e. process packets that do not match the pattern. > > - Both ``spec`` and ``mask`` are ignored. > > +--------------------+ > | INVERT | > +==========+=========+ > | ``spec`` | ignored | > +----------+---------+ > | ``mask`` | ignored | > +----------+---------+ > > Usage example in order to match non-TCPv4 packets only: > > +--------------------+ > | Anything but TCPv4 | > +===+================+ > | 0 | INVERT | > +---+----------------+ > | 1 | Ethernet | > +---+----------------+ > | 2 | IPv4 | > +---+----------------+ > | 3 | TCP | > +---+----------------+ > > ``PF`` > ^^^^^^ > > Matches packets addressed to the physical function of the device. > > If the underlying device function differs from the one that would normally > receive the matched traffic, specifying this item prevents it from reaching > that device unless the flow rule contains a `PF (action)`_. Packets are not > duplicated between device instances by default. > > - Likely to return an error or never match any traffic if applied to a VF > device. > - Can be combined with any number of `VF`_ items to match both PF and VF > traffic. > - Both ``spec`` and ``mask`` are ignored. 
> > +--------------------+ > | PF | > +==========+=========+ > | ``spec`` | ignored | > +----------+---------+ > | ``mask`` | ignored | > +----------+---------+ > > ``VF`` > ^^^^^^ > > Matches packets addressed to a virtual function ID of the device. > > If the underlying device function differs from the one that would normally > receive the matched traffic, specifying this item prevents it from reaching > that device unless the flow rule contains a `VF (action)`_. Packets are not > duplicated between device instances by default. > > - Likely to return an error or never match any traffic if this causes a VF > device to match traffic addressed to a different VF. > - Can be specified multiple times to match traffic addressed to several VFs. > - Can be combined with a `PF`_ item to match both PF and VF traffic. > - Only ``spec`` needs to be defined, ``mask`` is ignored. > > +-------------------------------------------------+ > | VF | > +==========+=========+============================+ > | ``spec`` | ``any`` | ignore the specified VF ID | > | +---------+----------------------------+ > | | ``vf`` | destination VF ID | > +----------+---------+----------------------------+ > | ``mask`` | ignored | > +----------+--------------------------------------+ > > ``PORT`` > ^^^^^^^^ > > Matches packets coming from the specified physical port of the underlying > device. > > The first PORT item overrides the physical port normally associated with the > specified DPDK input port (port_id). This item can be provided several times > to match additional physical ports. > > Note that physical ports are not necessarily tied to DPDK input ports > (port_id) when those are not under DPDK control. Possible values are > specific to each device, they are not necessarily indexed from zero and may > not be contiguous. > > As a device property, the list of allowed values as well as the value > associated with a port_id should be retrieved by other means. 
> > - Only ``spec`` needs to be defined, ``mask`` is ignored. > > +--------------------------------------------+ > | PORT | > +==========+===========+=====================+ > | ``spec`` | ``index`` | physical port index | > +----------+-----------+---------------------+ > | ``mask`` | ignored | > +----------+---------------------------------+ > > .. raw:: pdf > > PageBreak > > Data matching item types > ~~~~~~~~~~~~~~~~~~~~~~~~ > > Most of these are basically protocol header definitions with associated > bit-masks. They must be specified (stacked) from lowest to highest protocol > layer. > > The following list is not exhaustive as new protocols will be added in the > future. > > ``ANY`` > ^^^^^^^ > > Matches any protocol in place of the current layer, a single ANY may also > stand for several protocol layers. > > This is usually specified as the first pattern item when looking for a > protocol anywhere in a packet. > > - A maximum value of **0** requests matching any number of protocol > layers > above or equal to the minimum value, a maximum value lower than the > minimum one is otherwise invalid. > - Only ``spec`` needs to be defined, ``mask`` is ignored. 
>
> +-----------------------------------------------------------------------+
> | ANY                                                                   |
> +==========+=========+==================================================+
> | ``spec`` | ``min`` | minimum number of layers covered                 |
> |          +---------+--------------------------------------------------+
> |          | ``max`` | maximum number of layers covered, 0 for infinity |
> +----------+---------+--------------------------------------------------+
> | ``mask`` | ignored                                                    |
> +----------+------------------------------------------------------------+
>
> Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
> and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
> or IPv6) matched by the second ANY specification:
>
> +----------------------------------+
> | TCP in VXLAN with wildcards      |
> +===+==============================+
> | 0 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 1 | ANY | ``spec`` | ``min`` | 2 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 2 |
> +---+-----+----------+---------+---+
> | 2 | VXLAN                        |
> +---+------------------------------+
> | 3 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 4 | ANY | ``spec`` | ``min`` | 1 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 1 |
> +---+-----+----------+---------+---+
> | 5 | TCP                          |
> +---+------------------------------+
>
> .. raw:: pdf
>
>    PageBreak
>
> ``RAW``
> ^^^^^^^
>
> Matches a byte string of a given length at a given offset.
>
> Offset is either absolute (using the start of the packet) or relative to the
> end of the previous matched item in the stack, in which case negative values
> are allowed.
>
> If search is enabled, offset is used as the starting point. The search area
> can be delimited by setting limit to a nonzero value, which is the maximum
> number of bytes after offset where the pattern may start.
>
> Matching a zero-length pattern is allowed, doing so resets the relative
> offset for subsequent items.
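To check my reading of the RAW offset/search/limit rules described above, here is a minimal software model of matching a single RAW item. This is only a sketch under assumptions: `struct raw_spec` and `raw_match()` are hypothetical names and not the layout proposed in rte_flow.h, the buffer is taken to start at the UDP payload, a `limit` of 0 is read as "no limit", and `mask` is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the RAW item spec (assumption). */
struct raw_spec {
    int relative;           /* 1: offset counts from the end of the previous item */
    int search;             /* 1: scan forward, using offset as starting point */
    int32_t offset;         /* absolute or relative, may be negative if relative */
    uint16_t limit;         /* max distance from offset where pattern may start,
                             * 0 is assumed to mean "no limit" */
    uint16_t length;        /* pattern length in bytes */
    const uint8_t *pattern; /* byte string to look for */
};

/*
 * Match one RAW item against buf[0..len). prev_end is the offset just past
 * the previously matched item. Returns the offset just past the match, or
 * -1 when the item does not match.
 */
static long
raw_match(const uint8_t *buf, size_t len, size_t prev_end,
          const struct raw_spec *s)
{
    long start;
    long last;

    assert(buf != NULL && s != NULL);
    start = (s->relative ? (long)prev_end : 0) + s->offset;
    if (start < 0 || (size_t)start + s->length > len)
        return -1;
    /* Without search, the pattern must sit exactly at start. */
    last = start;
    if (s->search) {
        last = (long)(len - s->length);
        if (s->limit != 0 && start + s->limit < last)
            last = start + s->limit;
    }
    for (; start <= last; ++start) {
        /* A zero-length pattern always matches and only moves the offset. */
        if (s->length == 0 ||
            memcmp(buf + start, s->pattern, s->length) == 0)
            return start + s->length;
    }
    return -1;
}
```

Feeding the value returned for one item as `prev_end` of the next item chains several RAW items, which is how I understand relative offsets (including negative ones) are meant to work.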
>
> - ``mask`` only affects the pattern field.
>
> +---------------------------------------------------------------------------+
> | RAW                                                                       |
> +==========+==============+=================================================+
> | ``spec`` | ``relative`` | look for pattern after the previous item        |
> |          +--------------+-------------------------------------------------+
> |          | ``search``   | search pattern from offset (see also ``limit``) |
> |          +--------------+-------------------------------------------------+
> |          | ``reserved`` | reserved, must be set to zero                   |
> |          +--------------+-------------------------------------------------+
> |          | ``offset``   | absolute or relative offset for ``pattern``     |
> |          +--------------+-------------------------------------------------+
> |          | ``limit``    | search area limit for start of ``pattern``      |
> |          +--------------+-------------------------------------------------+
> |          | ``length``   | ``pattern`` length                              |
> |          +--------------+-------------------------------------------------+
> |          | ``pattern``  | byte string to look for                         |
> +----------+--------------+-------------------------------------------------+
> | ``mask`` | ``relative`` | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``search``   | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``reserved`` | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``offset``   | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``limit``    | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``length``   | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``pattern``  | bit-mask of the same byte length as ``pattern`` |
> +----------+--------------+-------------------------------------------------+
>
> Example pattern looking for several strings at various offsets of a UDP
> payload, using combined RAW items:
>
> .. raw:: pdf
>
>    PageBreak
>
> +-------------------------------------------+
> | UDP payload matching                      |
> +===+=======================================+
> | 0 | Ethernet                              |
> +---+---------------------------------------+
> | 1 | IPv4                                  |
> +---+---------------------------------------+
> | 2 | UDP                                   |
> +---+-----+----------+--------------+-------+
> | 3 | RAW | ``spec`` | ``relative`` | 1     |
> |   |     |          +--------------+-------+
> |   |     |          | ``search``   | 1     |
> |   |     |          +--------------+-------+
> |   |     |          | ``offset``   | 10    |
> |   |     |          +--------------+-------+
> |   |     |          | ``limit``    | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``length``   | 3     |
> |   |     |          +--------------+-------+
> |   |     |          | ``pattern``  | "foo" |
> +---+-----+----------+--------------+-------+
> | 4 | RAW | ``spec`` | ``relative`` | 1     |
> |   |     |          +--------------+-------+
> |   |     |          | ``search``   | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``offset``   | 20    |
> |   |     |          +--------------+-------+
> |   |     |          | ``limit``    | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``length``   | 3     |
> |   |     |          +--------------+-------+
> |   |     |          | ``pattern``  | "bar" |
> +---+-----+----------+--------------+-------+
> | 5 | RAW | ``spec`` | ``relative`` | 1     |
> |   |     |          +--------------+-------+
> |   |     |          | ``search``   | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``offset``   | -29   |
> |   |     |          +--------------+-------+
> |   |     |          | ``limit``    | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``length``   | 3     |
> |   |     |          +--------------+-------+
> |   |     |          | ``pattern``  | "baz" |
> +---+-----+----------+--------------+-------+
>
> This translates to:
>
> - Locate "foo" at least 10 bytes deep inside UDP payload.
> - Locate "bar" after "foo" plus 20 bytes.
> - Locate "baz" after "bar" minus 29 bytes.
>
> Such a packet may be represented as follows (not to scale)::
>
>  0                     >= 10 B           == 20 B
>  |                  |<--------->|     |<--------->|
>  |                  |           |     |           |
>  |-----|------|-----|-----|-----|-----|-----------|-----|------|
>  | ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... |
>  |-----|------|-----|-----|-----|-----|-----------|-----|------|
>                           |                             |
>                           |<--------------------------->|
>                                       == 29 B
>
> Note that matching subsequent pattern items would resume after "baz", not
> "bar" since matching is always performed after the previous item of the
> stack.
>
> .. raw:: pdf
>
>    PageBreak
>
> ``ETH``
> ^^^^^^^
>
> Matches an Ethernet header.
>
> - ``dst``: destination MAC.
> - ``src``: source MAC.
> - ``type``: EtherType.
> - ``tags``: number of 802.1Q/ad tags defined.
> - ``tag[]``: 802.1Q/ad tag definitions, outermost first. For each one:
>
>   - ``tpid``: Tag protocol identifier.
>   - ``tci``: Tag control information.
>
> ``IPV4``
> ^^^^^^^^
>
> Matches an IPv4 header.
>
> Note: IPv4 options are handled by dedicated pattern items.
>
> - ``hdr``: IPv4 header definition (``rte_ip.h``).
>
> ``IPV6``
> ^^^^^^^^
>
> Matches an IPv6 header.
>
> Note: IPv6 options are handled by dedicated pattern items.
>
> - ``hdr``: IPv6 header definition (``rte_ip.h``).
>
> ``ICMP``
> ^^^^^^^^
>
> Matches an ICMP header.
>
> - ``hdr``: ICMP header definition (``rte_icmp.h``).
>
> ``UDP``
> ^^^^^^^
>
> Matches a UDP header.
>
> - ``hdr``: UDP header definition (``rte_udp.h``).
>
> ``TCP``
> ^^^^^^^
>
> Matches a TCP header.
>
> - ``hdr``: TCP header definition (``rte_tcp.h``).
>
> ``SCTP``
> ^^^^^^^^
>
> Matches an SCTP header.
>
> - ``hdr``: SCTP header definition (``rte_sctp.h``).
>
> ``VXLAN``
> ^^^^^^^^^
>
> Matches a VXLAN header (RFC 7348).
>
> - ``flags``: normally 0x08 (I flag).
> - ``rsvd0``: reserved, normally 0x000000.
> - ``vni``: VXLAN network identifier.
> - ``rsvd1``: reserved, normally 0x00.
>
> .. raw:: pdf
>
>    PageBreak
>
> Actions
> ~~~~~~~
>
> Each possible action is represented by a type. Some have associated
> configuration structures. Several actions combined in a list can be assigned
> to a flow rule. That list is not ordered.
>
> At least one action must be defined in a filter rule in order to do
> something with matched packets.
> > - Actions are defined with ``struct rte_flow_action``. > - A list of actions is defined with ``struct rte_flow_actions``. > > They fall in three categories: > > - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent > processing matched packets by subsequent flow rules, unless overridden > with PASSTHRU. > > - Non terminating actions (PASSTHRU, DUP) that leave matched packets up > for > additional processing by subsequent flow rules. > > - Other non terminating meta actions that do not affect the fate of packets > (END, VOID, MARK, FLAG, COUNT). > > When several actions are combined in a flow rule, they should all have > different types (e.g. dropping a packet twice is not possible). The defined > behavior is for PMDs to only take into account the last action of a given > type found in the list. PMDs still perform error checking on the entire > list. > > *Note that PASSTHRU is the only action having the ability to override a > terminating rule.* > > .. raw:: pdf > > PageBreak > > Example of an action that redirects packets to queue index 10: > > +----------------+ > | QUEUE | > +===========+====+ > | ``queue`` | 10 | > +-----------+----+ > > Action lists examples, their order is not significant, applications must > consider all actions to be performed simultaneously: > > +----------------+ > | Count and drop | > +=======+========+ > | COUNT | | > +-------+--------+ > | DROP | | > +-------+--------+ > > +--------------------------+ > | Tag, count and redirect | > +=======+===========+======+ > | MARK | ``mark`` | 0x2a | > +-------+-----------+------+ > | COUNT | | > +-------+-----------+------+ > | QUEUE | ``queue`` | 10 | > +-------+-----------+------+ > > +-----------------------+ > | Redirect to queue 5 | > +=======+===============+ > | DROP | | > +-------+-----------+---+ > | QUEUE | ``queue`` | 5 | > +-------+-----------+---+ > > In the above example, considering both actions are performed > simultaneously, > its end result is that only QUEUE 
has any effect. > > +-----------------------+ > | Redirect to queue 3 | > +=======+===========+===+ > | QUEUE | ``queue`` | 5 | > +-------+-----------+---+ > | VOID | | > +-------+-----------+---+ > | QUEUE | ``queue`` | 3 | > +-------+-----------+---+ > > As previously described, only the last action of a given type found in the > list is taken into account. The above example also shows that VOID is > ignored. > > .. raw:: pdf > > PageBreak > > Action types > ~~~~~~~~~~~~ > > Common action types are described in this section. Like pattern item types, > this list is not exhaustive as new actions will be added in the future. > > ``END`` (action) > ^^^^^^^^^^^^^^^^ > > End marker for action lists. Prevents further processing of actions, thereby > ending the list. > > - Its numeric value is **0** for convenience. > - PMD support is mandatory. > - No configurable property. > > +---------------+ > | END | > +===============+ > | no properties | > +---------------+ > > ``VOID`` (action) > ^^^^^^^^^^^^^^^^^ > > Used as a placeholder for convenience. It is ignored and simply discarded by > PMDs. > > - PMD support is mandatory. > - No configurable property. > > +---------------+ > | VOID | > +===============+ > | no properties | > +---------------+ > > ``PASSTHRU`` > ^^^^^^^^^^^^ > > Leaves packets up for additional processing by subsequent flow rules. This > is the default when a rule does not contain a terminating action, but can be > specified to force a rule to become non-terminating. > > - No configurable property. > > +---------------+ > | PASSTHRU | > +===============+ > | no properties | > +---------------+ > > Example to copy a packet to a queue and continue processing by subsequent > flow rules: > > +--------------------------+ > | Copy to queue 8 | > +==========+===============+ > | PASSTHRU | | > +----------+-----------+---+ > | QUEUE | ``queue`` | 8 | > +----------+-----------+---+ > > .. 
raw:: pdf > > PageBreak > > ``MARK`` > ^^^^^^^^ > > Attaches a 32 bit value to packets. > > This value is arbitrary and application-defined. For compatibility with FDIR > it is returned in the ``hash.fdir.hi`` mbuf field. ``PKT_RX_FDIR_ID`` is > also set in ``ol_flags``. > > +------------------------------------------------+ > | MARK | > +==========+=====================================+ > | ``mark`` | 32 bit value to return with packets | > +----------+-------------------------------------+ > > ``FLAG`` > ^^^^^^^^ > > Flag packets. Similar to `MARK`_ but only affects ``ol_flags``. > > Note: a distinctive flag must be defined for it. > > +---------------+ > | FLAG | > +===============+ > | no properties | > +---------------+ > > ``QUEUE`` > ^^^^^^^^^ > > Assigns packets to a given queue index. > > - Terminating by default. > > +--------------------------------+ > | QUEUE | > +===========+====================+ > | ``queue`` | queue index to use | > +-----------+--------------------+ > > ``DROP`` > ^^^^^^^^ > > Drop packets. > > - No configurable property. > - Terminating by default. > - PASSTHRU overrides this action if both are specified. > > +---------------+ > | DROP | > +===============+ > | no properties | > +---------------+ > > .. raw:: pdf > > PageBreak > > ``COUNT`` > ^^^^^^^^^ > > Enables counters for this rule. > > These counters can be retrieved and reset through ``rte_flow_query()``, see > ``struct rte_flow_query_count``. > > - Counters can be retrieved with ``rte_flow_query()``. > - No configurable property. 
>
> +---------------+
> | COUNT         |
> +===============+
> | no properties |
> +---------------+
>
> Query structure to retrieve and reset flow rule counters:
>
> +---------------------------------------------------------+
> | COUNT query                                             |
> +===============+=====+===================================+
> | ``reset``     | in  | reset counter after query         |
> +---------------+-----+-----------------------------------+
> | ``hits_set``  | out | ``hits`` field is set             |
> +---------------+-----+-----------------------------------+
> | ``bytes_set`` | out | ``bytes`` field is set            |
> +---------------+-----+-----------------------------------+
> | ``hits``      | out | number of hits for this rule      |
> +---------------+-----+-----------------------------------+
> | ``bytes``     | out | number of bytes through this rule |
> +---------------+-----+-----------------------------------+
>
> ``DUP``
> ^^^^^^^
>
> Duplicates packets to a given queue index.
>
> This is normally combined with QUEUE, however when used alone, it is
> actually similar to QUEUE + PASSTHRU.
>
> - Non-terminating by default.
>
> +------------------------------------------------+
> | DUP                                            |
> +===========+====================================+
> | ``queue`` | queue index to duplicate packet to |
> +-----------+------------------------------------+
>
> ``RSS``
> ^^^^^^^
>
> Similar to QUEUE, except RSS is additionally performed on packets to spread
> them among several queues according to the provided parameters.
>
> Note: RSS hash result is normally stored in the ``hash.rss`` mbuf field,
> however it conflicts with the `MARK`_ action as they share the same
> space. When both actions are specified, the RSS hash is discarded and
> ``PKT_RX_RSS_HASH`` is not set in ``ol_flags``. MARK has priority. The mbuf
> structure should eventually evolve to store both.
>
> - Terminating by default.
> > +---------------------------------------------+ > | RSS | > +==============+==============================+ > | ``rss_conf`` | RSS parameters | > +--------------+------------------------------+ > | ``queues`` | number of entries in queue[] | > +--------------+------------------------------+ > | ``queue[]`` | queue indices to use | > +--------------+------------------------------+ > > .. raw:: pdf > > PageBreak > > ``PF`` (action) > ^^^^^^^^^^^^^^^ > > Redirects packets to the physical function (PF) of the current device. > > - No configurable property. > - Terminating by default. > > +---------------+ > | PF | > +===============+ > | no properties | > +---------------+ > > ``VF`` (action) > ^^^^^^^^^^^^^^^ > > Redirects packets to a virtual function (VF) of the current device. > > Packets matched by a VF pattern item can be redirected to their original VF > ID instead of the specified one. This parameter may not be available and is > not guaranteed to work properly if the VF part is matched by a prior flow > rule or if packets are not addressed to a VF in the first place. > > - Terminating by default. > > +-----------------------------------------------+ > | VF | > +==============+================================+ > | ``original`` | use original VF ID if possible | > +--------------+--------------------------------+ > | ``vf`` | VF ID to redirect packets to | > +--------------+--------------------------------+ > > Negative types > ~~~~~~~~~~~~~~ > > All specified pattern items (``enum rte_flow_item_type``) and actions > (``enum rte_flow_action_type``) use positive identifiers. > > The negative space is reserved for dynamic types generated by PMDs during > run-time, PMDs may encounter them as a result but do not have to accept > the > negative types they did not generate. > > The method to generate them has not been specified yet. > > Planned types > ~~~~~~~~~~~~~ > > Pattern item types will be added as new protocols are implemented. 
>
> Variable headers will be supported through dedicated pattern items, for
> example in order to match specific IPv4 options and IPv6 extension headers;
> these would be stacked behind IPv4/IPv6 items.
>
> Other action types are planned but not defined yet. These actions will add
> the ability to alter matched packets in several ways, such as performing
> encapsulation/decapsulation of tunnel headers on specific flows.
>
> .. raw:: pdf
>
>    PageBreak
>
> Rules management
> ----------------
>
> A simple API with few functions is provided to fully manage flows.
>
> Each created flow rule is associated with an opaque, PMD-specific handle
> pointer. The application is responsible for keeping it until the rule is
> destroyed.
>
> Flow rules are represented by ``struct rte_flow`` objects.
>
> Validation
> ~~~~~~~~~~
>
> Given that expressing a definite set of device capabilities with this API is
> not practical, a dedicated function is provided to check if a flow rule is
> supported and can be created.
>
> ::
>
>  int
>  rte_flow_validate(uint8_t port_id,
>                    const struct rte_flow_attr *attr,
>                    const struct rte_flow_pattern *pattern,
>                    const struct rte_flow_actions *actions,
>                    struct rte_flow_error *error);
>
> While this function has no effect on the target device, the flow rule is
> validated against its current configuration state and the returned value
> should be considered valid by the caller for that state only.
>
> The returned value is guaranteed to remain valid only as long as no
> successful calls to rte_flow_create() or rte_flow_destroy() are made in the
> meantime and no device parameter affecting flow rules in any way is
> modified, due to possible collisions or resource limitations (although in
> such cases ``EINVAL`` should not be returned).
>
> Arguments:
>
> - ``port_id``: port identifier of Ethernet device.
> - ``attr``: flow rule attributes.
> - ``pattern``: pattern specification.
> - ``actions``: actions associated with the flow definition.
> - ``error``: perform verbose error reporting if not NULL. > > Return value: > > - **0** if flow rule is valid and can be created. A negative errno value > otherwise (``rte_errno`` is also set), the following errors are defined. > - ``-ENOSYS``: underlying device does not support this functionality. > - ``-EINVAL``: unknown or invalid rule specification. > - ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial > bit-masks are unsupported). > - ``-EEXIST``: collision with an existing rule. > - ``-ENOMEM``: not enough resources. > - ``-EBUSY``: action cannot be performed due to busy device resources, may > succeed if the affected queues or even the entire port are in a stopped > state (see ``rte_eth_dev_rx_queue_stop()`` and ``rte_eth_dev_stop()``). > > .. raw:: pdf > > PageBreak > > Creation > ~~~~~~~~ > > Creating a flow rule is similar to validating one, except the rule is > actually created and a handle returned. > > :: > > struct rte_flow * > rte_flow_create(uint8_t port_id, > const struct rte_flow_attr *attr, > const struct rte_flow_pattern *pattern, > const struct rte_flow_actions *actions, > struct rte_flow_error *error); > > Arguments: > > - ``port_id``: port identifier of Ethernet device. > - ``attr``: flow rule attributes. > - ``pattern``: pattern specification. > - ``actions``: actions associated with the flow definition. > - ``error``: perform verbose error reporting if not NULL. > > Return value: > > A valid handle in case of success, NULL otherwise and ``rte_errno`` is set > to the positive version of one of the error codes defined for > ``rte_flow_validate()``. > > Destruction > ~~~~~~~~~~~ > > Flow rules destruction is not automatic, and a queue or a port should not be > released if any are still attached to them. Applications must take care of > performing this step before releasing resources. 
> > :: > > int > rte_flow_destroy(uint8_t port_id, > struct rte_flow *flow, > struct rte_flow_error *error); > > > Failure to destroy a flow rule handle may occur when other flow rules > depend > on it, and destroying it would result in an inconsistent state. > > This function is only guaranteed to succeed if handles are destroyed in > reverse order of their creation. > > Arguments: > > - ``port_id``: port identifier of Ethernet device. > - ``flow``: flow rule handle to destroy. > - ``error``: perform verbose error reporting if not NULL. > > Return value: > > - **0** on success, a negative errno value otherwise and ``rte_errno`` is > set. > > .. raw:: pdf > > PageBreak > > Flush > ~~~~~ > > Convenience function to destroy all flow rule handles associated with a > port. They are released as with successive calls to ``rte_flow_destroy()``. > > :: > > int > rte_flow_flush(uint8_t port_id, > struct rte_flow_error *error); > > In the unlikely event of failure, handles are still considered destroyed and > no longer valid but the port must be assumed to be in an inconsistent state. > > Arguments: > > - ``port_id``: port identifier of Ethernet device. > - ``error``: perform verbose error reporting if not NULL. > > Return value: > > - **0** on success, a negative errno value otherwise and ``rte_errno`` is > set. > > Query > ~~~~~ > > Query an existing flow rule. > > This function allows retrieving flow-specific data such as counters. Data > is gathered by special actions which must be present in the flow rule > definition. > > :: > > int > rte_flow_query(uint8_t port_id, > struct rte_flow *flow, > enum rte_flow_action_type action, > void *data, > struct rte_flow_error *error); > > Arguments: > > - ``port_id``: port identifier of Ethernet device. > - ``flow``: flow rule handle to query. > - ``action``: action type to query. > - ``data``: pointer to storage for the associated query data type. > - ``error``: perform verbose error reporting if not NULL. 
>
> Return value:
>
> - **0** on success, a negative errno value otherwise and ``rte_errno`` is
>   set.
>
> .. raw:: pdf
>
>    PageBreak
>
> Verbose error reporting
> ~~~~~~~~~~~~~~~~~~~~~~~
>
> The defined *errno* values may not be accurate enough for users or
> application developers who want to investigate issues related to flow rules
> management. A dedicated error object is defined for this purpose::
>
>  enum rte_flow_error_type {
>      RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
>      RTE_FLOW_ERROR_TYPE_UNDEFINED, /**< Cause is undefined. */
>      RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
>      RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
>      RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
>      RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
>      RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
>      RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure itself. */
>      RTE_FLOW_ERROR_TYPE_PATTERN_MAX, /**< Pattern length (max field). */
>      RTE_FLOW_ERROR_TYPE_PATTERN_ITEM, /**< Specific pattern item. */
>      RTE_FLOW_ERROR_TYPE_PATTERN, /**< Pattern structure itself. */
>      RTE_FLOW_ERROR_TYPE_ACTION_MAX, /**< Number of actions (max field). */
>      RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
>      RTE_FLOW_ERROR_TYPE_ACTIONS, /**< Actions structure itself. */
>  };
>
>  struct rte_flow_error {
>      enum rte_flow_error_type type; /**< Cause field and error types. */
>      void *cause; /**< Object responsible for the error. */
>      const char *message; /**< Human-readable error message. */
>  };
>
> Error type ``RTE_FLOW_ERROR_TYPE_NONE`` stands for no error, in which case
> the remaining fields can be ignored. Other error types describe the object
> type pointed to by ``cause``.
>
> If non-NULL, ``cause`` points to the object responsible for the error. For a
> flow rule, this may be a pattern item or an individual action.
>
> If non-NULL, ``message`` provides a human-readable error message.
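For what it is worth, this is roughly how I would expect an application to turn the error object into a log line. It is a minimal sketch, not part of the proposal: the enum and struct are local copies of the quoted definitions so that the snippet builds without DPDK, and `flow_error_type_name()` and `flow_error_format()` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the definitions from the quoted specification. */
enum rte_flow_error_type {
    RTE_FLOW_ERROR_TYPE_NONE,
    RTE_FLOW_ERROR_TYPE_UNDEFINED,
    RTE_FLOW_ERROR_TYPE_HANDLE,
    RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
    RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
    RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
    RTE_FLOW_ERROR_TYPE_ATTR_EGRESS,
    RTE_FLOW_ERROR_TYPE_ATTR,
    RTE_FLOW_ERROR_TYPE_PATTERN_MAX,
    RTE_FLOW_ERROR_TYPE_PATTERN_ITEM,
    RTE_FLOW_ERROR_TYPE_PATTERN,
    RTE_FLOW_ERROR_TYPE_ACTION_MAX,
    RTE_FLOW_ERROR_TYPE_ACTION,
    RTE_FLOW_ERROR_TYPE_ACTIONS,
};

struct rte_flow_error {
    enum rte_flow_error_type type;
    void *cause;
    const char *message;
};

/* Hypothetical helper: short description of the object type at fault. */
static const char *
flow_error_type_name(enum rte_flow_error_type type)
{
    switch (type) {
    case RTE_FLOW_ERROR_TYPE_NONE: return "no error";
    case RTE_FLOW_ERROR_TYPE_UNDEFINED: return "undefined cause";
    case RTE_FLOW_ERROR_TYPE_HANDLE: return "flow rule handle";
    case RTE_FLOW_ERROR_TYPE_ATTR_GROUP: return "group attribute";
    case RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY: return "priority attribute";
    case RTE_FLOW_ERROR_TYPE_ATTR_INGRESS: return "ingress attribute";
    case RTE_FLOW_ERROR_TYPE_ATTR_EGRESS: return "egress attribute";
    case RTE_FLOW_ERROR_TYPE_ATTR: return "attributes structure";
    case RTE_FLOW_ERROR_TYPE_PATTERN_MAX: return "pattern length";
    case RTE_FLOW_ERROR_TYPE_PATTERN_ITEM: return "specific pattern item";
    case RTE_FLOW_ERROR_TYPE_PATTERN: return "pattern structure";
    case RTE_FLOW_ERROR_TYPE_ACTION_MAX: return "number of actions";
    case RTE_FLOW_ERROR_TYPE_ACTION: return "specific action";
    case RTE_FLOW_ERROR_TYPE_ACTIONS: return "actions structure";
    }
    return "unknown";
}

/*
 * Hypothetical helper: write "<object type>: <message>" into buf.
 * Both a NULL error object and a NULL message are tolerated, since the
 * text above says either may be absent.
 */
static int
flow_error_format(char *buf, size_t size, const struct rte_flow_error *err)
{
    assert(buf != NULL && size > 0);
    if (err == NULL || err->type == RTE_FLOW_ERROR_TYPE_NONE)
        return snprintf(buf, size, "no error");
    return snprintf(buf, size, "%s: %s",
                    flow_error_type_name(err->type),
                    err->message != NULL ? err->message : "(no message)");
}
```

An application would pass the same `struct rte_flow_error` it handed to a flow call and log the resulting string when that call fails.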
>
> This object is normally allocated by applications and set by PMDs. The
> message points to a constant string which does not need to be freed by the
> application, however its pointer can be considered valid only as long as its
> associated DPDK port remains configured. Closing the underlying device or
> unloading the PMD invalidates it.
>
> .. raw:: pdf
>
>    PageBreak
>
> PMD interface
> ~~~~~~~~~~~~~
>
> This specification focuses on the public-facing interface, which must be
> fully defined from the start to avoid a re-design later as it is subject to
> API and ABI versioning constraints.
>
> No such issue exists with the internal interface for use by poll-mode
> drivers which can evolve independently, hence this section only outlines how
> requests are processed by PMDs.
>
> Public functions are mapped more or less directly to PMD operation
> callbacks, thus:
>
> - Public API functions do not process flow rules definitions at all before
>   calling PMD callbacks (no basic error checking, no validation
>   whatsoever). They only make sure these callbacks are non-NULL or return
>   the ``ENOSYS`` (function not supported) error.
>
> - DPDK does not keep track of flow rules definitions or flow rule objects
>   automatically. Applications may keep track of the former and must keep
>   track of the latter. PMDs may also do it for internal needs, however this
>   cannot be relied on by applications.
>
> The private interface will provide helper functions to perform common tasks
> such as parsing, validating and keeping track of flow rule specifications to
> avoid redundant code in PMDs and ease implementation.
>
> Its contents are currently largely undefined since at least one PMD
> implementation is necessary first. PMD maintainers are encouraged to share
> as much generic code as possible.
>
> .. raw:: pdf
>
>    PageBreak
>
> Caveats
> -------
>
> - Flow rules are not maintained between successive port initializations. An
>   application exiting without releasing them and restarting must re-create
>   them from scratch.
>
> - API operations are synchronous and blocking (``EAGAIN`` cannot be
>   returned).
>
> - There is no provision for reentrancy/multi-thread safety, although nothing
>   should prevent different devices from being configured at the same
>   time. PMDs may protect their control path functions accordingly.
>
> - Stopping the data path (TX/RX) should not be necessary when managing flow
>   rules. If this cannot be achieved naturally or with workarounds (such as
>   temporarily replacing the burst function pointers), an appropriate error
>   code must be returned (``EBUSY``).
>
> - PMDs, not applications, are responsible for maintaining flow rules
>   configuration when stopping and restarting a port or performing other
>   actions which may affect them. They can only be destroyed explicitly.
>
> For devices exposing multiple ports sharing global settings affected by flow
> rules:
>
> - All ports under DPDK control must behave consistently, PMDs are
>   responsible for making sure that existing flow rules on a port are not
>   affected by other ports.
>
> - Ports not under DPDK control (unaffected or handled by other applications)
>   are the user's responsibility. They may affect existing flow rules and
>   cause undefined behavior. PMDs aware of this may prevent flow rules
>   creation altogether in such cases.
>
> .. raw:: pdf
>
>    PageBreak
>
> Compatibility
> -------------
>
> No known hardware implementation supports all the features described in this
> document.
>
> Unsupported features or combinations are not expected to be fully emulated
> in software by PMDs for performance reasons. Partially supported features
> may be completed in software as long as hardware performs most of the work
> (such as queue redirection and packet recognition).
> > However PMDs are expected to do their best to satisfy application requests > by working around hardware limitations as long as doing so does not affect > the behavior of existing flow rules. > > The following sections provide a few examples of such cases, they are based > on limitations built into the previous APIs. > > Global bit-masks > ~~~~~~~~~~~~~~~~ > > Each flow rule comes with its own, per-layer bit-masks, while hardware may > support only a single, device-wide bit-mask for a given layer type, so that > two IPv4 rules cannot use different bit-masks. > > The expected behavior in this case is that PMDs automatically configure > global bit-masks according to the needs of the first created flow rule. > > Subsequent rules are allowed only if their bit-masks match those, the > ``EEXIST`` error code should be returned otherwise. > > Unsupported layer types > ~~~~~~~~~~~~~~~~~~~~~~~ > > Many protocols can be simulated by crafting patterns with the `RAW`_ type. > > PMDs can rely on this capability to simulate support for protocols with > fixed headers not directly recognized by hardware. > > ``ANY`` pattern item > ~~~~~~~~~~~~~~~~~~~~ > > This pattern item stands for anything, which can be difficult to translate > to something hardware would understand, particularly if followed by more > specific types. 
>
> Consider the following pattern:
>
> +---+--------------------------------+
> | 0 | ETHER                          |
> +---+--------------------------------+
> | 1 | ANY (``min`` = 1, ``max`` = 1) |
> +---+--------------------------------+
> | 2 | TCP                            |
> +---+--------------------------------+
>
> Knowing that TCP does not make sense with something other than IPv4 and IPv6
> as L3, such a pattern may be translated to two flow rules instead:
>
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV4 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
>
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV6 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
>
> Note that as soon as an ANY rule covers several layers, this approach may
> yield a large number of hidden flow rules. It is thus suggested to only
> support the most common scenarios (anything as L2 and/or L3).
>
> .. raw:: pdf
>
>    PageBreak
>
> Unsupported actions
> ~~~~~~~~~~~~~~~~~~~
>
> - When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
>   tagging (`MARK`_ or `FLAG`_) may be implemented in software as long as the
>   target queue is used by a single rule.
>
> - A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
>   rules combining `QUEUE`_ and `PASSTHRU`_.
>
> - When a single target queue is provided, `RSS`_ can also be implemented
>   through `QUEUE`_.
>
> Flow rules priority
> ~~~~~~~~~~~~~~~~~~~
>
> While it would naturally make sense, flow rules cannot be assumed to be
> processed by hardware in the same order as their creation for several
> reasons:
>
> - They may be managed internally as a tree or a hash table instead of a
>   list.
> - Removing a flow rule before adding another one can either put the new rule
>   at the end of the list or reuse a freed entry.
> - Duplication may occur when packets are matched by several rules.
>
> For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> predictable behavior is only guaranteed by using different priority levels.
>
> Priority levels are not necessarily implemented in hardware, or may be
> severely limited (e.g. a single priority bit).
>
> For these reasons, priority levels may be implemented purely in software by
> PMDs.
>
> - For devices expecting flow rules to be added in the correct order, PMDs
>   may destroy and re-create existing rules after adding a new one with
>   a higher priority.
>
> - A configurable number of dummy or empty rules can be created at
>   initialization time to save high priority slots for later.
>
> - In order to save priority levels, PMDs may evaluate whether rules are
>   likely to collide and adjust their priority accordingly.
>
> .. raw:: pdf
>
>    PageBreak
>
> API migration
> =============
>
> Exhaustive list of deprecated filter types and how to convert them to
> generic flow rules.
>
> ``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
> ---------------------------------------
>
> `MACVLAN`_ can be translated to a basic `ETH`_ flow rule with a `VF
> (action)`_ or `PF (action)`_ terminating action.
>
> +------------------------------------+
> | MACVLAN                            |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | ETH | ``spec`` | any | VF,     |
> |   |     +----------+-----+ PF      |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
>
> ``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
> ----------------------------------------------
>
> `ETHERTYPE`_ is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as
> a terminating action.
>
> +------------------------------------+
> | ETHERTYPE                          |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | ETH | ``spec`` | any | QUEUE,  |
> |   |     +----------+-----+ DROP    |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
>
> ``FLEXIBLE`` to ``RAW`` → ``QUEUE``
> -----------------------------------
>
> `FLEXIBLE`_ can be translated to one `RAW`_ pattern with `QUEUE`_ as the
> terminating action and a defined priority level.
>
> +------------------------------------+
> | FLEXIBLE                           |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | RAW | ``spec`` | any | QUEUE   |
> |   |     +----------+-----+         |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
>
> ``SYN`` to ``TCP`` → ``QUEUE``
> ------------------------------
>
> `SYN`_ is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and
> `QUEUE`_ as the terminating action.
>
> Priority level can be set to simulate the high priority bit.
>
> +---------------------------------------------+
> | SYN                                         |
> +-----------------------------------+---------+
> | Pattern                           | Actions |
> +===+======+==========+=============+=========+
> | 0 | ETH  | ``spec`` | empty       | QUEUE   |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | empty       |         |
> +---+------+----------+-------------+         |
> | 1 | IPV4 | ``spec`` | empty       |         |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | empty       |         |
> +---+------+----------+-------------+         |
> | 2 | TCP  | ``spec`` | ``syn`` = 1 |         |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | ``syn`` = 1 |         |
> +---+------+----------+-------------+---------+
>
> ``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
> ----------------------------------------------------
>
> `NTUPLE`_ is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
> `UDP`_ as L4 and `QUEUE`_ as the terminating action.
>
> A priority level can be specified as well.
>
> +---------------------------------------+
> | NTUPLE                                |
> +-----------------------------+---------+
> | Pattern                     | Actions |
> +===+======+==========+=======+=========+
> | 0 | ETH  | ``spec`` | empty | QUEUE   |
> |   |      +----------+-------+         |
> |   |      | ``mask`` | empty |         |
> +---+------+----------+-------+         |
> | 1 | IPV4 | ``spec`` | any   |         |
> |   |      +----------+-------+         |
> |   |      | ``mask`` | any   |         |
> +---+------+----------+-------+         |
> | 2 | TCP, | ``spec`` | any   |         |
> |   | UDP  +----------+-------+         |
> |   |      | ``mask`` | any   |         |
> +---+------+----------+-------+---------+
>
> ``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
> ---------------------------------------------------------------------------
>
> `TUNNEL`_ matches common IPv4 and IPv6 L3/L4-based tunnel types.
>
> In the following table, `ANY`_ is used to cover the optional L4.
>
> +------------------------------------------------+
> | TUNNEL                                         |
> +--------------------------------------+---------+
> | Pattern                              | Actions |
> +===+=========+==========+=============+=========+
> | 0 | ETH     | ``spec`` | any         | QUEUE   |
> |   |         +----------+-------------+         |
> |   |         | ``mask`` | any         |         |
> +---+---------+----------+-------------+         |
> | 1 | IPV4,   | ``spec`` | any         |         |
> |   | IPV6    +----------+-------------+         |
> |   |         | ``mask`` | any         |         |
> +---+---------+----------+-------------+         |
> | 2 | ANY     | ``spec`` | ``min`` = 0 |         |
> |   |         |          +-------------+         |
> |   |         |          | ``max`` = 0 |         |
> |   |         +----------+-------------+         |
> |   |         | ``mask`` | N/A         |         |
> +---+---------+----------+-------------+         |
> | 3 | VXLAN,  | ``spec`` | any         |         |
> |   | GENEVE, +----------+-------------+         |
> |   | TEREDO, | ``mask`` | any         |         |
> |   | NVGRE,  |          |             |         |
> |   | GRE,    |          |             |         |
> |   | ...     |          |             |         |
> +---+---------+----------+-------------+---------+
>
> .. raw:: pdf
>
>    PageBreak
>
> ``FDIR`` to most item types →
``QUEUE``, ``DROP``, ``PASSTHRU``
> ---------------------------------------------------------------
>
> `FDIR`_ is more complex than any other type; there are several methods to
> emulate its functionality. It is summarized for the most part in the table
> below.
>
> A few features are intentionally not supported:
>
> - The ability to configure the matching input set and masks for the entire
>   device; PMDs should take care of it automatically according to the
>   requested flow rules.
>
>   For example if a device supports only one bit-mask per protocol type,
>   source/address IPv4 bit-masks can be made immutable by the first created
>   rule. Subsequent IPv4 or TCPv4 rules can only be created if they are
>   compatible.
>
>   Note that only protocol bit-masks affected by existing flow rules are
>   immutable; others can be changed later. They become mutable again after
>   the related flow rules are destroyed.
>
> - Returning four or eight bytes of matched data when using flex bytes
>   filtering. Although a specific action could implement it, it conflicts
>   with the much more useful 32 bits tagging on devices that support it.
>
> - Side effects on RSS processing of the entire device. Flow rules that
>   conflict with the current device configuration should not be
>   allowed. Similarly, device configuration should not be allowed when it
>   affects existing flow rules.
>
> - Device modes of operation. "none" is unsupported since filtering cannot be
>   disabled as long as a flow rule is present.
>
> - "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
>   according to the created flow rules.
>
> - Signature mode of operation is not defined but could be handled through a
>   specific item type if needed.
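
The immutable bit-mask behavior quoted above (the first created rule fixes the device-wide mask, later rules must be compatible, and the mask becomes mutable again once the related rules are destroyed) could be tracked in a PMD with a simple reference count. The sketch below is only an illustration under stated assumptions: ``struct global_mask``, ``global_mask_acquire()`` and ``global_mask_release()`` are hypothetical names, a single 32-bit mask stands in for one protocol field, and returning ``-EEXIST`` as a negative errno value is my assumption about how the ``EEXIST`` error mentioned in the "Global bit-masks" section would be reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-device state for one protocol field (e.g. the IPv4
 * source address) on hardware that supports a single device-wide mask.
 */
struct global_mask {
    uint32_t mask;       /* mask currently programmed in the device */
    unsigned int refcnt; /* number of flow rules relying on it */
};

/*
 * Called when a flow rule using rule_mask is created.
 * Returns 0 on success, or -EEXIST (assumed convention) when existing
 * rules already locked the device-wide mask to a different value.
 */
static int global_mask_acquire(struct global_mask *gm, uint32_t rule_mask)
{
    assert(gm != NULL);
    if (gm->refcnt == 0)
        gm->mask = rule_mask; /* first rule decides the global mask */
    else if (gm->mask != rule_mask)
        return -EEXIST;       /* incompatible with existing rules */
    gm->refcnt++;
    return 0;
}

/* Called when a flow rule that acquired the mask is destroyed. */
static void global_mask_release(struct global_mask *gm)
{
    assert(gm != NULL && gm->refcnt > 0);
    gm->refcnt--; /* at zero, the mask is mutable again */
}
```

Once every rule holding the mask has been released, the next rule is free to program a different mask, which is the "mutable again" case described in the quoted text.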
>
> +----------------------------------------------+
> | FDIR                                         |
> +---------------------------------+------------+
> | Pattern                         | Actions    |
> +===+============+==========+=====+============+
> | 0 | ETH,       | ``spec`` | any | QUEUE,     |
> |   | RAW        +----------+-----+ DROP,      |
> |   |            | ``mask`` | any | PASSTHRU   |
> +---+------------+----------+-----+------------+
> | 1 | IPV4,      | ``spec`` | any | MARK       |
> |   | IPV6       +----------+-----+ (optional) |
> |   |            | ``mask`` | any |            |
> +---+------------+----------+-----+            |
> | 2 | TCP,       | ``spec`` | any |            |
> |   | UDP,       +----------+-----+            |
> |   | SCTP       | ``mask`` | any |            |
> +---+------------+----------+-----+            |
> | 3 | VF,        | ``spec`` | any |            |
> |   | PF         +----------+-----+            |
> |   | (optional) | ``mask`` | any |            |
> +---+------------+----------+-----+------------+
>
> .. raw:: pdf
>
>    PageBreak
>
> ``HASH``
> ~~~~~~~~
>
> There is no counterpart to this filter type because it translates to a
> global device setting instead of a pattern item. Device settings are
> automatically set according to the created flow rules.
>
> ``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> All packets are matched. This type alters incoming packets to encapsulate
> them in a chosen tunnel type, and optionally redirects them to a VF as well.
>
> The destination pool for tag based forwarding can be emulated with other
> flow rules using `DUP`_ as the action.
>
> +----------------------------------------+
> | L2_TUNNEL                              |
> +---------------------------+------------+
> | Pattern                   | Actions    |
> +===+======+==========+=====+============+
> | 0 | VOID | ``spec`` | N/A | VXLAN,     |
> |   |      |          |     | GENEVE,    |
> |   |      |          |     | ...        |
> |   |      +----------+-----+------------+
> |   |      | ``mask`` | N/A | VF         |
> |   |      |          |     | (optional) |
> +---+------+----------+-----+------------+
>
> .. raw:: pdf
>
>    PageBreak
>
> Future evolutions
> =================
>
> - Describing dedicated testpmd commands to control and validate this API.
>
> - A method to optimize generic flow rules with specific pattern items and
>   action types generated on the fly by PMDs. DPDK will assign negative
>   numbers to these in order to not collide with the existing types. See
>   `Negative types`_.
>
> - Adding specific egress pattern items and actions as described in `Traffic
>   direction`_.
>
> - Optional software fallback when PMDs are unable to handle requested flow
>   rules so applications do not have to implement their own.
>
> - Ranges in addition to bit-masks. Ranges are more generic in many ways as
>   they interpret values. For instance only ranges make sense to cover
>   several TCP or UDP ports. These will probably be defined on a pattern item
>   basis.
>
> --------
>
> Adrien Mazarguil (1):
>   ethdev: introduce generic flow API
>
>  lib/librte_ether/Makefile   |   2 +
>  lib/librte_ether/rte_flow.h | 941 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 943 insertions(+)
>  create mode 100644 lib/librte_ether/rte_flow.h
>
> --
> 2.1.4