Hi Slava,
> -----Original Message-----
> From: dev <dev-boun...@dpdk.org> On Behalf Of Viacheslav Ovsiienko
> Sent: Monday, October 11, 2021 9:15 PM
> Subject: [dpdk-dev] [PATCH v3 1/5] ethdev: introduce configurable flexible item
>
> 1. Introduction and Retrospective
>
> Networks are evolving fast and wide: network structures are getting more
> and more complicated, and new application areas are emerging. To address
> these challenges, new network protocols are continuously being developed,
> considered by technical communities, adopted by industry and, eventually,
> implemented in hardware and software. The DPDK framework follows the
> common trends, and a glance at the RTE Flow API header shows that multiple
> new items have been introduced since the initial release.
>
> The new protocol adoption and implementation process is not
> straightforward and takes time: a new protocol passes through development,
> consideration, adoption, and implementation phases. The industry tries to
> anticipate and address forthcoming network protocols; for example, many
> hardware vendors are implementing flexible and configurable network
> protocol parsers. As DPDK developers, could we anticipate the near future
> in the same fashion and introduce similar flexibility in the RTE Flow API?
>
> Let's check what we already have merged in our project, and we see the
> nice raw item (rte_flow_item_raw). At first glance it looks superior, and
> we can try to implement flow matching on the header of some relatively
> new tunnel protocol, say on the GENEVE header with variable length
> options.
> And, under further consideration, we run into the raw item limitations:
>
>   - only fixed size network header can be represented
>   - the entire network header pattern of fixed format
>     (header field offsets are fixed) must be provided
>   - the search for patterns is not robust (the wrong matches
>     might be triggered), and actually is not supported
>     by existing PMDs
>   - no explicitly specified relations with preceding
>     and following items
>   - no tunnel hint support
>
> As a result, implementing support for tunnel protocols like the
> aforementioned GENEVE, with its variable extra protocol options, using
> the flow raw item becomes very complicated and would require multiple
> flows and multiple raw items chained in the same flow (by the way, no
> support for chained raw items was found in the implemented drivers).
>
> This RFC introduces the dedicated flex item (rte_flow_item_flex) to
> handle matches with existing and new network protocol headers in a
> unified fashion.
>
> 2. Flex Item Life Cycle
>
> Let's assume there is a requirement to support a new network protocol
> with RTE Flows. What is given within the protocol specification:
>
>   - header format
>   - header length (can be variable, depending on options)
>   - potential presence of extra options following or included
>     in the header
>   - the relations with preceding protocols. For example,
>     GENEVE follows UDP, eCPRI can follow either UDP
>     or L2 header
>   - the relations with following protocols.
>     For example, the next layer after the tunnel header
>     can be L2 or L3
>   - whether the new protocol is a tunnel and the header
>     is a splitting point between outer and inner layers
>
> The supposed way to operate with the flex item:
>
>   - application defines the header structures according to
>     the protocol specification
>
>   - application calls rte_flow_flex_item_create() with the desired
>     configuration according to the protocol specification; it
>     creates the flex item object over the specified ethernet device
>     and prepares the PMD and underlying hardware to handle the flex
>     item. On the item creation call, the PMD backing the specified
>     ethernet device returns the opaque handle identifying
>     the object that has been created
>
>   - application uses the rte_flow_item_flex with the obtained handle
>     in the flows; the values/masks to match with fields in the
>     header are specified in the flex item per flow as for regular
>     items (except that the pattern buffer combines all fields)
>
>   - flows with flex items match with packets in a regular fashion;
>     the values and masks for the new protocol header match are
>     taken from the flex items in the flows
>
>   - application destroys flows with flex items
>
>   - application calls rte_flow_flex_item_release() as part of
>     the ethernet device API, which destroys the flex item object in
>     the PMD and releases the engaged hardware resources
>
> 3. Flex Item Structure
>
> The flex item structure is intended to be used as part of the flow
> pattern like regular RTE flow items and provides the mask and value to
> match with fields of the protocol the item was configured for.
>
>   struct rte_flow_item_flex {
>       void *handle;
>       uint32_t length;
>       const uint8_t *pattern;
>   };
>
> The handle is an opaque object maintained on a per-device basis by the
> underlying driver.
>
> The protocol header fields are considered as bit fields; all offsets and
> widths are expressed in bits.
> The pattern is the buffer containing the bit concatenation of all the
> fields presented at item configuration time, in the same order and same
> amount. If byte boundary alignment is needed, an application can use a
> dummy type field, which is just a gap filler.
>
> The length field specifies the pattern buffer length in bytes and is
> needed to allow rte_flow_copy() operations. The approach of multiple
> pattern pointers and lengths (per field) was considered and found clumsy;
> it seems much more suitable for the application to maintain a single
> structure within a single pattern buffer.
>
> 4. Flex Item Configuration
>
> The flex item configuration consists of the following parts:
>
>   - header field descriptors:
>     - next header
>     - next protocol
>     - sample to match
>   - input link descriptors
>   - output link descriptors
>
> The field descriptors tell the driver and hardware what data should be
> extracted from the packet and then control the packet handling in the
> flow engine. Besides this, sample fields can be presented to match with
> patterns in the flows. Each field is a bit pattern. It has a width, an
> offset from the header beginning, a mode of offset calculation, and
> offset related parameters.
>
> The next header field is special: no data are actually taken from the
> packet, but its offset is used as a pointer to the next header in the
> packet. In other words, the next header offset specifies the size of the
> header being parsed by the flex item.
>
> There is one more special field, next protocol. It specifies where the
> next protocol identifier is contained, and packet data sampled from this
> field will be used to determine the next protocol header type to continue
> packet parsing. The next protocol field is like the eth_type field in
> MAC2, or the proto field in IPv4/v6 headers.
>
> The sample fields are used to represent the data to be sampled from the
> packet and then matched with established flows.
>
> There are several methods supposed to calculate the field offset at
> runtime, depending on configuration and packet content:
>
>   - FIELD_MODE_FIXED - fixed offset. The bit offset from the
>     header beginning is permanent and defined by the field_base
>     configuration parameter.
>
>   - FIELD_MODE_OFFSET - the field bit offset is extracted
>     from another header field (indirect offset field). The
>     resulting field offset to match is calculated as:
>
>       field_base + (*offset_base & offset_mask) << offset_shift
>
>     This mode is useful to sample some extra options following
>     the main header with a field containing the main header length.
>     Also, this mode can be used to calculate the offset to the
>     next protocol header; for example, the IPv4 header contains
>     a 4-bit field with the IPv4 header length expressed in dwords.
>     One more example: this mode would allow us to skip GENEVE
>     header variable length options.
>
>   - FIELD_MODE_BITMASK - the field bit offset is extracted
>     from another header field (indirect offset field); the latter
>     is considered as a bitmask containing some number of one bits,
>     and the resulting field offset to match is calculated as:
>
>       field_base + bitcount(*offset_base & offset_mask) << offset_shift
>
>     This mode would be useful to skip the GTP header and its
>     extra options with specified flags.
>
>   - FIELD_MODE_DUMMY - dummy field, optionally used for byte
>     boundary alignment in the pattern. Pattern mask and data are
>     ignored in the match. All configuration parameters besides
>     field size and offset are ignored.
>
>   Note: "*" - means the indirect field offset is calculated
>   and actual data are extracted from the packet by this
>   offset (like data are fetched by pointer *p from memory).
>
> The offset mode list can be extended by vendors according to hardware
> supported options.
>
> The input link configuration section tells the driver after what
> protocols and at what conditions the flex item can follow.
> The input link specifies the preceding header pattern; for example, for
> GENEVE it can be a UDP item specifying a match on destination port with
> value 6081. The flex item can follow multiple header types, and multiple
> input links should be specified. At flow creation time, an item with one
> of the input link types should precede the flex item, and the driver will
> select the correct flex item settings depending on the actual flow
> pattern.
>
> The output link configuration section tells the driver how to continue
> packet parsing after the flex item protocol. If multiple protocols can
> follow the flex item header, the flex item should contain the field with
> the next protocol identifier, and parsing will continue depending on the
> data contained in this field in the actual packet.
>
> The flex item fields can participate in RSS hash calculation; a dedicated
> flag is present in the field description to specify what fields should be
> provided for hashing.
>
> 5. Flex Item Chaining
>
> If multiple protocols are supposed to be supported with flex items in a
> chained fashion (two or more flex items within the same flow, which might
> be neighbors in the pattern), it means the flex items are mutually
> referencing. In this case, the item that occurs first should be created
> with an empty output link list or with a list including existing items,
> and then the second flex item should be created referencing the first
> flex item as an input arc; drivers should adjust the item configuration.
>
> Also, the hardware resources used by flex items to handle the packet can
> be limited. If there are multiple flex items that are supposed to be used
> within the same flow, it would be nice to provide some hint for the driver
> that these two or more flex items are intended for simultaneous usage.
> The fields of items should be assigned hint indices, and these indices
> from two or more flex items supposed to be provided within the same flow
> should be the same as well. In other words, the field hint index
> specifies the group of fields that can be matched simultaneously within a
> single flow. If hint indices are specified, the driver will try to engage
> non-overlapping hardware resources and provide independent handling of
> the field groups with unique indices. If the hint index is zero, the
> driver assigns resources on its own.
>
> 6. Example of New Protocol Handling
>
> Let's suppose we have a requirement to handle a new tunnel protocol that
> follows the UDP header with destination port 0xFADE and is followed by a
> MAC header. Let the new protocol header format be like this:
>
>   struct new_protocol_header {
>       rte_be32_t header_length; /* length in dwords, including options */
>       rte_be32_t specific0;     /* some protocol data, no intention */
>       rte_be32_t specific1;     /* to match in flows on these fields */
>       rte_be32_t crucial;       /* data of interest, match is needed */
>       rte_be32_t options[0];    /* optional protocol data, variable length */
>   };
>
> The supposed flex item configuration:
>
>   struct rte_flow_item_flex_field field0 = {
>       .field_mode = FIELD_MODE_DUMMY, /* Affects match pattern only */
>       .field_size = 96,               /* three dwords from the beginning */
>   };
>   struct rte_flow_item_flex_field field1 = {
>       .field_mode = FIELD_MODE_FIXED,
>       .field_size = 32, /* Field size is one dword */
>       .field_base = 96, /* Skip three dwords from the beginning */
>   };
>   struct rte_flow_item_udp spec0 = {
>       .hdr = {
>           .dst_port = RTE_BE16(0xFADE),
>       }
>   };
>   struct rte_flow_item_udp mask0 = {
>       .hdr = {
>           .dst_port = RTE_BE16(0xFFFF),
>       }
>   };
>   struct rte_flow_item_flex_link link0 = {
>       .item = {
>           .type = RTE_FLOW_ITEM_TYPE_UDP,
>           .spec = &spec0,
>           .mask = &mask0,
>       },
>   };
>
>   struct rte_flow_item_flex_conf conf = {
>       .next_header = {
>           .tunnel = FLEX_TUNNEL_MODE_SINGLE,
>           .field_mode = FIELD_MODE_OFFSET,
>           .field_base = 0,
>           .offset_base = 0,
>           .offset_mask = 0xFFFFFFFF,
>           .offset_shift = 2 /* Expressed in dwords, shift left by 2 */
>       },
>       .sample = {
>           &field0,
>           &field1,
>       },
>       .nb_samples = 2,
>       .input_link[0] = &link0,
>       .nb_inputs = 1
>   };
>
> Let's suppose we have created the flex item successfully, and the PMD
> returned the handle 0x123456789A. We can use the following item pattern
> to match the crucial field in the packet with value 0x00112233:
>
>   struct new_protocol_header spec_pattern = {
>       .crucial = RTE_BE32(0x00112233),
>   };
>   struct new_protocol_header mask_pattern = {
>       .crucial = RTE_BE32(0xFFFFFFFF),
>   };
>   struct rte_flow_item_flex spec_flex = {
>       .handle = (void *)0x123456789A, /* handle returned by the PMD */
>       .length = sizeof(struct new_protocol_header),
>       .pattern = (const uint8_t *)&spec_pattern,
>   };
>   struct rte_flow_item_flex mask_flex = {
>       .length = sizeof(struct new_protocol_header),
>       .pattern = (const uint8_t *)&mask_pattern,
>   };
>   struct rte_flow_item item_to_match = {
>       .type = RTE_FLOW_ITEM_TYPE_FLEX,
>       .spec = &spec_flex,
>       .mask = &mask_flex,
>   };
>
> Signed-off-by: Viacheslav Ovsiienko <viachesl...@nvidia.com>
> ---

Acked-by: Ori Kam <or...@nvidia.com>

Thanks,
Ori