Hi Andrey, On 07/06/2016 01:59 PM, Chilikin, Andrey wrote: > Hi Oliver, > >> -----Original Message----- >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ >> Sent: Wednesday, July 6, 2016 8:43 AM >> To: Liang, Cunming <cunming.liang at intel.com>; dev at dpdk.org >> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type >> from data >> >> Hi Cunming, >> >> On 07/06/2016 08:44 AM, Liang, Cunming wrote: >>> Hi Olivier, >>> >>> On 7/5/2016 11:41 PM, Olivier Matz wrote: >>>> Introduce the function rte_pktmbuf_get_ptype() that parses a mbuf and >>>> returns its packet type. For now, the following packet types are >>>> parsed: >>>> L2: Ether >>>> L3: IPv4, IPv6 >>>> L4: TCP, UDP, SCTP >>>> >>>> The goal here is to provide a reference implementation for packet >>>> type parsing. This function will be used by testpmd in next commits, >>>> allowing to compare its result with the value given by the hardware. >>>> >>>> This function will also be useful when implementing Rx offload >>>> support in virtio pmd. Indeed, the virtio protocol gives the csum >>>> start and offset, but it does not give the L4 protocol nor it tells >>>> if the checksum is relevant for inner or outer. This information has >>>> to be known to properly set the ol_flags in mbuf. >>>> >>>> Signed-off-by: Didier Pallard <didier.pallard at 6wind.com> >>>> Signed-off-by: Jean Dao <jean.dao at 6wind.com> >>>> Signed-off-by: Olivier Matz <olivier.matz at 6wind.com> >>>> --- >>>> doc/guides/rel_notes/release_16_11.rst | 5 + >>>> lib/librte_mbuf/Makefile | 5 +- >>>> lib/librte_mbuf/rte_mbuf_ptype.c | 234 >>>> +++++++++++++++++++++++++++++++++ >>>> lib/librte_mbuf/rte_mbuf_ptype.h | 43 ++++++ >>>> lib/librte_mbuf/rte_mbuf_version.map | 1 + >>>> 5 files changed, 286 insertions(+), 2 deletions(-) >>>> create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c >>>> >>>> [...] >>>> + >>>> +/* parse mbuf data to get packet type */ uint32_t >>>> +rte_pktmbuf_get_ptype(const struct rte_mbuf *m, >>>> + struct rte_mbuf_hdr_lens *hdr_lens) { >>>> + struct rte_mbuf_hdr_lens local_hdr_lens; >>>> + const struct ether_hdr *eh; >>>> + struct ether_hdr eh_copy; >>>> + uint32_t pkt_type = RTE_PTYPE_L2_ETHER; >>>> + uint32_t off = 0; >>>> + uint16_t proto; >>>> + >>>> + if (hdr_lens == NULL) >>>> + hdr_lens = &local_hdr_lens; >>>> + >>>> + eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy); >>>> + if (unlikely(eh == NULL)) >>>> + return 0; >>>> + proto = eh->ether_type; >>>> + off = sizeof(*eh); >>>> + hdr_lens->l2_len = off; >>>> + >>>> + if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) { >>>> + const struct ipv4_hdr *ip4h; >>>> + struct ipv4_hdr ip4h_copy; >>>> + >>>> + ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy); >>>> + if (unlikely(ip4h == NULL)) >>>> + return pkt_type; >>>> + >>>> + pkt_type |= ptype_l3_ip(ip4h->version_ihl); >>>> + hdr_lens->l3_len = ip4_hlen(ip4h); >>>> + off += hdr_lens->l3_len; >>>> + if (ip4h->fragment_offset & >>>> + rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK | >>>> + IPV4_HDR_MF_FLAG)) { >>>> + pkt_type |= RTE_PTYPE_L4_FRAG; >>>> + hdr_lens->l4_len = 0; >>>> + return pkt_type; >>>> + } >>>> + proto = ip4h->next_proto_id; >>>> + pkt_type |= ptype_l4(proto); >>>> + } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) { >>>> + const struct ipv6_hdr *ip6h; >>>> + struct ipv6_hdr ip6h_copy; >>>> + int frag = 0; >>>> + >>>> + ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy); >>>> + if (unlikely(ip6h == NULL)) >>>> + return pkt_type; >>>> + >>>> + proto = ip6h->proto; >>>> + hdr_lens->l3_len = sizeof(*ip6h); >>>> + off += hdr_lens->l3_len; >>>> + pkt_type |= ptype_l3_ip6(proto); >>>> + if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT) { >>>> + proto = skip_ip6_ext(proto, m, &off, &frag); >>>> + hdr_lens->l3_len = off - hdr_lens->l2_len; >>>> + } >>>> + if (proto == 0) >>>> + return pkt_type; >>>> + if (frag) { >>>> + pkt_type |= RTE_PTYPE_L4_FRAG; >>>> + hdr_lens->l4_len = 0; >>>> + return pkt_type; >>>> + } >>>> + pkt_type |= ptype_l4(proto); >>>> + } >>>> + >>>> + if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) { >>>> + hdr_lens->l4_len = sizeof(struct udp_hdr); >>>> + } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) { >>>> + const struct tcp_hdr *th; >>>> + struct tcp_hdr th_copy; >>>> + >>>> + th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy); >>>> + if (unlikely(th == NULL)) >>>> + return pkt_type & (RTE_PTYPE_L2_MASK | >>>> + RTE_PTYPE_L3_MASK); >>>> + hdr_lens->l4_len = (th->data_off & 0xf0) >> 2; >>>> + } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) { >>>> + hdr_lens->l4_len = sizeof(struct sctp_hdr); >>>> + } else { >>>> + hdr_lens->l4_len = 0; >>>> + } >>>> + >>>> + return pkt_type; >>>> +} >>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h >>>> b/lib/librte_mbuf/rte_mbuf_ptype.h >>>> index 4a34678..f468520 100644 >>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h >>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h >>>> @@ -545,6 +545,49 @@ extern "C" { >>>> RTE_PTYPE_INNER_L3_MASK | \ >>>> RTE_PTYPE_INNER_L4_MASK)) >>>> +struct rte_mbuf; >>>> + >>>> +/** >>>> + * Structure containing header lengths associated to a packet. >>>> + */ >>>> +struct rte_mbuf_hdr_lens { >>>> + uint8_t l2_len; >>>> + uint8_t l3_len; >>>> + uint8_t l4_len; >>>> + uint8_t tunnel_len; >>>> + uint8_t inner_l2_len; >>>> + uint8_t inner_l3_len; >>>> + uint8_t inner_l4_len; >>>> +}; >>> [LC] The header parsing graph usually is not unique. The definition >>> maybe nice for the basic IP and L4 tunnel. >>> However it can't scale out to other cases, e.g. qinq, mac-in-mac, mpls >>> l2/l3 tunnel. >>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of >>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser scheme. >>> In this case, the fixed function is to support below. >>> >>> + * Supported packet types are: >>> + * L2: Ether >>> + * L3: IPv4, IPv6 >>> + * L4: TCP, UDP, SCTP >>> >>> Of course, it can add more packet type detection logic in future. But >>> the more support, the higher the cost. >>> >>> One of the alternative way is to allow registering parser pair. APP >>> decides to choose the predefined scheme(by DPDK LIB), or to >>> self-define the parsing logic. >>> In this way, the scheme can do some assumption for the specific case >>> and ignore some useless graph detection. >>> In addition, besides the SW parser, the HW parser(identified by >>> packet_type in mbuf) can be turn on/off by leveraging the same manner. >> >> Sorry, I'm not sure I'm fully getting what you are saying. If I understand >> well, >> you would like to have something more flexible that supports the >> registration of >> protocol to be recognized? >> >> I'm not sure having a function with a dynamic registration method would >> really >> increase the performance compared to a static complete function. >> Actually, we will never support a tons of protocols since each layer packet >> type >> is 4 bits, and since it requires that at least one hw supports it. > > This patch will be very useful as a reference implementation, but it also > highlights an issue with the current implementation of packet types reporting > by HW and SW - as you just mentioned there are only 4 bits per each layer. As > these 4 bit are used as a enumeration it is impossible to reports multiple > headers located on the same layer. MPLS is one example, different packets > could have different numbers of MPLS labels, but it is impossible to report > using current packet_type structure. > > It is possible, however, to program HW to report user (application) specific > packet types. For example, for IPoMPLS with one MPLS label, HW will report > packet type A, but for IPoMPLS with two MPLS labels HW will reports packet > type B. In this case, instead of defining and supporting tons of statically > defined (or enumerated) protocol headers combinations, application will > register packet types it expects from HW in addition to standard packet > types. At the moment we have high bits of packet_type reserved, so one > possible solution would be to use the highest bit to indicate that this is > user defined packet_type, specific to the application. Then it could be used > with HW and with SW parser. For example, packet_type 0x8000000A is IPoMPLS > with one MPLS label, 0x8000000B is IPoMPLS with two MPLS labels and so on.
Thank you for the explanation. From your description, I wonder if the flow director API recently [1] proposed by Adrien wouldn't solve this issue? [1] http://dpdk.org/ml/archives/dev/2016-July/043365.html Regards, Olivier