Hi Oliver, > -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Wednesday, July 6, 2016 1:09 PM > To: Chilikin, Andrey <andrey.chilikin at intel.com>; Liang, Cunming > <cunming.liang at intel.com>; dev at dpdk.org > Cc: Ananyev, Konstantin <konstantin.ananyev at intel.com> > Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type > from data > > Hi Andrey, > > On 07/06/2016 01:59 PM, Chilikin, Andrey wrote: > > Hi Oliver, > > > >> -----Original Message----- > >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ > >> Sent: Wednesday, July 6, 2016 8:43 AM > >> To: Liang, Cunming <cunming.liang at intel.com>; dev at dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get > >> packet type from data > >> > >> Hi Cunming, > >> > >> On 07/06/2016 08:44 AM, Liang, Cunming wrote: > >>> Hi Olivier, > >>> > >>> On 7/5/2016 11:41 PM, Olivier Matz wrote: > >>>> Introduce the function rte_pktmbuf_get_ptype() that parses a mbuf > >>>> and returns its packet type. For now, the following packet types > >>>> are > >>>> parsed: > >>>> L2: Ether > >>>> L3: IPv4, IPv6 > >>>> L4: TCP, UDP, SCTP > >>>> > >>>> The goal here is to provide a reference implementation for packet > >>>> type parsing. This function will be used by testpmd in next > >>>> commits, allowing to compare its result with the value given by the > hardware. > >>>> > >>>> This function will also be useful when implementing Rx offload > >>>> support in virtio pmd. Indeed, the virtio protocol gives the csum > >>>> start and offset, but it does not give the L4 protocol nor it tells > >>>> if the checksum is relevant for inner or outer. This information > >>>> has to be known to properly set the ol_flags in mbuf. > >>>> > >>>> Signed-off-by: Didier Pallard <didier.pallard at 6wind.com> > >>>> Signed-off-by: Jean Dao <jean.dao at 6wind.com> > >>>> Signed-off-by: Olivier Matz <olivier.matz at 6wind.com> > >>>> --- > >>>> doc/guides/rel_notes/release_16_11.rst | 5 + > >>>> lib/librte_mbuf/Makefile | 5 +- > >>>> lib/librte_mbuf/rte_mbuf_ptype.c | 234 > >>>> +++++++++++++++++++++++++++++++++ > >>>> lib/librte_mbuf/rte_mbuf_ptype.h | 43 ++++++ > >>>> lib/librte_mbuf/rte_mbuf_version.map | 1 + > >>>> 5 files changed, 286 insertions(+), 2 deletions(-) > >>>> create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c > >>>> > >>>> [...] > >>>> + > >>>> +/* parse mbuf data to get packet type */ uint32_t > >>>> +rte_pktmbuf_get_ptype(const struct rte_mbuf *m, > >>>> + struct rte_mbuf_hdr_lens *hdr_lens) { > >>>> + struct rte_mbuf_hdr_lens local_hdr_lens; > >>>> + const struct ether_hdr *eh; > >>>> + struct ether_hdr eh_copy; > >>>> + uint32_t pkt_type = RTE_PTYPE_L2_ETHER; > >>>> + uint32_t off = 0; > >>>> + uint16_t proto; > >>>> + > >>>> + if (hdr_lens == NULL) > >>>> + hdr_lens = &local_hdr_lens; > >>>> + > >>>> + eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy); > >>>> + if (unlikely(eh == NULL)) > >>>> + return 0; > >>>> + proto = eh->ether_type; > >>>> + off = sizeof(*eh); > >>>> + hdr_lens->l2_len = off; > >>>> + > >>>> + if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) { > >>>> + const struct ipv4_hdr *ip4h; > >>>> + struct ipv4_hdr ip4h_copy; > >>>> + > >>>> + ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy); > >>>> + if (unlikely(ip4h == NULL)) > >>>> + return pkt_type; > >>>> + > >>>> + pkt_type |= ptype_l3_ip(ip4h->version_ihl); > >>>> + hdr_lens->l3_len = ip4_hlen(ip4h); > >>>> + off += hdr_lens->l3_len; > >>>> + if (ip4h->fragment_offset & > >>>> + rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK | > >>>> + IPV4_HDR_MF_FLAG)) { > >>>> + pkt_type |= RTE_PTYPE_L4_FRAG; > >>>> + hdr_lens->l4_len = 0; > >>>> + return pkt_type; > >>>> + } > >>>> + proto = ip4h->next_proto_id; > >>>> + pkt_type |= ptype_l4(proto); > >>>> + } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) { > >>>> + const struct ipv6_hdr *ip6h; > >>>> + struct ipv6_hdr ip6h_copy; > >>>> + int frag = 0; > >>>> + > >>>> + ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy); > >>>> + if (unlikely(ip6h == NULL)) > >>>> + return pkt_type; > >>>> + > >>>> + proto = ip6h->proto; > >>>> + hdr_lens->l3_len = sizeof(*ip6h); > >>>> + off += hdr_lens->l3_len; > >>>> + pkt_type |= ptype_l3_ip6(proto); > >>>> + if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT) > { > >>>> + proto = skip_ip6_ext(proto, m, &off, &frag); > >>>> + hdr_lens->l3_len = off - hdr_lens->l2_len; > >>>> + } > >>>> + if (proto == 0) > >>>> + return pkt_type; > >>>> + if (frag) { > >>>> + pkt_type |= RTE_PTYPE_L4_FRAG; > >>>> + hdr_lens->l4_len = 0; > >>>> + return pkt_type; > >>>> + } > >>>> + pkt_type |= ptype_l4(proto); > >>>> + } > >>>> + > >>>> + if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) { > >>>> + hdr_lens->l4_len = sizeof(struct udp_hdr); > >>>> + } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) { > >>>> + const struct tcp_hdr *th; > >>>> + struct tcp_hdr th_copy; > >>>> + > >>>> + th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy); > >>>> + if (unlikely(th == NULL)) > >>>> + return pkt_type & (RTE_PTYPE_L2_MASK | > >>>> + RTE_PTYPE_L3_MASK); > >>>> + hdr_lens->l4_len = (th->data_off & 0xf0) >> 2; > >>>> + } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) { > >>>> + hdr_lens->l4_len = sizeof(struct sctp_hdr); > >>>> + } else { > >>>> + hdr_lens->l4_len = 0; > >>>> + } > >>>> + > >>>> + return pkt_type; > >>>> +} > >>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h > >>>> b/lib/librte_mbuf/rte_mbuf_ptype.h > >>>> index 4a34678..f468520 100644 > >>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h > >>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h > >>>> @@ -545,6 +545,49 @@ extern "C" { > >>>> RTE_PTYPE_INNER_L3_MASK | \ > >>>> RTE_PTYPE_INNER_L4_MASK)) > >>>> +struct rte_mbuf; > >>>> + > >>>> +/** > >>>> + * Structure containing header lengths associated to a packet. > >>>> + */ > >>>> +struct rte_mbuf_hdr_lens { > >>>> + uint8_t l2_len; > >>>> + uint8_t l3_len; > >>>> + uint8_t l4_len; > >>>> + uint8_t tunnel_len; > >>>> + uint8_t inner_l2_len; > >>>> + uint8_t inner_l3_len; > >>>> + uint8_t inner_l4_len; > >>>> +}; > >>> [LC] The header parsing graph usually is not unique. The definition > >>> maybe nice for the basic IP and L4 tunnel. > >>> However it can't scale out to other cases, e.g. qinq, mac-in-mac, > >>> mpls > >>> l2/l3 tunnel. > >>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of > >>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser scheme. > >>> In this case, the fixed function is to support below. > >>> > >>> + * Supported packet types are: > >>> + * L2: Ether > >>> + * L3: IPv4, IPv6 > >>> + * L4: TCP, UDP, SCTP > >>> > >>> Of course, it can add more packet type detection logic in future. > >>> But the more support, the higher the cost. > >>> > >>> One of the alternative way is to allow registering parser pair. APP > >>> decides to choose the predefined scheme(by DPDK LIB), or to > >>> self-define the parsing logic. > >>> In this way, the scheme can do some assumption for the specific case > >>> and ignore some useless graph detection. > >>> In addition, besides the SW parser, the HW parser(identified by > >>> packet_type in mbuf) can be turn on/off by leveraging the same manner. > >> > >> Sorry, I'm not sure I'm fully getting what you are saying. If I > >> understand well, you would like to have something more flexible that > >> supports the registration of protocol to be recognized? > >> > >> I'm not sure having a function with a dynamic registration method > >> would really increase the performance compared to a static complete > function. > >> Actually, we will never support a tons of protocols since each layer > >> packet type is 4 bits, and since it requires that at least one hw supports > >> it. > > > > This patch will be very useful as a reference implementation, but it also > highlights an issue with the current implementation of packet types reporting > by HW and SW - as you just mentioned there are only 4 bits per each layer. As > these 4 bit are used as a enumeration it is impossible to reports multiple > headers located on the same layer. MPLS is one example, different packets > could have different numbers of MPLS labels, but it is impossible to report > using current packet_type structure. > > > > It is possible, however, to program HW to report user (application) > > specific > packet types. For example, for IPoMPLS with one MPLS label, HW will report > packet type A, but for IPoMPLS with two MPLS labels HW will reports packet > type B. In this case, instead of defining and supporting tons of statically > defined > (or enumerated) protocol headers combinations, application will register > packet types it expects from HW in addition to standard packet types. At the > moment we have high bits of packet_type reserved, so one possible solution > would be to use the highest bit to indicate that this is user defined > packet_type, > specific to the application. Then it could be used with HW and with SW parser. > For example, packet_type 0x8000000A is IPoMPLS with one MPLS label, > 0x8000000B is IPoMPLS with two MPLS labels and so on. > > Thank you for the explanation. From your description, I wonder if the flow > director API recently [1] proposed by Adrien wouldn't solve this issue? > > [1] http://dpdk.org/ml/archives/dev/2016-July/043365.html
I'm reviewing Adrien's proposal (it is a big document :)) and reply with my comment after reviewing. Regards, Andrey > Regards, > Olivier