-----Original Message-----
From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
Sent: Wednesday, December 16, 2020 6:45 PM
To: luyicai <luyi...@huawei.com>; dev@dpdk.org
Cc: Zhoujingbin (Robin, Russell Lab) <zhoujing...@huawei.com>; chenchanghu <chenchan...@huawei.com>; Lilijun (Jerry) <jerry.lili...@huawei.com>; Linhaifeng <haifeng....@huawei.com>; Guohongzhi (Russell Lab) <guohongz...@huawei.com>; wangyunjian <wangyunj...@huawei.com>; sta...@dpdk.org
Subject: RE: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment

> Hi Yicai,
>
> > In some situations, we would get several ip fragments whose total
> > data length is less than min_ip_len(64) and which are padded with zeros.
> > We simulated intermediate fragments by modifying the MTU.
> > To illustrate the problem, we simplify the packet format and ignore
> > the impact of the packet header. In namespace2, a packet whose data
> > length is 1520 is sent.
> > When the packet passes tap2, the packet is divided into two
> > fragments: fragment A and B, similar to (1520 = 1510 + 10).
> > When the packet passes tap3, the larger fragment packet A is divided
> > into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> > Finally, the bond interface receives three fragments:
> > A1, A2, and B (1520 = 1500 + 10 + 10).
> > One fragmented packet A2 is smaller than the minimum Ethernet frame
> > length, so it needs to be padded.
> >
> > |---------------------------------------------------|
> > |                       HOST                        |
> > | |--------------|   |----------------------------| |
> > | |     ns2      |   |      |--------------|      | |
> > | |  |--------|  |   |  |--------|    |--------|  | |
> > | |  |  tap1  |  |   |  |  tap2  | ns1|  tap3  |  | |
> > | |  |mtu=1510|  |   |  |mtu=1510|    |mtu=1500|  | |
> > | |--|1.1.1.1 |--|   |--|1.1.1.2 |----|2.1.1.1 |--| |
> > |    |--------|         |--------|    |--------|    |
> > |         |                 |              |        |
> > |         |-----------------|              |        |
> > |                                          |        |
> > |                                      |--------|   |
> > |                                      |  bond  |   |
> > |--------------------------------------|mtu=1500|---|
> >                                        |--------|
> >
> > When processing the packets above, DPDK would aggregate
> > fragmented packets A2 and B.
> > As a result, erroneous packets are generated in which the padding
> > (zeros) appears in the middle of the packet.
> >
> > A2 + B:
> > 0000   fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> > 0010   00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> > 0020   01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> > 0030   00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> > 0040   cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> > 0050   dc dd de df e0 e1 e2 e3 e4 e5 e6
> >
> > So we calculate the length of the padding and remove it from
> > pkt_len and data_len before aggregation.
> >
> > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming packet")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Yicai Lu <luyi...@huawei.com>
> > ---
> > v4 -> v5: Update the comments and description.
> > ---
> >  lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > index 1dda8ac..fdf66a4 100644
> > --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > @@ -104,6 +104,7 @@ struct rte_mbuf *
> >  	const unaligned_uint64_t *psd;
> >  	uint16_t flag_offset, ip_ofs, ip_flag;
> >  	int32_t ip_len;
> > +	int32_t trim;
> >
> >  	flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
> >  	ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
> > @@ -117,14 +118,15 @@ struct rte_mbuf *
> >
> >  	ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
> >  	ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
> > +	trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
> >
> >  	IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> > -		"mbuf: %p, tms: %" PRIu64
> > -		", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
> > +		"mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
> > +		"ofs: %u, len: %d, padding: %d, flags: %#x\n"
> >  		"tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
> >  		"max_entries: %u, use_entries: %u\n\n",
> >  		__func__, __LINE__,
> > -		mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
> > +		mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
> >  		tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
> >  		tbl->use_entries);
> >
> > @@ -134,6 +136,10 @@ struct rte_mbuf *
> >  		return NULL;
> >  	}
> >
> > +	if (unlikely(trim > 0)) {
> > +		rte_pktmbuf_trim(mb, trim);
> > +	}
>
> > As a nit: {} braces are not required for a single statement.
> > LGTM in general, just one thing: shouldn't we have the same fix for ipv6
> > then?
> > Konstantin
>
> Hi Konstantin,
>
> Thanks!
>
> During the problem analysis, we discussed ipv6 and concluded
> that the problem does not exist for ipv6.
>
> For ipv6, the frame consists of the following parts:
> basic header = 40(bytes)
> DMAC = 6(bytes)
> SMAC = 6(bytes)
> Type = 2(bytes)
> CRC = 4(bytes)
> fragment header = 8(bytes)
> ...
>
> 40 + 6 + 6 + 2 + 4 + 8 = 66 (bytes)
>
> The total is already greater than min_ip_len(64), so the frame doesn't
> need to be padded with zeros.
>
> For normal cases - yes, but in theory there could be some unusual scenarios
> (tunnelled packet, different media, etc.).
> So for consistency and to avoid unforeseen issues - I think it is better to
> have the fix for both ipv4 and ipv6.
> After all, the impact looks negligible.
> Konstantin

Hi Konstantin,

Agreed! In terms of code symmetry, that is better.
Anyway, I'll submit another patch (v6) later.

>
> > +
> >  	/* try to find/add entry into the fragment's table. */
> >  	if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
> >  		IP_FRAG_MBUF2DR(dr, mb);
> > --
> > 1.9.5.msysgit.1
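
For illustration only, the trim computation from the patch can be sketched outside DPDK as plain C. The helper names `frag_padding_len`, `frag_trimmed_len` and `frag_padding_selfcheck` are hypothetical and exist only in this sketch; they are not part of the patch or of any DPDK API. The sample sizes are assumptions as well: a 14-byte Ethernet header, a 20-byte IPv4 header and 8 payload bytes, padded to a 60-byte frame, which appears consistent with the 18 zero bytes visible in the A2 + B dump.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): number of trailing padding bytes in
 * a received frame. ip_total_len is the IPv4 total_length field already
 * converted to host byte order. This mirrors the patch,
 *   trim = pkt_len - (ip_len + l3_len + l2_len),
 * because ip_len + l3_len equals total_length.
 */
int32_t
frag_padding_len(uint32_t pkt_len, uint16_t ip_total_len, uint16_t l2_len)
{
	return (int32_t)pkt_len - ((int32_t)ip_total_len + (int32_t)l2_len);
}

/* Frame length once the padding, if any, has been removed. */
uint32_t
frag_trimmed_len(uint32_t pkt_len, uint16_t ip_total_len, uint16_t l2_len)
{
	int32_t trim = frag_padding_len(pkt_len, ip_total_len, l2_len);

	/* Only a positive value means padding; other frames are left as-is. */
	return trim > 0 ? pkt_len - (uint32_t)trim : pkt_len;
}

/* Assumed sample values, see the text above this block. */
void
frag_padding_selfcheck(void)
{
	/* 60-byte padded frame, total_length 28 (20 + 8), 14-byte L2 header. */
	assert(frag_padding_len(60, 28, 14) == 18);
	assert(frag_trimmed_len(60, 28, 14) == 42);
	/* Full-size frame: nothing to trim. */
	assert(frag_padding_len(1514, 1500, 14) == 0);
}
```

In the patch itself the positive result is passed to rte_pktmbuf_trim(mb, trim), which shortens the mbuf in place; the sketch only returns the resulting length so that it can be compiled and checked without DPDK headers.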