[dpdk-dev] [PATCH V2] net/bonding: fix lacp negotiation failed
From: Yicai Lu When two host is connected directly without any devices like switch, rx_machine_update would recieving partner lacp negotiation packets, and partner's port mac is filled with zeros in this packet, which is different with internal's mode4 mac. So in this situation, it would never go rx_machine branch and then execute mac swap for negotiation! Thus bond mode 4 will negotiation failed. Fixes: 56cbc0817399 ("net/bonding: fix LACP negotiation") Cc: sta...@dpdk.org Signed-off-by: luyicai --- v1 -> v2: Adjust commit info style --- drivers/net/bonding/rte_eth_bond_8023ad.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c index b77a37d..2002ec0 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.c +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c @@ -798,7 +798,8 @@ RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); partner = &lacp->lacpdu.partner; - if (rte_is_same_ether_addr(&partner->port_params.system, + if (rte_is_zero_ether_addr(&partner->port_params.system) || + rte_is_same_ether_addr(&partner->port_params.system, &internals->mode4.mac_addr)) { /* This LACP frame is sending to the bonding port * so pass it to rx_machine. -- 1.9.5.msysgit.1
[dpdk-dev] [PATCH] drivers/net/bonding: fix bug for lacp negotiation failed
When two host is connected directly without any devices like switch, and also enable dedicated tx/rx queues on bonding devices slaves, rx_machine_update would recieving partner lacp negotiation packets, which partner's port mac filled with zeros. So in this situation, it would never go rx_machine branch with correct mac! Thus bond mode 4 will negotiation failed. Signed-off-by: Lu Yicai --- drivers/net/bonding/rte_eth_bond_8023ad.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c index b77a37ddb..2002ec04a 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.c +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c @@ -798,7 +798,8 @@ rx_machine_update(struct bond_dev_private *internals, uint16_t slave_id, RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); partner = &lacp->lacpdu.partner; - if (rte_is_same_ether_addr(&partner->port_params.system, + if (rte_is_zero_ether_addr(&partner->port_params.system) || + rte_is_same_ether_addr(&partner->port_params.system, &internals->mode4.mac_addr)) { /* This LACP frame is sending to the bonding port * so pass it to rx_machine. -- 2.19.1
[dpdk-dev] [PATCH] drivers/net/bonding: fix bug for lacp negotiation failed
When two host is connected directly without any devices like switch, and also enable dedicated tx/rx queues on bonding devices slaves, rx_machine_update would recieving partner lacp negotiation packets, which partner's port mac filled with zeros. So in this situation, it would never go rx_machine branch with correct mac! Thus bond mode 4 will negotiation failed. Signed-off-by: luyicai --- drivers/net/bonding/rte_eth_bond_8023ad.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c index b77a37d..2002ec0 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.c +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c @@ -798,7 +798,8 @@ RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); partner = &lacp->lacpdu.partner; - if (rte_is_same_ether_addr(&partner->port_params.system, + if (rte_is_zero_ether_addr(&partner->port_params.system) || + rte_is_same_ether_addr(&partner->port_params.system, &internals->mode4.mac_addr)) { /* This LACP frame is sending to the bonding port * so pass it to rx_machine. -- 1.9.5.msysgit.1
[dpdk-dev] 答复: [PATCH v3] ip_frag: recalculate data length of fragment
|| | HOST | || | || |---| | | | ns2| | |---| | | | | || | | || || | | | | | tap1 | | | | tap2 | ns1 | tap3 | | | | | |mtu=1510| | | |mtu=1510| |mtu=1500| | | | |-|1.1.1.1 |-| |---|1.1.1.2 |-|2.1.1.1 |---| | | || || || | || | | | || | | | ||---| | | || | ||| | || bond | | ||mtu=1500|--| || In a complex OVS+DPDK scenario, this problem is found when fragmented packet aggregation is processed. Therefore, we simulated intermediate fragments by modifying the MTU on the host. To illustrate the problem, we simplify the packet format and ignore the impact of the packet header. In namespace2, a packet whose data length is 1520 is sent. When the packet passes tap2, the packet is divided into two fragments: fragment A and fragment B, similar to (1520 = 1510 + 10). When the packet passes tap3, the larger fragment packet A is divided into two fragments A1 and A2, similar to (1510 = 1500 + 10). Finally, the bond interface receives three fragments: A1, A2, and B (1520 = 1500 + 10 + 10). One fragmented packet A2 is smaller than the minimum Ethernet frame length, so it needs to be padded. A1: fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00 0010 05 dc b4 66 20 00 3f 01 9c b6 01 01 01 01 02 01 0020 01 02 08 00 2d f6 fa e0 00 0a 0e 3a b2 5f 00 00 0030 00 00 2f a9 06 00 00 00 00 00 10 11 12 13 14 15 0040 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 ...... 05e0 b6 b7 b8 b9 ba bb bc bd be bf A2: fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00 0010 00 1c b4 66 20 b9 3f 01 a1 bd 01 01 01 01 02 01 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 B: fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01 0020 01 02 c8 c9 ca cb cc cd ce cf d0 d1 d2 d3 d4 d5 0030 d6 d7 d8 d9 da db dc dd de df e0 e1 e2 e3 e4 e5 0040 e6 When processing the preceding packets above, DPDK would aggregate fragmented packets A2 and B. And error packets are generated, which padding(zero) is displayed in the middle of the packet. A2 + B: fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6 -邮件原件- 发件人: Aaron Conole [mailto:acon...@redhat.com] 发送时间: 2020年12月4日 0:14 收件人: luyicai 抄送: dev@dpdk.org; konstantin.anan...@intel.com; Zhoujingbin (Robin, Russell Lab) ; chenchanghu ; Lilijun (Jerry) ; Linhaifeng ; Guohongzhi (Russell Lab) ; wangyunjian ; sta...@dpdk.org 主题: Re: [dpdk-dev] [PATCH v3] ip_frag: recalculate data length of fragment Yicai Lu writes: > In some situations, we would get several ip fragments, which total > data length is less than minimum frame(64) and padding with zeros. > Examples: Second Fragment "a0a1 a2a3 a4a5 a6a7 ..." > and Third Fragment "a8a9 aaab acad aeaf b0b1 b2b3 ...". > Finally, we would reassemble Second and Third Fragment like this > "a0a1 a2a3 a4a5 a6a7 ... a8a9 aaab acad aeaf b0b1 ...", > which is not correct! > So, we need recalculate data length of fragment to remove padings! > > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming > packet") > Cc: sta...@dpdk.org > > Signed-off-by: Yicai Lu > --- Sorry for coming late to the party. Are you saying that we have fragments which are less than min ip_len for anything other than the final fragment? Or the total data length after all fragments are reassembled is < min_ip_len ? like: frag1, len = 48 frag2, len = 10 something like that? Can you put some concrete examples in the commit message (or even better, include a test case in the ipv4_frag test suite that shows this)? > v2 -> v3: Update the comments. > --- > lib/librte_ip_frag/rte_ipv4_reassembly.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c
Re: [dpdk-dev] [PATCH v3] ip_frag: recalculate data length of fragment
Hi, Thank you so much for your review! I absolutely agree with that, and I'll resubmit a patch as soon as possible. > -Original Message- > From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com] > Sent: Monday, December 7, 2020 8:25 PM > To: luyicai ; dev@dpdk.org > Cc: Zhoujingbin (Robin, Russell Lab) ; > chenchanghu ; Lilijun (Jerry) > ; Linhaifeng ; > Guohongzhi (Russell Lab) ; wangyunjian > ; sta...@dpdk.org > Subject: RE: [PATCH v3] ip_frag: recalculate data length of fragment > Hi, > In some situations, we would get several ip fragments, which total > data length is less than minimum frame(64) and padding with zeros. > Examples: Second Fragment "a0a1 a2a3 a4a5 a6a7 ..." > and Third Fragment "a8a9 aaab acad aeaf b0b1 b2b3 ...". > Finally, we would reassemble Second and Third Fragment like this > "a0a1 a2a3 a4a5 a6a7 ... a8a9 aaab acad aeaf b0b1 ...", > which is not correct! > So, we need recalculate data length of fragment to remove padings! > > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming > packet") > Cc: sta...@dpdk.org > > Signed-off-by: Yicai Lu > --- > v2 -> v3: Update the comments. > --- > lib/librte_ip_frag/rte_ipv4_reassembly.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c > b/lib/librte_ip_frag/rte_ipv4_reassembly.c > index 1dda8aca0..9a9fe3703 100644 > --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c > +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c > @@ -117,6 +117,7 @@ rte_ipv4_frag_reassemble_packet(struct > rte_ip_frag_tbl *tbl, > > ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS; > ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len; > + mb->data_len = ip_len + mb->l3_len + mb->l2_len; > That doesn't look correct. > Even one fragment can consist of multiple segments. > Plus you don't update mb->pkt_len. > To do it properly, you'll need something like: > trim = mb->pkt_len - ip_len + mb->l3_len + mb->l2_len; > rte_pktmbuf_trim(mb, trim); > Though my preference would be to leave it as responsibility of the > caller (As it has to parse packet anyway to fill l2_len/l3_len and > usually strips > l2 headers, etc). > > IP_FRAG_LOG(DEBUG, "%s:%d:\n" > "mbuf: %p, tms: %" PRIu64 > -- > 2.28.0.windows.1
Re: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
> -Original Message- > From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com] > Sent: Monday, December 14, 2020 10:45 PM > To: luyicai ; dev@dpdk.org > Cc: Zhoujingbin (Robin, Russell Lab) ; chenchanghu > ; Lilijun (Jerry) ; > Linhaifeng ; Guohongzhi (Russell Lab) > ; wangyunjian ; > sta...@dpdk.org > Subject: RE: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment > Hi Yicai, > In some situations, we would get several ip fragments, which total > data length is less than min_ip_len(64) and padding with zeros. > We simulated intermediate fragments by modifying the MTU. > To illustrate the problem, we simplify the packet format and ignore > the impact of the packet header.In namespace2, a packet whose data > length is 1520 is sent. > When the packet passes tap2, the packet is divided into two > fragments: fragment A and B, similar to (1520 = 1510 + 10). > When the packet passes tap3, the larger fragment packet A is divided > into two fragments A1 and A2, similar to (1510 = 1500 + 10). > Finally, the bond interface receives three fragments: > A1, A2, and B (1520 = 1500 + 10 + 10). > One fragmented packet A2 is smaller than the minimum Ethernet frame > length, so it needs to be padded. > > |---| > | HOST | > | |--| || | > | | ns2 | | |--| | | > | | || | | |||| | | > | | | tap1 | | | | tap2 | ns1| tap3 | | | > | | |mtu=1510| | | |mtu=1510||mtu=1500| | | > | |--|1.1.1.1 |--| |--|1.1.1.2 ||2.1.1.1 |--| | > ||| ||||| > | | | || > | |-| || > | || > | || | > | | bond | | > |--|mtu=1500|---| >|| > > When processing the preceding packets above, DPDK would aggregate > fragmented packets A2 and B. > And error packets are generated, which padding(zero) is displayed in > the middle of the packet. > > A2 + B: > fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00 > 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01 > 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00 > 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb > 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db > 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6 > > So, we would calculate the length of padding, and remove the padding > in pkt_len and data_len before aggregation. > > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming > packet") > Cc: sta...@dpdk.org > > Signed-off-by: Yicai Lu > --- > v4 -> v5: Update the comments and description. > --- > lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c > b/lib/librte_ip_frag/rte_ipv4_reassembly.c > index 1dda8ac..fdf66a4 100644 > --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c > +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c > @@ -104,6 +104,7 @@ struct rte_mbuf * > const unaligned_uint64_t *psd; > uint16_t flag_offset, ip_ofs, ip_flag; > int32_t ip_len; > + int32_t trim; > > flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset); > ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK); @@ > -117,14 +118,15 @@ struct rte_mbuf * > > ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS; > ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len; > + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len); > > IP_FRAG_LOG(DEBUG, "%s:%d:\n" > - "mbuf: %p, tms: %" PRIu64 > - ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n" > + "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>" > + "ofs: %u, len: %d, padding: %d, flags: %#x\n" > "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, " > "max_entries: %u, use_entries: %u\n\n", > __func__, __LINE__, > - mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag, > + mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag, > tbl, tbl->max_cycles, tb
Re: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
-Original Message- From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com] Sent: Wednesday, December 16, 2020 6:45 PM To: luyicai ; dev@dpdk.org Cc: Zhoujingbin (Robin, Russell Lab) ; chenchanghu ; Lilijun (Jerry) ; Linhaifeng ; Guohongzhi (Russell Lab) ; wangyunjian ; sta...@dpdk.org Subject: RE: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment > Hi Yicai, > > In some situations, we would get several ip fragments, which total > > data length is less than min_ip_len(64) and padding with zeros. > > We simulated intermediate fragments by modifying the MTU. > > To illustrate the problem, we simplify the packet format and ignore > > the impact of the packet header.In namespace2, a packet whose data > > length is 1520 is sent. > > When the packet passes tap2, the packet is divided into two > > fragments: fragment A and B, similar to (1520 = 1510 + 10). > > When the packet passes tap3, the larger fragment packet A is divided > > into two fragments A1 and A2, similar to (1510 = 1500 + 10). > > Finally, the bond interface receives three fragments: > > A1, A2, and B (1520 = 1500 + 10 + 10). > > One fragmented packet A2 is smaller than the minimum Ethernet frame > > length, so it needs to be padded. > > > > |---| > > | HOST | > > | |--| || | > > | | ns2 | | |--| | | > > | | || | | |||| | | > > | | | tap1 | | | | tap2 | ns1| tap3 | | | > > | | |mtu=1510| | | |mtu=1510||mtu=1500| | | > > | |--|1.1.1.1 |--| |--|1.1.1.2 ||2.1.1.1 |--| | > > ||| ||||| > > | | | || > > | |-| || > > | || > > | || | > > | | bond | | > > |--|mtu=1500|---| > >|| > > > > When processing the preceding packets above, DPDK would aggregate > > fragmented packets A2 and B. > > And error packets are generated, which padding(zero) is displayed in > > the middle of the packet. > > > > A2 + B: > > fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00 > > 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01 > > 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00 > > 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb > > 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db > > 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6 > > > > So, we would calculate the length of padding, and remove the padding > > in pkt_len and data_len before aggregation. > > > > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming > > packet") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Yicai Lu > > --- > > v4 -> v5: Update the comments and description. > > --- > > lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c > > b/lib/librte_ip_frag/rte_ipv4_reassembly.c > > index 1dda8ac..fdf66a4 100644 > > --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c > > +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c > > @@ -104,6 +104,7 @@ struct rte_mbuf * > > const unaligned_uint64_t *psd; > > uint16_t flag_offset, ip_ofs, ip_flag; > > int32_t ip_len; > > + int32_t trim; > > > > flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset); > > ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK); @@ > > -117,14 +118,15 @@ struct rte_mbuf * > > > > ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS; > > ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len; > > + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len); > > > > IP_FRAG_LOG(DEBUG, "%s:%d:\n" > > - "mbuf: %p, tms: %" PRIu64 > > - ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n" > > + "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>" > > + "ofs: %u, len: %d, padding: %d, flags: %#x\n" > > "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#