Hi Kumara, Please see replies inline.
Thanks, Jiayu > -----Original Message----- > From: Kumara Parameshwaran <kumaraparames...@gmail.com> > Sent: Tuesday, November 1, 2022 3:06 PM > To: Hu, Jiayu <jiayu...@intel.com> > Cc: dev@dpdk.org; Kumara Parameshwaran > <kumaraparames...@gmail.com>; Kumara Parameshwaran > <kparamesh...@vmware.com> > Subject: [PATCH v5] gro : fix reordering of packets in GRO library > > From: Kumara Parameshwaran <kumaraparames...@gmail.com> > > When a TCP packet contains flags like PSH it is returned immediately to the > application though there might be packets of the same flow in the GRO table. > If PSH flag is set on a segment packets up to the segment should be delivered > immediately. But the current implementation delivers the last arrived packet > with PSH flag set causing re-ordering > > With this patch, if a packet does not contain only ACK flag and if there are > no > previous packets for the flow the packet would be returned immediately, > else will be merged with the previous segment and the flag on the last > segment will be set on the entire segment. > This is the behaviour with linux stack as well. > > Signed-off-by: Kumara Parameshwaran <kumaraparames...@gmail.com> > Co-authored-by: Kumara Parameshwaran <kparamesh...@vmware.com> > --- > v1: > If the received packet is not a pure ACK packet, we check if > there are any previous packets in the flow, if present we indulge > the received packet also in the coalescing logic and update the flags > of the last recived packet to the entire segment which would avoid > re-ordering. > > Lets say a case where P1(PSH), P2(ACK), P3(ACK) are received in > burst mode, > P1 contains PSH flag and since it does not contain any prior packets in > the flow > we copy it to unprocess_packets and P2(ACK) and P3(ACK) are > merged together. > In the existing case the P2,P3 would be delivered as single segment > first and the > unprocess_packets will be copied later which will cause reordering. > With the patch > copy the unprocess packets first and then the packets from the GRO > table. > > Testing done > The csum test-pmd was modifited to support the following > GET request of 10MB from client to server via test-pmd (static arp > entries added in client > and server). Enable GRO and TSO in test-pmd where the packets > recived from the client mac > would be sent to server mac and vice versa. > In above testing, without the patch the client observerd re-ordering > of 25 packets > and with the patch there were no packet re-ordering observerd. > > v2: > Fix warnings in commit and comment. > Do not consider packet as candidate to merge if it contains SYN/RST > flag. > > v3: > Fix warnings. > > v4: > Rebase with master. > > v5: > Adding co-author email > > lib/gro/gro_tcp4.c | 45 +++++++++++++++++++++++++++++++++++++-------- > lib/gro/rte_gro.c | 18 +++++++++--------- > 2 files changed, 46 insertions(+), 17 deletions(-) > > diff --git a/lib/gro/gro_tcp4.c b/lib/gro/gro_tcp4.c index > 0014096e63..7363c5d540 100644 > --- a/lib/gro/gro_tcp4.c > +++ b/lib/gro/gro_tcp4.c > @@ -188,6 +188,19 @@ update_header(struct gro_tcp4_item *item) > pkt->l2_len); > } > > +static inline void > +update_tcp_hdr_flags(struct rte_tcp_hdr *tcp_hdr, struct rte_mbuf *pkt) > +{ > + struct rte_ether_hdr *eth_hdr; > + struct rte_ipv4_hdr *ipv4_hdr; > + struct rte_tcp_hdr *merged_tcp_hdr; > + > + eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *); > + ipv4_hdr = (struct rte_ipv4_hdr *)((char *)eth_hdr + pkt->l2_len); > + merged_tcp_hdr = (struct rte_tcp_hdr *)((char *)ipv4_hdr + pkt- > >l3_len); > + merged_tcp_hdr->tcp_flags |= tcp_hdr->tcp_flags; } The Linux kernel updates the TCP flag via "tcp_flag_word(th2) |= flags & (TCP_FLAG_FIN | TCP_FLAG_PSH)", which only adds FIN and PSH at most to the merge packet. > + > int32_t > gro_tcp4_reassemble(struct rte_mbuf *pkt, > struct gro_tcp4_tbl *tbl, > @@ -206,6 +219,7 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt, > uint32_t i, max_flow_num, remaining_flow_num; > int cmp; > uint8_t find; > + uint32_t start_idx; > > /* > * Don't process the packet whose TCP header length is greater @@ - > 219,13 +233,6 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt, > tcp_hdr = (struct rte_tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len); > hdr_len = pkt->l2_len + pkt->l3_len + pkt->l4_len; > > - /* > - * Don't process the packet which has FIN, SYN, RST, PSH, URG, ECE > - * or CWR set. > - */ > - if (tcp_hdr->tcp_flags != RTE_TCP_ACK_FLAG) > - return -1; > - > /* trim the tail padding bytes */ > ip_tlen = rte_be_to_cpu_16(ipv4_hdr->total_length); > if (pkt->pkt_len > (uint32_t)(ip_tlen + pkt->l2_len)) @@ -264,12 > +271,30 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt, > if (tbl->flows[i].start_index != INVALID_ARRAY_INDEX) { > if (is_same_tcp4_flow(tbl->flows[i].key, key)) { > find = 1; > + start_idx = tbl->flows[i].start_index; > break; > } > remaining_flow_num--; > } > } > > + if (tcp_hdr->tcp_flags != RTE_TCP_ACK_FLAG) { > + /* > + * Check and try merging the current TCP segment with the > previous > + * TCP segment if the TCP header does not contain RST and > SYN flag > + * There are cases where the last segment is sent with > FIN|PSH|ACK > + * which should also be considered for merging with previous > segments. > + */ > + if (find && !(tcp_hdr->tcp_flags & > (RTE_TCP_RST_FLAG|RTE_TCP_SYN_FLAG))) > + /* > + * Since PSH flag is set, start time will be set to 0 > so it > will be flushed > + * immediately. > + */ > + tbl->items[start_idx].start_time = 0; > + else > + return -1; > + } The nested if-else check is not straightforward, and it's hard to read the condition-action of different combinations of flag bits. In addition, are all flag bits considered like Linux kernel? > + > /* > * Fail to find a matched flow. Insert a new flow and store the > * packet into the flow. > @@ -304,8 +329,12 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt, > is_atomic); > if (cmp) { > if (merge_two_tcp4_packets(&(tbl->items[cur_idx]), > - pkt, cmp, sent_seq, ip_id, 0)) > + pkt, cmp, sent_seq, ip_id, 0)) > { > + if (tbl->items[cur_idx].start_time == 0) > + update_tcp_hdr_flags(tcp_hdr, tbl- > >items[cur_idx].firstseg); > return 1; > + } > + > /* > * Fail to merge the two packets, as the packet > * length is greater than the max value. Store diff > --git > a/lib/gro/rte_gro.c b/lib/gro/rte_gro.c index e35399fd42..87c5502dce > 100644 > --- a/lib/gro/rte_gro.c > +++ b/lib/gro/rte_gro.c > @@ -283,10 +283,17 @@ rte_gro_reassemble_burst(struct rte_mbuf **pkts, > if ((nb_after_gro < nb_pkts) > || (unprocess_num < nb_pkts)) { > i = 0; > + /* Copy unprocessed packets */ > + if (unprocess_num > 0) { > + memcpy(&pkts[i], unprocess_pkts, > + sizeof(struct rte_mbuf *) * > + unprocess_num); > + i = unprocess_num; > + } Why copy unprocess pkts first? This is for avoiding out-of-order? Thanks, Jiayu > /* Flush all packets from the tables */ > if (do_vxlan_tcp_gro) { > - i = gro_vxlan_tcp4_tbl_timeout_flush(&vxlan_tcp_tbl, > - 0, pkts, nb_pkts); > + i += > gro_vxlan_tcp4_tbl_timeout_flush(&vxlan_tcp_tbl, > + 0, &pkts[i], nb_pkts - i); > } > > if (do_vxlan_udp_gro) { > @@ -304,13 +311,6 @@ rte_gro_reassemble_burst(struct rte_mbuf **pkts, > i += gro_udp4_tbl_timeout_flush(&udp_tbl, 0, > &pkts[i], nb_pkts - i); > } > - /* Copy unprocessed packets */ > - if (unprocess_num > 0) { > - memcpy(&pkts[i], unprocess_pkts, > - sizeof(struct rte_mbuf *) * > - unprocess_num); > - } > - nb_after_gro = i + unprocess_num; > } > > return nb_after_gro; > -- > 2.25.1