Hi Konstantin, Thanks for your important suggestions. My feedbacks are inline.
On Wed, Aug 30, 2017 at 09:38:33AM +0800, Ananyev, Konstantin wrote: > > > > -----Original Message----- > > From: Hu, Jiayu > > Sent: Thursday, August 24, 2017 3:16 PM > > To: dev@dpdk.org > > Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Ananyev, Konstantin > > <konstantin.anan...@intel.com>; Tan, Jianfeng > > <jianfeng....@intel.com>; Hu, Jiayu <jiayu...@intel.com> > > Subject: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support > > > > This patch adds GSO support for TCP/IPv4 packets. Supported packets > > may include a single VLAN tag. TCP/IPv4 GSO assumes that all input > > packets have correct checksums, and doesn't update checksums for output > > packets (the responsibility for this lies with the application). > > Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets. > > > > TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect > > MBUF, to organize an output packet. Note that we refer to these two > > chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet > > header, while the indirect mbuf simply points to a location within the > > original packet's payload. Consequently, use of the GSO library requires > > multi-segment MBUF support in the TX functions of the NIC driver. > > > > If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a > > result, when all of its GSOed segments are freed, the packet is freed > > automatically. > > > > Signed-off-by: Jiayu Hu <jiayu...@intel.com> > > Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com> > > --- > > lib/librte_eal/common/include/rte_log.h | 1 + > > lib/librte_gso/Makefile | 2 + > > lib/librte_gso/gso_common.c | 270 > > ++++++++++++++++++++++++++++++++ > > lib/librte_gso/gso_common.h | 120 ++++++++++++++ > > lib/librte_gso/gso_tcp.c | 82 ++++++++++ > > lib/librte_gso/gso_tcp.h | 73 +++++++++ > > lib/librte_gso/rte_gso.c | 44 +++++- > > lib/librte_gso/rte_gso.h | 3 + > > 8 files changed, 593 insertions(+), 2 deletions(-) > > create mode 100644 lib/librte_gso/gso_common.c > > create mode 100644 lib/librte_gso/gso_common.h > > create mode 100644 lib/librte_gso/gso_tcp.c > > create mode 100644 lib/librte_gso/gso_tcp.h > > > > diff --git a/lib/librte_eal/common/include/rte_log.h > > b/lib/librte_eal/common/include/rte_log.h > > index ec8dba7..2fa1199 100644 > > --- a/lib/librte_eal/common/include/rte_log.h > > +++ b/lib/librte_eal/common/include/rte_log.h > > @@ -87,6 +87,7 @@ extern struct rte_logs rte_logs; > > #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */ > > #define RTE_LOGTYPE_EFD 18 /**< Log related to EFD. */ > > #define RTE_LOGTYPE_EVENTDEV 19 /**< Log related to eventdev. */ > > +#define RTE_LOGTYPE_GSO 20 /**< Log related to GSO. */ > > > > /* these log types can be used in an application */ > > #define RTE_LOGTYPE_USER1 24 /**< User-defined log type 1. */ > > diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile > > index aeaacbc..0f8e38f 100644 > > --- a/lib/librte_gso/Makefile > > +++ b/lib/librte_gso/Makefile > > @@ -42,6 +42,8 @@ LIBABIVER := 1 > > > > #source files > > SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c > > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c > > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp.c > > > > # install this header file > > SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h > > diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c > > new file mode 100644 > > index 0000000..2b54fbd > > --- /dev/null > > +++ b/lib/librte_gso/gso_common.c > > @@ -0,0 +1,270 @@ > > +/*- > > + * BSD LICENSE > > + * > > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > > + * All rights reserved. > > + * > > + * Redistribution and use in source and binary forms, with or without > > + * modification, are permitted provided that the following conditions > > + * are met: > > + * > > + * * Redistributions of source code must retain the above copyright > > + * notice, this list of conditions and the following disclaimer. > > + * * Redistributions in binary form must reproduce the above copyright > > + * notice, this list of conditions and the following disclaimer in > > + * the documentation and/or other materials provided with the > > + * distribution. > > + * * Neither the name of Intel Corporation nor the names of its > > + * contributors may be used to endorse or promote products derived > > + * from this software without specific prior written permission. > > + * > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > + */ > > + > > +#include <stdbool.h> > > +#include <string.h> > > + > > +#include <rte_malloc.h> > > + > > +#include <rte_ether.h> > > +#include <rte_ip.h> > > +#include <rte_tcp.h> > > + > > +#include "gso_common.h" > > + > > +static inline void > > +hdr_segment_init(struct rte_mbuf *hdr_segment, struct rte_mbuf *pkt, > > + uint16_t pkt_hdr_offset) > > +{ > > + /* copy mbuf metadata */ > > + hdr_segment->nb_segs = 1; > > + hdr_segment->port = pkt->port; > > + hdr_segment->ol_flags = pkt->ol_flags; > > + hdr_segment->packet_type = pkt->packet_type; > > + hdr_segment->pkt_len = pkt_hdr_offset; > > + hdr_segment->data_len = pkt_hdr_offset; > > + hdr_segment->tx_offload = pkt->tx_offload; > > + /* copy packet header */ > > + rte_memcpy(rte_pktmbuf_mtod(hdr_segment, char *), > > + rte_pktmbuf_mtod(pkt, char *), > > + pkt_hdr_offset); > > +} > > + > > +static inline void > > +free_gso_segment(struct rte_mbuf **pkts, uint16_t nb_pkts) > > +{ > > + uint16_t i; > > + > > + for (i = 0; i < nb_pkts; i++) { > > + rte_pktmbuf_detach(pkts[i]->next); > > I don't think you need to call detach() here explicitly. > Just rte_pktmbuf_free(pkts[i]) should do I think. Yes, rte_pktmbuf_free() is enough. I will modify it. Thanks. > > > + rte_pktmbuf_free(pkts[i]); > > + pkts[i] = NULL; > > + } > > +} > > + > > +int > > +gso_do_segment(struct rte_mbuf *pkt, > > + uint16_t pkt_hdr_offset, > > + uint16_t pyld_unit_size, > > + struct rte_mempool *direct_pool, > > + struct rte_mempool *indirect_pool, > > + struct rte_mbuf **pkts_out, > > + uint16_t nb_pkts_out) > > +{ > > + struct rte_mbuf *pkt_in; > > + struct rte_mbuf *hdr_segment, *pyld_segment; > > + uint32_t pkt_in_pyld_off; > > + uint16_t pkt_in_segment_len, pkt_out_segment_len; > > + uint16_t nb_segs; > > + bool pkt_in_segment_processed; > > + > > + pkt_in_pyld_off = pkt->data_off + pkt_hdr_offset; > > + pkt_in = pkt; > > + nb_segs = 0; > > + > > + while (pkt_in) { > > + pkt_in_segment_processed = false; > > + pkt_in_segment_len = pkt_in->data_off + pkt_in->data_len; > > + > > + while (!pkt_in_segment_processed) { > > + if (unlikely(nb_segs >= nb_pkts_out)) { > > + free_gso_segment(pkts_out, nb_segs); > > + return -EINVAL; > > + } > > + > > + /* allocate direct mbuf */ > > + hdr_segment = rte_pktmbuf_alloc(direct_pool); > > + if (unlikely(hdr_segment == NULL)) { > > + free_gso_segment(pkts_out, nb_segs); > > + return -ENOMEM; > > + } > > + > > + /* allocate indirect mbuf */ > > + pyld_segment = rte_pktmbuf_alloc(indirect_pool); > > + if (unlikely(pyld_segment == NULL)) { > > + rte_pktmbuf_free(hdr_segment); > > + free_gso_segment(pkts_out, nb_segs); > > + return -ENOMEM; > > + } > > So, if I understand correctly each new packet would always contain just one > data segment? > Why several data segments couldn't be chained together (if sum of their > data_len <= mss)? > In a same way as done here: > http://dpdk.org/browse/dpdk/tree/lib/librte_ip_frag/rte_ipv4_fragmentation.c#n93 > or here: > https://gerrit.fd.io/r/gitweb?p=tldk.git;a=blob;f=lib/libtle_l4p/tcp_tx_seg.h;h=a8d2425597a7ad6f598aa4bb7fcd7f1da74305f0;hb=HEAD#l23 > ? Oh, yes. I can chain these data segments when their total length is less than the GSO segsz. I will change it in the next patch. Thanks very much. > > > + > > + /* copy packet header */ > > + hdr_segment_init(hdr_segment, pkt, pkt_hdr_offset); > > + > > + /* attach payload mbuf to current packet segment */ > > + rte_pktmbuf_attach(pyld_segment, pkt_in); > > + > > + hdr_segment->next = pyld_segment; > > + pkts_out[nb_segs++] = hdr_segment; > > + > > + /* calculate payload length */ > > + pkt_out_segment_len = pyld_unit_size; > > + if (pkt_in_pyld_off + pkt_out_segment_len > > > + pkt_in_segment_len) { > > + pkt_out_segment_len = pkt_in_segment_len - > > + pkt_in_pyld_off; > > + } > > + > > + /* update payload segment */ > > + pyld_segment->data_off = pkt_in_pyld_off; > > + pyld_segment->data_len = pkt_out_segment_len; > > + > > + /* update header segment */ > > + hdr_segment->pkt_len += pyld_segment->data_len; > > + hdr_segment->nb_segs++; > > + > > + /* update pkt_in_pyld_off */ > > + pkt_in_pyld_off += pkt_out_segment_len; > > + if (pkt_in_pyld_off == pkt_in_segment_len) > > + pkt_in_segment_processed = true; > > + } > > + > > + /* 'pkt_in' may contain numerous segments */ > > + pkt_in = pkt_in->next; > > + if (pkt_in != NULL) > > + pkt_in_pyld_off = pkt_in->data_off; > > + } > > + return nb_segs; > > +} > > + > > +static inline void > > +parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct rte_mbuf *pkt) > > +{ > > + struct tcp_hdr *tcp_hdr; > > + > > + switch (ipv4_hdr->next_proto_id) { > > + case IPPROTO_TCP: > > + pkt->packet_type |= RTE_PTYPE_L4_TCP; > > + pkt->l3_len = IPv4_HDR_LEN(ipv4_hdr); > > + tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len); > > + pkt->l4_len = TCP_HDR_LEN(tcp_hdr); > > + break; > > + } > > +} > > + > > +static inline void > > +parse_ethernet(struct ether_hdr *eth_hdr, struct rte_mbuf *pkt) > > +{ > > + struct ipv4_hdr *ipv4_hdr; > > + struct vlan_hdr *vlan_hdr; > > + uint16_t ethertype; > > + > > + ethertype = rte_be_to_cpu_16(eth_hdr->ether_type); > > + if (ethertype == ETHER_TYPE_VLAN) { > > + vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1); > > + pkt->l2_len = sizeof(struct vlan_hdr); > > + pkt->packet_type |= RTE_PTYPE_L2_ETHER_VLAN; > > + ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto); > > + } > > + > > + switch (ethertype) { > > + case ETHER_TYPE_IPv4: > > + if (IS_VLAN_PKT(pkt)) { > > + pkt->packet_type |= RTE_PTYPE_L3_IPV4; > > + } else { > > + pkt->packet_type |= RTE_PTYPE_L2_ETHER; > > + pkt->packet_type |= RTE_PTYPE_L3_IPV4; > > + } > > + pkt->l2_len += sizeof(struct ether_hdr); > > + ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + > > + pkt->l2_len); > > + parse_ipv4(ipv4_hdr, pkt); > > + break; > > + } > > +} > > + > > +void > > +gso_parse_packet(struct rte_mbuf *pkt) > > There is a function rte_net_get_ptype() that supposed to provide similar > functionality. > So we probably don't need to create a new SW parse function here, instead > would be better > to reuse (and update if needed) an existing one. > Again user already might have l2/l3/l4.../_len and packet_type setuped. > So better to keep SW packet parsing out of scope of that library. Hmm, I know we have discussed this design choice in the GRO library, and I also think it's better to reuse these values. But from the perspective of OVS, it may add extra overhead, since OVS doesn't parse every packet originally. Maybe @Mark can give us more inputs from the view of OVS. > > > +{ > > + struct ether_hdr *eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); > > + > > + pkt->packet_type = pkt->tx_offload = 0; > > + parse_ethernet(eth_hdr, pkt); > > +} > > + > > +static inline void > > +update_ipv4_header(char *base, uint16_t offset, uint16_t length, uint16_t > > id) > > +{ > > + struct ipv4_hdr *ipv4_hdr = (struct ipv4_hdr *)(base + offset); > > + > > + ipv4_hdr->total_length = rte_cpu_to_be_16(length - offset); > > + ipv4_hdr->packet_id = rte_cpu_to_be_16(id); > > +} > > + > > +static inline void > > +update_tcp_header(char *base, uint16_t offset, uint32_t sent_seq, > > + uint8_t non_tail) > > +{ > > + struct tcp_hdr *tcp_hdr = (struct tcp_hdr *)(base + offset); > > + > > + tcp_hdr->sent_seq = rte_cpu_to_be_32(sent_seq); > > + /* clean FIN and PSH for non-tail segments */ > > + if (non_tail) > > + tcp_hdr->tcp_flags &= (~(TCP_HDR_PSH_MASK | TCP_HDR_FIN_MASK)); > > +} > > + > > +void > > +gso_update_pkt_headers(struct rte_mbuf *pkt, uint16_t nb_segments, > > + struct rte_mbuf **out_segments) > > +{ > > + struct ipv4_hdr *ipv4_hdr; > > + struct tcp_hdr *tcp_hdr; > > + struct rte_mbuf *seg; > > + uint32_t sent_seq; > > + uint16_t offset, i; > > + uint16_t tail_seg_idx = nb_segments - 1, id; > > + > > + switch (pkt->packet_type) { > > + case ETHER_VLAN_IPv4_TCP_PKT: > > + case ETHER_IPv4_TCP_PKT: > > Might be worth to put code below in a separate function: > update_inner_tcp_hdr(..) or so. > Then you can reuse it for tunneled cases too. Yes, I will modify it in the next patch. Thanks. > > > + ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) > > + pkt->l2_len); > > + tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len); > > + id = rte_be_to_cpu_16(ipv4_hdr->packet_id); > > + sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq); > > + > > + for (i = 0; i < nb_segments; i++) { > > + seg = out_segments[i]; > > + > > + offset = seg->l2_len; > > + update_ipv4_header(rte_pktmbuf_mtod(seg, char *), > > + offset, seg->pkt_len, id); > > + id++; > > Who would be responsible to make sure that we wouldn't have consecutive > packets with the IPV4 id? > Would be the upper layer that forms the packet or gso library or ...? Oh yes. I ingore this important issue. I don't think applications can guarantee it. I will check the design of linux and try to figure out a way. Thanks for reminder. > > > + > > + offset += seg->l3_len; > > + update_tcp_header(rte_pktmbuf_mtod(seg, char *), > > + offset, sent_seq, i < tail_seg_idx); > > + sent_seq += seg->next->data_len; > > + } > > + break; > > + } > > +} > > diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h > > new file mode 100644 > > index 0000000..d750041 > > --- /dev/null > > +++ b/lib/librte_gso/gso_common.h > > @@ -0,0 +1,120 @@ > > +/*- > > + * BSD LICENSE > > + * > > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > > + * All rights reserved. > > + * > > + * Redistribution and use in source and binary forms, with or without > > + * modification, are permitted provided that the following conditions > > + * are met: > > + * > > + * * Redistributions of source code must retain the above copyright > > + * notice, this list of conditions and the following disclaimer. > > + * * Redistributions in binary form must reproduce the above copyright > > + * notice, this list of conditions and the following disclaimer in > > + * the documentation and/or other materials provided with the > > + * distribution. > > + * * Neither the name of Intel Corporation nor the names of its > > + * contributors may be used to endorse or promote products derived > > + * from this software without specific prior written permission. > > + * > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > + */ > > + > > +#ifndef _GSO_COMMON_H_ > > +#define _GSO_COMMON_H_ > > + > > +#include <stdint.h> > > +#include <rte_mbuf.h> > > + > > +#define IPV4_HDR_DF_SHIFT 14 > > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT) > > +#define IPv4_HDR_LEN(iph) ((iph->version_ihl & 0x0f) * 4) > > + > > +#define TCP_HDR_PSH_MASK ((uint8_t)0x08) > > +#define TCP_HDR_FIN_MASK ((uint8_t)0x01) > > +#define TCP_HDR_LEN(tcph) ((tcph->data_off & 0xf0) >> 2) > > + > > +#define ETHER_IPv4_PKT (RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4) > > +/* Supported packet types */ > > +/* TCP/IPv4 packet. */ > > +#define ETHER_IPv4_TCP_PKT (ETHER_IPv4_PKT | RTE_PTYPE_L4_TCP) > > + > > +/* TCP/IPv4 packet with VLAN tag. */ > > +#define ETHER_VLAN_IPv4_TCP_PKT (RTE_PTYPE_L2_ETHER_VLAN | \ > > + RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP) > > + > > +#define IS_VLAN_PKT(pkt) ((pkt->packet_type & RTE_PTYPE_L2_ETHER_VLAN) == \ > > + RTE_PTYPE_L2_ETHER_VLAN) > > + > > +/** > > + * Internal function which parses a packet, setting outer_l2/l3_len and > > + * l2/l3/l4_len and packet_type. > > + * > > + * @param pkt > > + * Packet to parse. > > + */ > > +void gso_parse_packet(struct rte_mbuf *pkt); > > + > > +/** > > + * Internal function which updates relevant packet headers, following > > + * segmentation. This is required to update, for example, the IPv4 > > + * 'total_length' field, to reflect the reduced length of the now- > > + * segmented packet. > > + * > > + * @param pkt > > + * The original packet. > > + * @param nb_segments > > + * The number of GSO segments into which pkt was split. > > + * @param out_segements > > + * Pointer array used for storing mbuf addresses for GSO segments. > > + */ > > +void gso_update_pkt_headers(struct rte_mbuf *pkt, uint16_t nb_segments, > > + struct rte_mbuf **out_segments); > > + > > +/** > > + * Internal function which divides the input packet into small segments. > > + * Each of the newly-created segments is organized as a two-segment mbuf, > > + * where the first segment is a standard mbuf, which stores a copy of > > + * packet header, and the second is an indirect mbuf which points to a > > + * section of data in the input packet. > > + * > > + * @param pkt > > + * Packet to segment. > > + * @param pkt_hdr_offset > > + * Packet header offset, measured in byte. > > + * @param pyld_unit_size > > + * The max payload length of a GSO segment. > > + * @param direct_pool > > + * MBUF pool used for allocating direct buffers for output segments. > > + * @param indirect_pool > > + * MBUF pool used for allocating indirect buffers for output segments. > > + * @param pkts_out > > + * Pointer array used to keep the mbuf addresses of output segments. > > + * @param nb_pkts_out > > + * The max number of items that pkts_out can keep. > > + * > > + * @return > > + * - The number of segments created in the event of success. > > + * - If no GSO is performed, return 1. > > + * - If available memory in mempools is insufficient, return -ENOMEM. > > + * - -EINVAL for invalid parameters > > + */ > > +int gso_do_segment(struct rte_mbuf *pkt, > > + uint16_t pkt_hdr_offset, > > + uint16_t pyld_unit_size, > > + struct rte_mempool *direct_pool, > > + struct rte_mempool *indirect_pool, > > + struct rte_mbuf **pkts_out, > > + uint16_t nb_pkts_out); > > +#endif > > diff --git a/lib/librte_gso/gso_tcp.c b/lib/librte_gso/gso_tcp.c > > new file mode 100644 > > index 0000000..9d5fc30 > > --- /dev/null > > +++ b/lib/librte_gso/gso_tcp.c > > @@ -0,0 +1,82 @@ > > +/*- > > + * BSD LICENSE > > + * > > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > > + * All rights reserved. > > + * > > + * Redistribution and use in source and binary forms, with or without > > + * modification, are permitted provided that the following conditions > > + * are met: > > + * > > + * * Redistributions of source code must retain the above copyright > > + * notice, this list of conditions and the following disclaimer. > > + * * Redistributions in binary form must reproduce the above copyright > > + * notice, this list of conditions and the following disclaimer in > > + * the documentation and/or other materials provided with the > > + * distribution. > > + * * Neither the name of Intel Corporation nor the names of its > > + * contributors may be used to endorse or promote products derived > > + * from this software without specific prior written permission. > > + * > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > + */ > > + > > + > > +#include <rte_ether.h> > > +#include <rte_ip.h> > > +#include <rte_tcp.h> > > + > > +#include "gso_common.h" > > +#include "gso_tcp.h" > > + > > +int > > +gso_tcp4_segment(struct rte_mbuf *pkt, > > + uint16_t gso_size, > > + struct rte_mempool *direct_pool, > > + struct rte_mempool *indirect_pool, > > + struct rte_mbuf **pkts_out, > > + uint16_t nb_pkts_out) > > +{ > > + struct ether_hdr *eth_hdr; > > + struct ipv4_hdr *ipv4_hdr; > > + uint16_t tcp_dl; > > + uint16_t pyld_unit_size; > > + uint16_t hdr_offset; > > + int ret = 1; > > + > > + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); > > + ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len); > > + > > + /* don't process fragmented packet */ > > + if ((ipv4_hdr->fragment_offset & > > + rte_cpu_to_be_16(IPV4_HDR_DF_MASK)) == 0) > > + return ret; > > + > > + tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - > > + pkt->l3_len - pkt->l4_len; > > + /* don't process packet without data */ > > + if (tcp_dl == 0) > > + return ret; > > + > > + hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len; > > + pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN; > > + > > + /* segment the payload */ > > + ret = gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool, > > + indirect_pool, pkts_out, nb_pkts_out); > > + > > + if (ret > 1) > > + gso_update_pkt_headers(pkt, ret, pkts_out); > > + > > + return ret; > > +} > > diff --git a/lib/librte_gso/gso_tcp.h b/lib/librte_gso/gso_tcp.h > > new file mode 100644 > > index 0000000..f291ccb > > --- /dev/null > > +++ b/lib/librte_gso/gso_tcp.h > > @@ -0,0 +1,73 @@ > > +/*- > > + * BSD LICENSE > > + * > > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > > + * All rights reserved. > > + * > > + * Redistribution and use in source and binary forms, with or without > > + * modification, are permitted provided that the following conditions > > + * are met: > > + * > > + * * Redistributions of source code must retain the above copyright > > + * notice, this list of conditions and the following disclaimer. > > + * * Redistributions in binary form must reproduce the above copyright > > + * notice, this list of conditions and the following disclaimer in > > + * the documentation and/or other materials provided with the > > + * distribution. > > + * * Neither the name of Intel Corporation nor the names of its > > + * contributors may be used to endorse or promote products derived > > + * from this software without specific prior written permission. > > + * > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > + */ > > + > > +#ifndef _GSO_TCP_H_ > > +#define _GSO_TCP_H_ > > + > > +#include <stdint.h> > > +#include <rte_mbuf.h> > > + > > +/** > > + * Segment an IPv4/TCP packet. This function assumes the input packet has > > + * correct checksums and doesn't update checksums for GSO segment. > > + * Furthermore, it doesn't process IP fragment packets. > > + * > > + * @param pkt > > + * The packet mbuf to segment. > > + * @param gso_size > > + * The max length of a GSO segment, measured in bytes. > > + * @param direct_pool > > + * MBUF pool used for allocating direct buffers for output segments. > > + * @param indirect_pool > > + * MBUF pool used for allocating indirect buffers for output segments. > > + * @param pkts_out > > + * Pointer array, which is used to store mbuf addresses of GSO segments. > > + * Caller should guarantee that 'pkts_out' is sufficiently large to store > > + * all GSO segments. > > + * @param nb_pkts_out > > + * The max number of items that 'pkts_out' can keep. > > + * > > + * @return > > + * - The number of GSO segments on success. > > + * - Return 1 if no GSO is performed. > > + * - Return -ENOMEM if available memory in mempools is insufficient. > > + * - Return -EINVAL for invalid parameters. > > + */ > > +int gso_tcp4_segment(struct rte_mbuf *pkt, > > + uint16_t gso_size, > > + struct rte_mempool *direct_pool, > > + struct rte_mempool *indirect_pool, > > + struct rte_mbuf **pkts_out, > > + uint16_t nb_pkts_out); > > + > > +#endif > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c > > index b81afce..fac95f2 100644 > > --- a/lib/librte_gso/rte_gso.c > > +++ b/lib/librte_gso/rte_gso.c > > @@ -31,17 +31,57 @@ > > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > */ > > > > +#include <rte_log.h> > > + > > #include "rte_gso.h" > > +#include "gso_common.h" > > +#include "gso_tcp.h" > > > > int > > rte_gso_segment(struct rte_mbuf *pkt, > > struct rte_gso_ctx gso_ctx, > > struct rte_mbuf **pkts_out, > > - uint16_t nb_pkts_out __rte_unused) > > + uint16_t nb_pkts_out) > > { > > + struct rte_mempool *direct_pool, *indirect_pool; > > + struct rte_mbuf *pkt_seg; > > + uint16_t nb_segments, gso_size; > > + > > if (pkt == NULL || pkts_out == NULL || gso_ctx.direct_pool == > > NULL || gso_ctx.indirect_pool == NULL) > > return -EINVAL; > > Probably we don't need to check gso_ctx values for each incoming packet. > If you feel it is necessary - create new function rte_gso_ctx_check() that > could be performed just once per ctx. Agree. I will change it. Thanks. > > > > > - return 1; > > + if ((gso_ctx.gso_types & RTE_GSO_TCP_IPV4) == 0 || > > + gso_ctx.gso_size >= pkt->pkt_len || > > + gso_ctx.gso_size == 0) > > > First and third condition seems redundant. The reason to check gso_ctx.gso_types here is that we don't perform GSO if applications don't set RTE_GSO_TCP_IPV4 to gso_ctx.gso_types, even the input packet is TCP/IPv4. And if gso_ctx.gso_size is 0, we don't need to execute the following codes. So we still need to remove these two conditions? > > > + return 1; > > > I think you forgot here: > pkts_out[0] = pkt; But why should we keep the input packet in the output array? Currently, if GSO is not performed, no packets will be kept in pkts_out[]. Applications can know it by the return value 1 of rte_gso_segment(). > > > > + > > + pkt_seg = pkt; > > + gso_size = gso_ctx.gso_size; > > + direct_pool = gso_ctx.direct_pool; > > + indirect_pool = gso_ctx.indirect_pool; > > + > > + /* Parse packet headers to determine how to segment 'pkt' */ > > + gso_parse_packet(pkt); > > > I don't think we need to parse packet here. > Instead assume that user already filled packet_type and l2/l3/..._len fields > correctly. Hmm, I see it. Thanks. Thanks, Jiayu