Hi Thomas, Thanks for your comments. Almost all the comments are addressed. PSB.
> -----Original Message----- > From: Thomas Monjalon <tho...@monjalon.net> > Sent: Friday, April 16, 2021 6:50 PM > To: Bing Zhao <bi...@nvidia.com> > Cc: Ori Kam <or...@nvidia.com>; ferruh.yi...@intel.com; > andrew.rybche...@oktetlabs.ru; dev@dpdk.org; > ajit.khapa...@broadcom.com; jer...@marvell.com; humi...@huawei.com; > rosen...@intel.com; hemant.agra...@nxp.com > Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: introduce conntrack > flow action and item > > External email: Use caution opening links or attachments > > > 15/04/2021 18:41, Bing Zhao: > > This commit introduced the conntrack action and item. > > > > Usually the HW offloading is stateless. For some stateful > offloading > > like a TCP connection, HW module will help provide the ability of > a > > full offloading w/o SW participation after the connection was > > established. > > > > The basic usage is that in the first flow the application should > add > > the conntrack action and in the following flow(s) the application > > should use the conntrack item to match on the result. > > You probably mean "flow rule", not "traffic flow". > Please make it clear to avoid confusion. Done > > > A TCP connection has two directions traffic. To set a conntrack > action > > context correctly, information from packets of both directions are > > required. > > > > The conntrack action should be created on one port and supply the > peer > > port as a parameter to the action. After context creating, it > could > > only be used between the ports (dual-port mode) or a single port. > The > > application should modify the action via the API > > "action_handle_update" only when before using it to create a flow > with > > opposite direction. This will help the driver to recognize the > > direction of the flow to be created, especially in single port > mode. > > The traffic from both directions will go through the same port if > the > > application works as an "forwarding engine" but not a end point. > > There is no need to call the update interface if the subsequent > flows > > have nothing to be changed. > > I am not sure this is a feature description for the commit log or an > usage explanation for the doc. > In any case, please distinguish "ethdev port" and "TCP port" > to avoid confusion. Changed, thanks. > > > Query will be supported via action_ctx_query interface, about the > > current packets information and connection status. Tha fields > query > > capabilities depends on the HW. > > > > For the packets received during the conntrack setup, it is > suggested > > to re-inject the packets in order to take full advantage of the > > What do you mean by "full advantage"? > It is counter-intuitive to re-inject for offloading. > Does it improve the performance? No, it is not for the performance but for the functionality correctness. Before the CT established, some data+ack packets may already be received by the SW, and the application will use the initial information to setup a conntrack. It may result into some error checking for the following packets. By re-injecting the packets already received by SW before the established CT, it will make the HW have all the packets information and check the following packets correctly. > > > conntrack. Only the valid packets should pass the conntrack, > packets > > with invalid TCP information, like out of window, or with invalid > > header, like malformed, should not pass. > > > > Naming and definition: > > You mean naming is inspired from Linux? The naming and critical fields definition. The original idea are from the paper listed below (correct me if I was wrong), and there is some well-known definition in this area, to my understanding, it would be better to follow it. > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fel > ix > > > ir.bootlin.com%2Flinux%2Flatest%2Fsource%2Finclude%2Fuapi%2Flinux%2F > ne > > > tfilter%2Fnf_conntrack_tcp.h&data=04%7C01%7Cbingz%40nvidia.com%7 > Ca > > > d68b128653e4428da6e08d900c56373%7C43083d15727340c1b7db39efd9ccc17a%7 > C0 > > %7C1%7C637541670056627642%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw > MDAi > > > LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=kCn9 > kF > > bi7yWrd7A94zFibvQEB97phXXUudSA%2BhAueTU%3D&reserved=0 > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fel > ix > > > ir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fnet%2Fnetfilter%2Fnf_conn > tr > > > ack_proto_tcp.c&data=04%7C01%7Cbingz%40nvidia.com%7Cad68b128653e > 44 > > > 28da6e08d900c56373%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C1%7C6375 > 41 > > > 670056627642%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2l > uM > > > zIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ajs9NtCaEpG2Kfnjy > t5 > > X8uwOFo2HyfMdWZbx%2BHkbvX8%3D&reserved=0 > > > > Other reference: > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww > w. > > > usenix.org%2Flegacy%2Fevents%2Fsec01%2Finvitedtalks%2Frooij.pdf& > da > > > ta=04%7C01%7Cbingz%40nvidia.com%7Cad68b128653e4428da6e08d900c56373%7 > C4 > > > 3083d15727340c1b7db39efd9ccc17a%7C0%7C1%7C637541670056627642%7CUnkno > wn > > %7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWw > iLCJ > > > XVCI6Mn0%3D%7C1000&sdata=987HVU%2FefoJ40%2B6aM0Q1RVxcGH5nVJS4bzy > 4A > > 4ZoYSE%3D&reserved=0 > > > > Signed-off-by: Bing Zhao <bi...@nvidia.com> > [...] > > + /** > > + * [META] > > + * > > + * Matches conntrack state. > > + * > > + * See struct rte_flow_item_conntrack. > > Please use @see for hyperlink in doxygen. > > > + */ > > + RTE_FLOW_ITEM_TYPE_CONNTRACK, > > }; > [...] > > +/** > > + * The packet is with valid state after conntrack checking. > > "is with valid state" looks strange. > I propose "The packet is valid after conntrack checking." Done > > > + */ > > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID (1 << 0) > > Please use RTE_BIT32(). Done > > > +/** > > + * The state of the connection was changed. > > + */ > > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_CHANGED (1 << 1) > > +/** > > + * Error is detected on this packet for this connection and > > + * an invalid state is set. > > + */ > > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVAL (1 << 2) > > "INVAL" is strange. Can we add the missing 2 characters? > RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVALID > > On a related note, do we really need the word FLAG? > And it is conflicting with the prefix in enum > rte_flow_conntrack_tcp_last_index I think > RTE_FLOW_CONNTRACK_PKT_STATE_ is a good prefix, long enough. > Done > > +/** > > + * The HW connection tracking module is disabled. > > + * It can be due to application command or an invalid state. > > + */ > > +#define RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED (1 << 3) > > This one does not have PKT in its name. > And it is limiting to HW, while the driver could implement conntrack > in SW. > I propose RTE_FLOW_CONNTRACK_PKT_DISABLED > Done > > +/** > > + * The packet contains some bad field(s) and cannot continue > > + * with the conntrack module checking. > > + */ > > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_BAD (1 << 4) > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this structure may change without prior > notice > > + * > > + * RTE_FLOW_ITEM_TYPE_CONNTRACK > > + * > > + * Matches the state of a packet after it passed the connection > > +tracking > > + * examination. The state is a bit mask of one > > +RTE_FLOW_CONNTRACK_FLAG* > > s/bit mask/bitmap/ ? Done > > RTE_FLOW_CONNTRACK_PKT_STATE_* > otherwise it is messed with rte_flow_conntrack_tcp_last_index > > > + * or a reasonable combination of these bits. > > + */ > > +struct rte_flow_item_conntrack { > > + uint32_t flags; > > +}; > [...] > > + > > + /** > > + * [META] > > + * > > + * Enable tracking a TCP connection state. > > + * > > + * Send packet to HW connection tracking module for > examination. > > Not necessarily HW. > No packet is sent. > I think you can remove this sentence completely. > Done > > + * > > + * See struct rte_flow_action_conntrack. > > @see > > > + */ > > + RTE_FLOW_ACTION_TYPE_CONNTRACK, > > }; > > > > /** > > @@ -2875,6 +2940,136 @@ struct rte_flow_action_set_dscp { > > */ > > struct rte_flow_action_handle; > > > > +/** > > + * The state of a TCP connection. > > + */ > > +enum rte_flow_conntrack_state { > > + /**< SYN-ACK packet was seen. */ > > + RTE_FLOW_CONNTRACK_STATE_SYN_RECV, > > + /**< 3-way handshark was done. */ > > s/handshark/handshake/ > Done > > + RTE_FLOW_CONNTRACK_STATE_ESTABLISHED, > > + /**< First FIN packet was received to close the connection. > */ > > + RTE_FLOW_CONNTRACK_STATE_FIN_WAIT, > > + /**< First FIN was ACKed. */ > > + RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT, > > + /**< Second FIN was received, waiting for the last ACK. */ > > + RTE_FLOW_CONNTRACK_STATE_LAST_ACK, > > + /**< Second FIN was ACKed, connection was closed. */ > > + RTE_FLOW_CONNTRACK_STATE_TIME_WAIT, > > +}; > > + > > +/** > > + * The last passed TCP packet flags of a connection. > > + */ > > +enum rte_flow_conntrack_tcp_last_index { > > + RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */ > > + RTE_FLOW_CONNTRACK_FLAG_SYN = (1 << 0), /**< With SYN flag. > */ > > + RTE_FLOW_CONNTRACK_FLAG_SYNACK = (1 << 1), /**< With SYN+ACK > flag. */ > > + RTE_FLOW_CONNTRACK_FLAG_FIN = (1 << 2), /**< With FIN flag. > */ > > + RTE_FLOW_CONNTRACK_FLAG_ACK = (1 << 3), /**< With ACK flag. > */ > > + RTE_FLOW_CONNTRACK_FLAG_RST = (1 << 4), /**< With RST flag. > */ > > +}; > > Please use RTE_BIT32(). > Done > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this structure may change without prior > notice > > + * > > + * Configuration parameters for each direction of a TCP > connection. > > + */ > > +struct rte_flow_tcp_dir_param { > > + uint32_t scale:4; /**< TCP window scaling factor, 0xF to > disable. */ > > + uint32_t close_initiated:1; /**< The FIN was sent by this > direction. */ > > + /**< An ACK packet has been received by this side. */ > > Move all comments on their own line before the struct member. > Comment should then start with /** > All done, BTW, I see in the current code, the format "/**<" is used in a lot of parts. > > + uint32_t last_ack_seen:1; > > + /**< If set, indicates that there is unacked data of the > > + connection. */ > > not sure what means "unacked data of the connection" Updated the description, it means some packets were sent but not all of them are ACKed. > > > + uint32_t data_unacked:1; > > + /**< Maximal value of sequence + payload length over sent > > + * packets (next ACK from the opposite direction). > > + */ > > + uint32_t sent_end; > > + /**< Maximal value of (ACK + window size) over received > packet + length > > + * over sent packet (maximal sequence could be sent). > > + */ > > + uint32_t reply_end; > > + /**< Maximal value of actual window size over sent packets. > */ > > + uint32_t max_win; > > + /**< Maximal value of ACK over sent packets. */ > > + uint32_t max_ack; > > Not sure about the word "over" in above definitions. Changed to "in" > > > +}; > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this structure may change without prior > notice > > + * > > + * RTE_FLOW_ACTION_TYPE_CONNTRACK > > + * > > + * Configuration and initial state for the connection tracking > module. > > + * This structure could be used for both setting and query. > > + */ > > +struct rte_flow_action_conntrack { > > + uint16_t peer_port; /**< The peer port number, can be the > same port. */ > > + /**< Direction of this connection when creating a flow, the > value only > > + * affects the subsequent flows creation. > > + */ > > As for rte_flow_tcp_dir_param, better to move comments before, on > their own line. > > > + uint32_t is_original_dir:1; > > + /**< Enable / disable the conntrack HW module. When disabled, > the > > + * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED. > > + * In this state the HW will act as passthrough. > > + * It only affects this conntrack object in the HW without > any effect > > + * to the other objects. > > + */ > > + uint32_t enable:1; > > + /**< At least one ack was seen, after the connection was > established. */ > > + uint32_t live_connection:1; > > + /**< Enable selective ACK on this connection. */ > > + uint32_t selective_ack:1; > > + /**< A challenge ack has passed. */ > > + uint32_t challenge_ack_passed:1; > > + /**< 1: The last packet is seen that comes from the original > direction. > > + * 0: From the reply direction. > > + */ > > + uint32_t last_direction:1; > > + /**< No TCP check will be done except the state change. */ > > + uint32_t liberal_mode:1; > > + /**< The current state of the connection. */ > > + enum rte_flow_conntrack_state state; > > + /**< Scaling factor for maximal allowed ACK window. */ > > + uint8_t max_ack_window; > > + /**< Maximal allowed number of retransmission times. */ > > + uint8_t retransmission_limit; > > + /**< TCP parameters of the original direction. */ > > + struct rte_flow_tcp_dir_param original_dir; > > + /**< TCP parameters of the reply direction. */ > > + struct rte_flow_tcp_dir_param reply_dir; > > + /**< The window value of the last packet passed this > conntrack. */ > > + uint16_t last_window; > > + enum rte_flow_conntrack_tcp_last_index last_index; > > + /**< The sequence of the last packet passed this conntrack. > */ > > + uint32_t last_seq; > > + /**< The acknowledgement of the last packet passed this > conntrack. */ > > + uint32_t last_ack; > > + /**< The total value ACK + payload length of the last packet > passed > > + * this conntrack. > > + */ > > + uint32_t last_end; > > +}; > > + > > +/** > > + * RTE_FLOW_ACTION_TYPE_CONNTRACK > > + * > > + * Wrapper structure for the context update interface. > > + * Ports cannot support updating, and the only valid solution is > to > > + * destroy the old context and create a new one instead. > > + */ > > +struct rte_flow_modify_conntrack { > > + /**< New connection tracking parameters to be updated. */ > > + struct rte_flow_action_conntrack new_ct; > > + uint32_t direction:1; /**< The direction field will be > updated. */ > > + /**< All the other fields except direction will be updated. > */ > > + uint32_t state:1; > > + uint32_t reserved:30; /**< Reserved bits for the future > usage. > > +*/ }; > > Thanks