Hi all, > -----Original Message----- > From: Dariusz Sosnowski <dsosnow...@nvidia.com> > Sent: Tuesday, January 23, 2024 12:37 > To: Stephen Hemminger <step...@networkplumber.org> > Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) <tho...@monjalon.net>; > Ferruh Yigit <ferruh.yi...@amd.com>; Andrew Rybchenko > <andrew.rybche...@oktetlabs.ru>; Ori Kam <or...@nvidia.com>; > dev@dpdk.org > Subject: RE: [RFC] ethdev: fast path async flow API > > Hi Stephen, > > Sorry for such a late response. > > From: Stephen Hemminger <step...@networkplumber.org> > Sent: Thursday, January 4, 2024 02:08 > > On Wed, 3 Jan 2024 19:14:49 +0000 > > Dariusz Sosnowski <dsosnow...@nvidia.com> wrote: > > > In summary, in my opinion extending the async flow API with bulking > > capabilities or exposing the queue directly to the application is not > > desirable. > > > This proposal aims to reduce the I-cache overhead in async flow API > > > by > > reusing the existing design pattern in DPDK - fast path functions are > > inlined to the application code and they call cached PMD callbacks. > > > > Inline needs to more discouraged in DPDK, because it only works if > > application ends up building with DPDK from source. > > It doesn't work for the Linux distro packaging model and symbol > > versioning, etc. > > I understand the problem. In that case, I have a proposal. > I had a chat with Thomas regarding this RFC, and he noticed that there are 2 > separate changes proposed here: > > 1. Per-port callbacks for async flow API. > - Moves specified callbacks to rte_flow_fp_ops struct and allow PMDs to > register them dynamically. > - Removes indirection at API level - no need to call rte_flow_ops_get(). > - Removes checking if callbacks are defined - either the default DPDK > callback > is used or the one provided by PMD. > 2. Make async flow API functions inlineable. > > Change (1) won't break ABI (existing callbacks in rte_flow_ops can be marked > as deprecated for now and phased out later) and in my opinion is less > controversial compared to change (2). > > I'll rerun the benchmarks for both changes separately to compare their > benefits in isolation. > This would allow us to decide if change (2) is worth the drawbacks it > introduces. > > What do you think?
I split the RFC into 2 parts: 1. Introduce per-port callbacks: - Introduce rte_flow_fp_ops struct - holds callbacks for fast path calls, for each port. PMD registers callbacks through rte_flow_fp_ops_register(). - Relevant rte_flow_async_* functions just pass arguments to fast path callbacks. Validation checks are done only if RTE_FLOW_DEBUG macro is defined. - Biggest difference is the removal of callback access through rte_flow_get_ops(). 2. Inline async flow API functions. - Relevant rte_flow_async_* functions definitions are moved to rte_flow.h and made inlineable. Here are the results of the benchmark: | | Avg Insertion | Diff over baseline | Avg Deletion | Diff over baseline | |-----------------------|---------------|--------------------|--------------|--------------------| | baseline (v24.03-rc0) | 5855.4 | | 9026.8 | | | applied (1) | 6384.2 | +528.8 (+9%) | 10054.2 | +1027.4 (+11.4%) | | applied (2) | 6434.6 | +579.2 (+9.9%) | 10011.4 | +984.6 (+10.9%) | Results are in KFlows/sec. The benchmark is continuously inserting and deleting 2M flow rules. These rules match on IPv4 destination address and with a single action DROP. Flow rules are inserted and deleted using a single flow queue. Change (2) improves insertion rate performance by ~1% compared to (1), but decreases deletion rate by ~0.5%. Based on these results, I think we can say that making rte_flow_async_*() calls inlineable may not be worth it compared to the issues it causes. What do you all think about the results? Best regards, Dariusz Sosnowski