Hi all,

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnow...@nvidia.com>
> Sent: Tuesday, January 23, 2024 12:37
> To: Stephen Hemminger <step...@networkplumber.org>
> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) <tho...@monjalon.net>;
> Ferruh Yigit <ferruh.yi...@amd.com>; Andrew Rybchenko
> <andrew.rybche...@oktetlabs.ru>; Ori Kam <or...@nvidia.com>;
> dev@dpdk.org
> Subject: RE: [RFC] ethdev: fast path async flow API
> 
> Hi Stephen,
> 
> Sorry for such a late response.
> 
> From: Stephen Hemminger <step...@networkplumber.org>
> Sent: Thursday, January 4, 2024 02:08
> > On Wed, 3 Jan 2024 19:14:49 +0000
> > Dariusz Sosnowski <dsosnow...@nvidia.com> wrote:
> > > In summary, in my opinion extending the async flow API with bulking
> > capabilities or exposing the queue directly to the application is not 
> > desirable.
> > > This proposal aims to reduce the I-cache overhead in async flow API
> > > by
> > reusing the existing design pattern in DPDK - fast path functions are
> > inlined to the application code and they call cached PMD callbacks.
> >
> > Inline needs to more discouraged in DPDK, because it only works if
> > application ends up building with DPDK from source.
> > It doesn't work for the Linux distro packaging model and symbol
> > versioning, etc.
> 
> I understand the problem. In that case, I have a proposal.
> I had a chat with Thomas regarding this RFC, and he noticed that there are 2
> separate changes proposed here:
> 
> 1. Per-port callbacks for async flow API.
>     - Moves specified callbacks to rte_flow_fp_ops struct and allow PMDs to
> register them dynamically.
>     - Removes indirection at API level - no need to call rte_flow_ops_get().
>     - Removes checking if callbacks are defined - either the default DPDK 
> callback
> is used or the one provided by PMD.
> 2. Make async flow API functions inlineable.
> 
> Change (1) won't break ABI (existing callbacks in rte_flow_ops can be marked
> as deprecated for now and phased out later) and in my opinion is less
> controversial compared to change (2).
> 
> I'll rerun the benchmarks for both changes separately to compare their
> benefits in isolation.
> This would allow us to decide if change (2) is worth the drawbacks it
> introduces.
> 
> What do you think?

I split the RFC into 2 parts:

1. Introduce per-port callbacks:
    - Introduce rte_flow_fp_ops struct - holds callbacks for fast path calls, 
for each port. PMD registers callbacks through rte_flow_fp_ops_register().
    - Relevant rte_flow_async_* functions just pass arguments to fast path 
callbacks. Validation checks are done only if RTE_FLOW_DEBUG macro is defined.
    - Biggest difference is the removal of callback access through 
rte_flow_get_ops().
2. Inline async flow API functions.
    - Relevant rte_flow_async_* functions definitions are moved to rte_flow.h 
and made inlineable.

Here are the results of the benchmark:

|                       | Avg Insertion | Diff over baseline | Avg Deletion | 
Diff over baseline |
|-----------------------|---------------|--------------------|--------------|--------------------|
| baseline (v24.03-rc0) |     5855.4    |                    |    9026.8    |   
                 |
| applied (1)           |     6384.2    |    +528.8 (+9%)    |    10054.2   |  
+1027.4 (+11.4%)  |
| applied (2)           |     6434.6    |   +579.2 (+9.9%)   |    10011.4   |   
+984.6 (+10.9%)  |

Results are in KFlows/sec.
The benchmark is continuously inserting and deleting 2M flow rules.
These rules match on IPv4 destination address and with a single action DROP.
Flow rules are inserted and deleted using a single flow queue.

Change (2) improves insertion rate performance by ~1% compared to (1), but 
decreases deletion rate by ~0.5%.
Based on these results, I think we can say that making rte_flow_async_*() calls 
inlineable may not be worth it compared to the issues it causes.

What do you all think about the results?

Best regards,
Dariusz Sosnowski
 

Reply via email to