Hi Ferruh,

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yi...@amd.com>
> Sent: Monday, January 29, 2024 18:36
> To: Dariusz Sosnowski <dsosnow...@nvidia.com>; Stephen Hemminger
> <step...@networkplumber.org>
> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) <tho...@monjalon.net>;
> Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>; Ori Kam
> <or...@nvidia.com>; dev@dpdk.org
> Subject: Re: [RFC] ethdev: fast path async flow API
> 
> On 1/29/2024 1:38 PM, Dariusz Sosnowski wrote:
> > Hi all,
> >
> >> -----Original Message-----
> >> From: Dariusz Sosnowski <dsosnow...@nvidia.com>
> >> Sent: Tuesday, January 23, 2024 12:37
> >> To: Stephen Hemminger <step...@networkplumber.org>
> >> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) <tho...@monjalon.net>;
> >> Ferruh Yigit <ferruh.yi...@amd.com>; Andrew Rybchenko
> >> <andrew.rybche...@oktetlabs.ru>; Ori Kam <or...@nvidia.com>;
> >> dev@dpdk.org
> >> Subject: RE: [RFC] ethdev: fast path async flow API
> >>
> >> Hi Stephen,
> >>
> >> Sorry for such a late response.
> >>
> >> From: Stephen Hemminger <step...@networkplumber.org>
> >> Sent: Thursday, January 4, 2024 02:08
> >>> On Wed, 3 Jan 2024 19:14:49 +0000
> >>> Dariusz Sosnowski <dsosnow...@nvidia.com> wrote:
> >>>> In summary, in my opinion extending the async flow API with bulking
> >>> capabilities or exposing the queue directly to the application is not
> desirable.
> >>>> This proposal aims to reduce the I-cache overhead in async flow API
> >>>> by
> >>> reusing the existing design pattern in DPDK - fast path functions
> >>> are inlined to the application code and they call cached PMD callbacks.
> >>>
> >>> Inline needs to more discouraged in DPDK, because it only works if
> >>> application ends up building with DPDK from source.
> >>> It doesn't work for the Linux distro packaging model and symbol
> >>> versioning, etc.
> >>
> >> I understand the problem. In that case, I have a proposal.
> >> I had a chat with Thomas regarding this RFC, and he noticed that
> >> there are 2 separate changes proposed here:
> >>
> >> 1. Per-port callbacks for async flow API.
> >>     - Moves specified callbacks to rte_flow_fp_ops struct and allow
> >> PMDs to register them dynamically.
> >>     - Removes indirection at API level - no need to call 
> >> rte_flow_ops_get().
> >>     - Removes checking if callbacks are defined - either the default
> >> DPDK callback is used or the one provided by PMD.
> >> 2. Make async flow API functions inlineable.
> >>
> >> Change (1) won't break ABI (existing callbacks in rte_flow_ops can be
> >> marked as deprecated for now and phased out later) and in my opinion
> >> is less controversial compared to change (2).
> >>
> >> I'll rerun the benchmarks for both changes separately to compare
> >> their benefits in isolation.
> >> This would allow us to decide if change (2) is worth the drawbacks it
> >> introduces.
> >>
> >> What do you think?
> >
> > I split the RFC into 2 parts:
> >
> > 1. Introduce per-port callbacks:
> >     - Introduce rte_flow_fp_ops struct - holds callbacks for fast path 
> > calls, for
> each port. PMD registers callbacks through rte_flow_fp_ops_register().
> >     - Relevant rte_flow_async_* functions just pass arguments to fast path
> callbacks. Validation checks are done only if RTE_FLOW_DEBUG macro is
> defined.
> >     - Biggest difference is the removal of callback access through
> rte_flow_get_ops().
> > 2. Inline async flow API functions.
> >     - Relevant rte_flow_async_* functions definitions are moved to 
> > rte_flow.h
> and made inlineable.
> >
> > Here are the results of the benchmark:
> >
> > |                       | Avg Insertion | Diff over baseline | Avg
> > | Deletion | Diff over baseline |
> > |-----------------------|---------------|--------------------|--------------|-------------
> -------|
> > | baseline (v24.03-rc0) |     5855.4    |                    |    9026.8    
> > |                    |
> > | applied (1)           |     6384.2    |    +528.8 (+9%)    |    10054.2   
> > |  +1027.4
> (+11.4%)  |
> > | applied (2)           |     6434.6    |   +579.2 (+9.9%)   |    10011.4   
> > |   +984.6
> (+10.9%)  |
> >
> > Results are in KFlows/sec.
> > The benchmark is continuously inserting and deleting 2M flow rules.
> > These rules match on IPv4 destination address and with a single action
> DROP.
> > Flow rules are inserted and deleted using a single flow queue.
> >
> > Change (2) improves insertion rate performance by ~1% compared to (1),
> but decreases deletion rate by ~0.5%.
> > Based on these results, I think we can say that making rte_flow_async_*()
> calls inlineable may not be worth it compared to the issues it causes.
> >
> > What do you all think about the results?
> >
> 
> Hi Dariusz,
> 
> As discussed before, converting APIs to static inline functions or exposing
> 'rte_eth_dev' has cons but with only applying first part above seems get rid 
> of
> them with reasonable performance gain, so I think we can continue with this
> approach and continue to reviews.
> 
> Most of the 'rte_flow_async_*' are already missing the function validity 
> check,
> so having a default (dummy?) rte_flow_fp_ops for them seems even an
> improvement there.
> 
> 
> Only previously 'struct rte_flow_ops' was coming from drivers, ethdev layer
> doesn't need to maintain anything.
> But with 'rte_flow_fp_ops' struct, ethdev needs to store another array with
> fixed ('RTE_MAX_ETHPORTS') size which will be another blocker in the future
> to convert this fixed arrays to dynamically allocated arrays.
> 
> For this, does it still help we add an a new field to "struct rte_eth_dev", 
> like
> "struct rte_flow_ops *flow_ops", similar to 'dev_ops'?
> The indirection still will be there, but eliminate 'rte_flow_get_ops()'
> call and checks comes with it.
> If makes sense, is there a chance to experiment this and get some performance
> numbers with it?
Which option do you have in mind specifically?

1. Keeping only fast path callbacks in "dev->flow_ops", so "struct rte_eth_dev" 
will hold only a pointer to "struct rte_flow_fp_ops" as defined in RFC.
    - Only async flow API will use "dev->flow_ops->callback".
    - Other APIs will go through "rte_flow_ops_get()"
2. Keeping all flow callbacks in "dev->flow_ops", so "struct rte_flow_ops".
    - As a result, I think that "rte_flow_get_ops()" could be removed 
altogether assuming that all functions have default implementation,
      checking for port validity (ENODEV if port not valid) and returning 
ENOSYS.

Regardless of the version, I can experiment with additional indirection.

Best regards,
Dariusz Sosnowski

Reply via email to