Hi Stephen,

On 28/06/2019 16:35, Stephen Hemminger wrote:
On Fri, 28 Jun 2019 15:16:30 +0100
"Medvedkin, Vladimir" <vladimir.medved...@intel.com> wrote:

Hi Honnappa,

On 28/06/2019 14:57, Honnappa Nagarahalli wrote:
Hi all,

On 28/06/2019 05:34, Stephen Hemminger wrote:
On Fri, 28 Jun 2019 02:44:54 +0000
"Ruifeng Wang (Arm Technology China)"<ruifeng.w...@arm.com>  wrote:
Tests showed that the function inlining caused performance drop on
some x86 platforms with the memory ordering patches applied.
By force no-inline functions, the performance was better than
before on x86 and no impact to arm64 platforms.

Suggested-by: Medvedkin Vladimir<vladimir.medved...@intel.com>
Signed-off-by: Ruifeng Wang<ruifeng.w...@arm.com>
Reviewed-by: Gavin Hu<gavin...@arm.com>
    {

Do you actually need to force noinline or is just taking of inline enough?
In general, letting compiler decide is often best practice.
The force noinline is an optimization for x86 platforms to keep
rte_lpm_add() API performance with memory ordering applied.
I don't think you answered my question. What does a recent version of
GCC do if you drop the inline.

Actually all the functions in rte_lpm should drop inline.
I'm agree with Stephen. If it is not a fastpath and size of function is not
minimal it is good to remove inline qualifier for other control plane functions
such as rule_add/delete/find/etc and let the compiler decide to inline it
(unless it affects performance).
IMO, the rule needs to be simple. If it is control plane function, we should 
leave it to the compiler to decide. I do not think we need to worry too much 
about performance for control plane functions.
Control plane is not as important as data plane speed but it is still
important. For lpm we are talking not about initialization, but runtime
routes add/del related functions. If it is very slow the library will be
totally unusable because after it receives a route update it will be
blocked for a long time and route update queue would overflow.
Control plane performance is more impacted by algorithmic choice.
The original LPM had terrible (n^2?) control path. Current code is better.
I had a patch using RB tree, but it was rejected because it used the
/usr/include/bsd/sys/tree.h which added a dependency.

You're absolutely right,  control plane performance is mostly depends on algorithm. Current LPM implementation has number of problems there. One problem is rules_tbl[] that is a flat array containing routes for control plane purposes. Replacing it with a rb-tree solves this problem, but there are another problems. For example, when you try to add a route 10.0.0.0/8 while a number of subroutes are exist in the table (for example 10.20.0.0/16), current implementation will load tbl_entry -> do some checks (depth, ext entry) -> conditionally store new entry. Under several circumstances it would take a lot time.  But in fact it needs to unconditionally rewrite only two ranges - from 10.0.0.0 to 10.19.255.255 and from 10.21.0.0 to 10.255.255.255. And control plane could help us to get this two ranges. The best struct to do so is lc-tree because it is relatively easy to traverse subtree (described by 10.0.0.0/8) and get subroutes. We are working on a new implementation, hopefully it will be ready soon.

--
Regards,
Vladimir

Reply via email to