Hi Wang,
1. It is better to use __rte_noinline explicitly on this function,
because my GCC still inlines it even without the 'inline' qualifier.
2. The same should be applied to _v20 functions.
3. Please try running the tests again and show the results.
4. Make this patch the first in a series.
On 19/06/2019 06:36, Ruifeng Wang wrote:
Tests showed that the 'inline' keyword caused a performance drop
on some x86 platforms after the memory ordering patches were applied.
Removing the 'inline' keyword restored the previous performance
on x86, with no impact on arm64 platforms.
Suggested-by: Medvedkin Vladimir <vladimir.medved...@intel.com>
Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
Reviewed-by: Gavin Hu <gavin...@arm.com>
---
v2: initial version to recover rte_lpm_add() performance
lib/librte_lpm/rte_lpm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 0addff5d4..c97b602e6 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -778,7 +778,7 @@ add_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
return 0;
}
-static inline int32_t
+static int32_t
add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
uint32_t next_hop)
{
@@ -975,7 +975,7 @@ add_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
return 0;
}
-static inline int32_t
+static int32_t
add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
uint32_t next_hop)
{
--
Regards,
Vladimir