On 2/9/2024 1:10 PM, Rahul Bhansali wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: Wednesday, February 7, 2024 4:06 PM
>> To: Rahul Bhansali <rbhans...@marvell.com>; dev@dpdk.org; Radu Nicolau
>> <radu.nico...@intel.com>; Akhil Goyal <gak...@marvell.com>; Konstantin
>> Ananyev <konstantin.anan...@huawei.com>; Anoob Joseph
>> <ano...@marvell.com>
>> Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance
>> drop
>>
>> On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>> Sent: Tuesday, February 6, 2024 11:55 PM
>>>> To: Rahul Bhansali <rbhans...@marvell.com>; dev@dpdk.org; Radu
>>>> Nicolau <radu.nico...@intel.com>; Akhil Goyal <gak...@marvell.com>;
>>>> Konstantin Ananyev <konstantin.anan...@huawei.com>; Anoob Joseph
>>>> <ano...@marvell.com>
>>>> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
>>>> performance drop
>>>>
>>>> External Email
>>>>
>>>> ----------------------------------------------------------------------
>>>> On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
>>>>> Single packet free using rte_pktmbuf_free_bulk() is dropping the
>>>>> performance. On cn10k, a maximum drop of ~4% is observed for the
>>>>> IPsec event mode single SA outbound case.
>>>>>
>>>>> To fix this issue, single packet free will use the rte_pktmbuf_free
>>>>> API.
>>>>>
>>>>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
>>>>>
>>>>> Signed-off-by: Rahul Bhansali <rbhans...@marvell.com>
>>>>> ---
>>>>>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
>>>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h
>>>>> b/examples/ipsec-secgw/ipsec-secgw.h
>>>>> index 8baab44ee7..ec33a982df 100644
>>>>> --- a/examples/ipsec-secgw/ipsec-secgw.h
>>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
>>>>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
>>>>>  }
>>>>>
>>>>>  /* helper routine to free bulk of packets */
>>>>> -static inline void
>>>>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
>>>>> +static __rte_always_inline void
>>>>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
>>>>>  {
>>>>> -	rte_pktmbuf_free_bulk(mb, n);
>>>>> -
>>>>> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
>>>>>  	core_stats_update_drop(n);
>>>>>  }
>>>>>
>>>>
>>>> Hi Rahul,
>>>>
>>>> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
>>>> improved by a similar change?
>>>
>>> Hi Ferruh,
>>> Currently 'rte_pktmbuf_free_bulk()' is not inline. If we make that,
>>> along with '__rte_pktmbuf_free_seg_via_array()', inline then the
>>> performance can be improved similarly.
>>>
>>
>> Ah, so the performance improvement is coming from 'rte_pktmbuf_free()'
>> being inline, OK.
>>
>> As you are doing performance testing in that area, can you please check
>> if '__rte_pktmbuf_free_seg_via_array()' is inlined? As it is a static
>> function I expect it to be inlined. If not, can you please test with
>> force inlining it (__rte_always_inline)?
>
> It was not inlined. I checked with force inline as well and there is no
> impact, so I can make it force inline.
>
If there is no performance improvement, I think there is no need to force
inline '__rte_pktmbuf_free_seg_via_array()'.

>>
>> And I wonder, if a bulk() API getting a single mbuf is a common theme,
>> does it make sense to add a new inline wrapper to the library to cover
>> this case, if it is bringing ~4% improvement, like:
>> ```
>> static inline void
>> rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n) {
>> 	if (n == 1)
>> 		return rte_pktmbuf_free(mb[0]);
>> 	return rte_pktmbuf_free_bulk(mb, n);
>> }
>> ```
>
> Agree, we can make this wrapper to cover the case where the bulk free API
> is called but may get a single mbuf, to get better perf. The "if (n == 1)"
> check can be further optimized with a compile-time constant check:
> ```
> static inline void
> rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb, unsigned int n)
> {
> 	if (__builtin_constant_p(n) && (n == 1))
> 		rte_pktmbuf_free(mb[0]);
> 	else
> 		rte_pktmbuf_free_bulk(mb, n);
> }
> ```
> Let me know if this is fine; I'll send a v2. And this will be
> "__rte_experimental", right?
>

The compile-time constant check can prevent the penalty of the additional
check, which is good, and I can see this working for the examples/ipsec-secgw
use case above, which has some hardcoded single-mbuf free calls.

But in most other use cases I think 'n' won't be known at compile time, so
the API will effectively be the same as free_bulk().

If you have it with a runtime check, do you still observe any performance
improvement? If not, perhaps we can go with only the example code update,
without a new API.
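
For reference, a minimal, self-contained sketch of the behavior being
discussed (assuming GCC/Clang for __builtin_constant_p(); 'struct mbuf' and
the stub_free*() functions below are hypothetical stand-ins for the real mbuf
type and the rte_pktmbuf_free*() APIs, so it compiles without DPDK):
```
#include <stdio.h>

struct mbuf { int id; };

/* Stand-in for rte_pktmbuf_free() */
static inline void
stub_free(struct mbuf *m)
{
	printf("single free: %d\n", m->id);
}

/* Stand-in for rte_pktmbuf_free_bulk() */
static inline void
stub_free_bulk(struct mbuf **mb, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		printf("bulk free: %d\n", mb[i]->id);
}

/* Same shape as the wrapper proposed above, using the stubs */
static inline void
free_bulk_or_one(struct mbuf **mb, unsigned int n)
{
	if (__builtin_constant_p(n) && (n == 1))
		stub_free(mb[0]);	/* branch folds when n is a literal 1 */
	else
		stub_free_bulk(mb, n);	/* runtime n: same as plain bulk free */
}

int
main(void)
{
	struct mbuf a = { 1 }, b = { 2 };
	struct mbuf *one[] = { &a };
	struct mbuf *two[] = { &a, &b };
	volatile unsigned int n = 2;	/* force a runtime value */

	free_bulk_or_one(one, 1);	/* constant n: single-free path */
	free_bulk_or_one(two, n);	/* runtime n: bulk path */
	return 0;
}
```
With optimization enabled, the first call should fold down to the single-free
path and the second keeps the runtime bulk path, which illustrates both the
benefit for callers passing a literal 1 and the point above that a runtime
'n' makes the wrapper equivalent to free_bulk().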