> > Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance drop
> >
> > On 2/9/2024 1:10 PM, Rahul Bhansali wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Ferruh Yigit <ferruh.yi...@amd.com>
> > >> Sent: Wednesday, February 7, 2024 4:06 PM
> > >> To: Rahul Bhansali <rbhans...@marvell.com>; dev@dpdk.org; Radu
> > >> Nicolau <radu.nico...@intel.com>; Akhil Goyal <gak...@marvell.com>;
> > >> Konstantin Ananyev <konstantin.anan...@huawei.com>; Anoob Joseph
> > >> <ano...@marvell.com>
> > >> Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
> > >> performance drop
> > >>
> > >> On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
> > >>>> Sent: Tuesday, February 6, 2024 11:55 PM
> > >>>> To: Rahul Bhansali <rbhans...@marvell.com>; dev@dpdk.org; Radu
> > >>>> Nicolau <radu.nico...@intel.com>; Akhil Goyal <gak...@marvell.com>;
> > >>>> Konstantin Ananyev <konstantin.anan...@huawei.com>; Anoob Joseph
> > >>>> <ano...@marvell.com>
> > >>>> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
> > >>>> performance drop
> > >>>>
> > >>>> On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
> > >>>>> Single packet free using rte_pktmbuf_free_bulk() is dropping the
> > >>>>> performance. On cn10k, maximum of ~4% drop observed for IPsec
> > >>>>> event mode single SA outbound case.
> > >>>>>
> > >>>>> To fix this issue, single packet free will use rte_pktmbuf_free API.
> > >>>>>
> > >>>>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
> > >>>>>
> > >>>>> Signed-off-by: Rahul Bhansali <rbhans...@marvell.com>
> > >>>>> ---
> > >>>>>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
> > >>>>>  1 file changed, 3 insertions(+), 4 deletions(-)
> > >>>>>
> > >>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec-secgw/ipsec-secgw.h
> > >>>>> index 8baab44ee7..ec33a982df 100644
> > >>>>> --- a/examples/ipsec-secgw/ipsec-secgw.h
> > >>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
> > >>>>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
> > >>>>>  }
> > >>>>>
> > >>>>>  /* helper routine to free bulk of packets */
> > >>>>> -static inline void
> > >>>>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
> > >>>>> +static __rte_always_inline void
> > >>>>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
> > >>>>>  {
> > >>>>> -	rte_pktmbuf_free_bulk(mb, n);
> > >>>>> -
> > >>>>> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
> > >>>>>  	core_stats_update_drop(n);
> > >>>>>  }
> > >>>>>
> > >>>>
> > >>>> Hi Rahul,
> > >>>>
> > >>>> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
> > >>>> improved by a similar change?
> > >>>
> > >>> Hi Ferruh,
> > >>> Currently 'rte_pktmbuf_free_bulk()' is not inline. If we make it,
> > >>> along with __rte_pktmbuf_free_seg_via_array(), inline, then
> > >>> performance can be improved similarly.
> > >>>
> > >>
> > >> Ah, so the performance improvement is coming from 'rte_pktmbuf_free()'
> > >> being inline, OK.
> > >>
> > >> As you are doing performance testing in that area, can you please
> > >> check if '__rte_pktmbuf_free_seg_via_array()' is inlined? As it is a
> > >> static function I expect it to be inlined. If not, can you please
> > >> test with force inlining it (__rte_always_inline)?
> > > It was not inlined. I also checked with force inline and saw no impact,
> > > so I can make it force inline.
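A minimal sketch of the runtime dispatch used in the patched free_pkts(), reproduced outside DPDK so it compiles on its own. struct mock_mbuf, mock_pktmbuf_free(), mock_pktmbuf_free_bulk() and mock_stats_update_drop() are hypothetical stand-ins for struct rte_mbuf, rte_pktmbuf_free(), rte_pktmbuf_free_bulk() and core_stats_update_drop(); the call counters exist only to show which path was taken. The sketch does not reproduce the performance effect, which per the discussion comes from rte_pktmbuf_free() being inline while rte_pktmbuf_free_bulk() is not.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf (not the real DPDK type). */
struct mock_mbuf {
	int freed;
};

/* Hypothetical counters used to observe which free path was taken. */
static unsigned int single_free_calls;
static unsigned int bulk_free_calls;
static uint64_t drop_count;

/* Stand-in for rte_pktmbuf_free(): frees exactly one packet. */
static inline void
mock_pktmbuf_free(struct mock_mbuf *m)
{
	m->freed = 1;
	single_free_calls++;
}

/* Stand-in for rte_pktmbuf_free_bulk(): frees n packets in one call. */
static void
mock_pktmbuf_free_bulk(struct mock_mbuf **mb, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		mb[i]->freed = 1;
	bulk_free_calls++;
}

/* Stand-in for core_stats_update_drop(). */
static inline void
mock_stats_update_drop(uint32_t n)
{
	drop_count += n;
}

/*
 * Same shape as the patched free_pkts(): a runtime check on n sends the
 * single-packet case to the single-free routine and everything else to
 * the bulk routine. Written as if/else; the patch uses the equivalent
 * ternary on two void calls.
 */
static inline void
free_pkts_sketch(struct mock_mbuf *mb[], const uint32_t n)
{
	if (n == 1)
		mock_pktmbuf_free(mb[0]);
	else
		mock_pktmbuf_free_bulk(mb, n);
	mock_stats_update_drop(n);
}
```

Both branches free every mbuf passed in and count the same number of drops, so the change only affects which free routine is called, not the result.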
> > >
> >
> > If there is no performance improvement, I think there is no need to force
> > inline '__rte_pktmbuf_free_seg_via_array()'.
> >
> > >>
> > >>
> > >> And I wonder: if the bulk() API getting a single mbuf is a common case,
> > >> and if this brings a ~4% improvement, does it make sense to add a new
> > >> inline wrapper to the library to cover it, like:
> > >> ```
> > >> static inline void
> > >> rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n) {
> > >> 	if (n == 1)
> > >> 		return rte_pktmbuf_free(mb[0]);
> > >> 	return rte_pktmbuf_free_bulk(mb, n);
> > >> }
> > >> ```
> > > Agreed, I can add this wrapper to get better performance where the bulk
> > > free API is called but may receive a single mbuf. The "if (n == 1)"
> > > check can be further optimized with a compile-time constant check:
> > > ```
> > > static inline void
> > > rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb, unsigned int n) {
> > > 	if (__builtin_constant_p(n) && (n == 1))
> > > 		rte_pktmbuf_free(mb[0]);
> > > 	else
> > > 		rte_pktmbuf_free_bulk(mb, n);
> > > }
> > > ```
> > > Let me know if this is fine and I'll send v2. Also, this will be
> > > "__rte_experimental", right?
> > >
> >
> > The compile-time constant check avoids the penalty of an additional
> > check, which is good, and I can see this working for the
> > examples/ipsec-secgw use case above, which has some hardcoded single
> > mbuf free calls.
> >
> > But in most other use cases I think 'n' won't be known at compile time,
> > so the API will effectively be the same as free_bulk().
> Agree.
> >
> > If you use a runtime check instead, do you still observe any performance
> > improvement? If not, perhaps we can go with only the example code update,
> > without a new API.
> With a runtime check, the performance improvement is small compared to the
> compile-time check, so we can continue without this new API.
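A minimal sketch of the compile-time variant proposed in the thread, again built on hypothetical mock stand-ins rather than the real DPDK types and functions; free_bulk_or_one_sketch() is an illustrative name, not a DPDK API. __builtin_constant_p() is a GCC/Clang builtin that evaluates to 1 only when the compiler can prove its argument is a constant, which for a function parameter generally requires inlining with optimization enabled. When n comes from a runtime value the condition is false and the call takes the bulk path, matching the point above that most callers would see the same behavior as free_bulk().

```c
/* Hypothetical stand-in for struct rte_mbuf (not the real DPDK type). */
struct mock_mbuf {
	int freed;
};

/* Hypothetical counters used to observe which free path was taken. */
static unsigned int single_free_calls;
static unsigned int bulk_free_calls;

/* Stand-in for rte_pktmbuf_free(). */
static inline void
mock_pktmbuf_free(struct mock_mbuf *m)
{
	m->freed = 1;
	single_free_calls++;
}

/* Stand-in for rte_pktmbuf_free_bulk(). */
static void
mock_pktmbuf_free_bulk(struct mock_mbuf **mb, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		mb[i]->freed = 1;
	bulk_free_calls++;
}

/*
 * Hypothetical wrapper in the shape proposed in the thread. always_inline
 * plays the role of DPDK's __rte_always_inline so that a literal n can be
 * seen as constant at the call site. If the compiler cannot prove n is
 * constant, the condition is false and the bulk path is used, which also
 * handles n == 1 correctly.
 */
static inline __attribute__((always_inline)) void
free_bulk_or_one_sketch(struct mock_mbuf **mb, unsigned int n)
{
	if (__builtin_constant_p(n) && n == 1)
		mock_pktmbuf_free(mb[0]);
	else
		mock_pktmbuf_free_bulk(mb, n);
}
```

With a literal 1, as in the hardcoded single-packet frees in examples/ipsec-secgw, an optimizing compiler may select the single-free path and drop the bulk call; which path is taken at -O0 depends on the compiler. With a count known only at runtime it always takes the bulk path, which is the reason given above for keeping the change in the example code rather than adding a library API.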
Acked-by: Akhil Goyal <gak...@marvell.com>

Applied to dpdk-next-crypto

Thanks.