On 2/9/2024 1:10 PM, Rahul Bhansali wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: Wednesday, February 7, 2024 4:06 PM
>> To: Rahul Bhansali <rbhans...@marvell.com>; dev@dpdk.org; Radu Nicolau
>> <radu.nico...@intel.com>; Akhil Goyal <gak...@marvell.com>; Konstantin
>> Ananyev <konstantin.anan...@huawei.com>; Anoob Joseph
>> <ano...@marvell.com>
>> Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance 
>> drop
>>
>> On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>> Sent: Tuesday, February 6, 2024 11:55 PM
>>>> To: Rahul Bhansali <rbhans...@marvell.com>; dev@dpdk.org; Radu
>>>> Nicolau <radu.nico...@intel.com>; Akhil Goyal <gak...@marvell.com>;
>>>> Konstantin Ananyev <konstantin.anan...@huawei.com>; Anoob Joseph
>>>> <ano...@marvell.com>
>>>> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
>>>> performance drop
>>>>
>>>> External Email
>>>>
>>>> ----------------------------------------------------------------------
>>>> On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
>>>>> Freeing a single packet via rte_pktmbuf_free_bulk() degrades
>>>>> performance. On cn10k, a maximum drop of ~4% is observed for the IPsec
>>>>> event mode single SA outbound case.
>>>>>
>>>>> To fix this issue, single packet free will use the rte_pktmbuf_free() API.
>>>>>
>>>>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
>>>>>
>>>>> Signed-off-by: Rahul Bhansali <rbhans...@marvell.com>
>>>>> ---
>>>>>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
>>>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h
>>>>> b/examples/ipsec-secgw/ipsec-secgw.h
>>>>> index 8baab44ee7..ec33a982df 100644
>>>>> --- a/examples/ipsec-secgw/ipsec-secgw.h
>>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
>>>>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
>>>>> }
>>>>>
>>>>>  /* helper routine to free bulk of packets */
>>>>> -static inline void
>>>>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
>>>>> +static __rte_always_inline void
>>>>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
>>>>>  {
>>>>> - rte_pktmbuf_free_bulk(mb, n);
>>>>> -
>>>>> + n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
>>>>>   core_stats_update_drop(n);
>>>>>  }
>>>>>
>>>>
>>>> Hi Rahul,
>>>>
>>>> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
>>>> improved by similar change?
>>>
>>> Hi Ferruh,
>>> Currently 'rte_pktmbuf_free_bulk()' is not inline. If we make that, along
>>> with '__rte_pktmbuf_free_seg_via_array()', both inline, then performance
>>> can be improved similarly.
>>>
>>
>> Ah, so the performance improvement comes from 'rte_pktmbuf_free()' being
>> inline, OK.
>>
>> As you are doing performance testing in that area, can you please check if
>> '__rte_pktmbuf_free_seg_via_array()' is inlined? As it is a static function,
>> I expect it to be inlined. If not, can you please test with force inlining
>> it (__rte_always_inline)?
> It was not inlined. I also checked with force inline and saw no impact from
> this, so I can make it force inline.
>

If there is no performance improvement, I think there is no need to force
inline '__rte_pktmbuf_free_seg_via_array()'.
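
For context, '__rte_always_inline' (from rte_common.h) expands to the
compiler's always_inline attribute, which forces inlining even where the
optimizer's size heuristics would decline. Below is a minimal sketch of what
force-inlining a static helper looks like; the flush helper and its names are
hypothetical stand-ins, not the actual '__rte_pktmbuf_free_seg_via_array()'
body:

```
#include <rte_common.h>  /* __rte_always_inline */
#include <rte_mbuf.h>

/* Hypothetical static helper: __rte_always_inline forces the compiler
 * to inline it at every call site instead of leaving the decision to
 * its size/benefit heuristics. */
static __rte_always_inline void
flush_free_array_demo(struct rte_mbuf **free_array, unsigned int *nb_free)
{
	/* all mbufs in the array are assumed to come from the same pool */
	rte_mempool_put_bulk(free_array[0]->pool, (void **)free_array, *nb_free);
	*nb_free = 0;
}
```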

>>
>>
>> And I wonder, if the bulk() API receiving a single mbuf is a common theme,
>> does it make sense to add a new inline wrapper to the library to cover this
>> case, if it brings ~4% improvement, like:
>> ```
>> static inline void
>> rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n)
>> {
>>      if (n == 1)
>>              rte_pktmbuf_free(mb[0]);
>>      else
>>              rte_pktmbuf_free_bulk(mb, n);
>> }
>> ```
> Agree, we can add this wrapper to cover the case where the bulk free API is
> called but might have a single mbuf, to get better performance. The
> "if (n == 1)" check can be further optimized with a compile-time constant
> check:
> ```
> static inline void
> rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb, unsigned int n)
> {
>        if (__builtin_constant_p(n) && (n == 1))
>                rte_pktmbuf_free(mb[0]);
>        else
>                rte_pktmbuf_free_bulk(mb, n);
> }
> ```
> Let me know if this is fine; I'll send a v2. And this will be
> "__rte_experimental", right?
>

A compile-time constant check can prevent the penalty of the additional
check, which is good, and I can see this working for the examples/ipsec-secgw
use case above, which has some hardcoded single-mbuf free calls.

But for most of the other use cases I think 'n' won't be known at compile
time, so the API will be effectively the same as free_bulk().

If you test it with a runtime check, do you still observe any performance
improvement? If not, perhaps we can go with only the example code update,
without a new API.
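
The compile-time/runtime distinction above hinges on
'__builtin_constant_p()', which folds to 1 only when the compiler can prove
its argument is a constant at the (inlined) call site. Below is a minimal
standalone sketch, with hypothetical stand-in functions and no DPDK
dependency, illustrating why the proposed wrapper collapses to the
single-free path only for a literal 'n' (build with optimization, e.g.
gcc -O2, for the builtin to fold):

```
#include <stdio.h>

/* Hypothetical stand-ins for rte_pktmbuf_free() / rte_pktmbuf_free_bulk(). */
static void free_one(void)            { puts("single free path"); }
static void free_bulk(unsigned int n) { printf("bulk free path, n=%u\n", n); }

/* Same shape as the proposed wrapper: when 'n' is a literal at an
 * inlined call site, __builtin_constant_p(n) evaluates to 1 and the
 * branch is resolved at compile time; for a runtime value it evaluates
 * to 0 and the bulk path is taken, with no extra runtime check. */
static inline void
free_bulk_or_one(unsigned int n)
{
	if (__builtin_constant_p(n) && n == 1)
		free_one();
	else
		free_bulk(n);
}

int main(int argc, char **argv)
{
	(void)argv;

	free_bulk_or_one(1);                  /* literal: single free path */
	free_bulk_or_one((unsigned int)argc); /* runtime value: bulk path, even if it is 1 */
	return 0;
}
```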
