Hi Panu,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> Sent: Wednesday, February 24, 2016 12:12 PM
> To: Xie, Huawei; Olivier MATZ; dev at dpdk.org
> Cc: dprovan at bivio.net
> Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
>
> On 02/23/2016 07:35 AM, Xie, Huawei wrote:
> > On 2/22/2016 10:52 PM, Xie, Huawei wrote:
> >> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
> >>> Hi,
> >>>
> >>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
> >>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
> >>>> the library ABI and should not be listed in the version map.
> >>>>
> >>>> I assume it's inline for performance reasons, but then you lose the
> >>>> benefits of dynamic linking, such as the ability to fix bugs and/or
> >>>> improve it by just updating the library. Since the point of having a
> >>>> bulk API is to improve performance by reducing the number of calls
> >>>> required, does it really have to be inline? As in, have you actually
> >>>> measured the difference between inline and non-inline and decided it's
> >>>> worth all the downsides?
> >>> Agree with Panu. It would be interesting to compare the performance
> >>> between inline and non-inline to decide whether to inline it or not.
> >> Will update after I have gathered more data. Inline could show an obvious
> >> performance difference in some cases.
> >
> > Panu and Olivier:
> > I wrote a simple benchmark. It runs 10M rounds; in each round 8 mbufs are
> > allocated through the bulk API and then freed.
> > These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
> > 2.70GHz, CPU isolated, timer interrupt disabled, RCU offloaded).
> > Btw, I have removed some exceptional data points, whose frequency was
> > about 1/10. Sometimes the observed user usage suddenly disappeared;
> > no clue what happened.
> >
> > With 8 mbufs allocated, there is about a 6% performance increase using inline.
> [...]
> >
> > With 16 mbufs allocated, we could still observe an obvious performance
> > difference, though only 1%-2%.
> >
> [...]
> >
> > With 32/64 mbufs allocated, the deviation of the data itself would hide
> > the performance difference.
> > So we prefer using inline for performance.
>
> At least I was more after real-world performance in a real-world
> use-case rather than CPU cycles in a microbenchmark; we know function
> calls have a cost, but the benefits tend to outweigh the cons.
>
> Inline functions have their place and they're far less evil in
> project-internal use, but in a library's public API they are BAD and
> should be ... well, not banned, because there are exceptions to every
> rule, but highly discouraged.
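(The benchmark code itself is not shown in the thread; as a rough illustration only, a test of the shape described above might look like the sketch below. The loop counts come from the description; the function name, setup and reporting are assumptions.)

#include <stdio.h>
#include <inttypes.h>
#include <rte_cycles.h>
#include <rte_mbuf.h>

#define ROUNDS (10 * 1000 * 1000)   /* 10M rounds, as described above */
#define BULK   8                    /* 8 mbufs per round */

/* Allocate BULK mbufs through the bulk API and free them again,
 * ROUNDS times, counting TSC cycles for the whole run. */
static void
bench_alloc_bulk(struct rte_mempool *mp)
{
	struct rte_mbuf *mbufs[BULK];
	uint64_t start, cycles;
	unsigned int i, j;

	start = rte_rdtsc();
	for (i = 0; i < ROUNDS; i++) {
		if (rte_pktmbuf_alloc_bulk(mp, mbufs, BULK) != 0)
			continue;
		for (j = 0; j < BULK; j++)
			rte_pktmbuf_free(mbufs[j]);
	}
	cycles = rte_rdtsc() - start;
	printf("%d rounds of alloc_bulk(%d)/free: %" PRIu64 " cycles\n",
	       ROUNDS, BULK, cycles);
}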
Why is that?
As you can see, right now we have all mbuf alloc/free routines as static inline,
and I think we would like to keep it like that.
So why should that particular function be different?
After all, that function is nothing more than a wrapper around
rte_mempool_get_bulk() plus a loop of rte_pktmbuf_reset() unrolled by 4.
So unless the mempool get/put API changes, I can hardly see how there could be
any ABI breakage in the future.

About the 'real world' performance gain - it was a 'real world' performance
problem that we tried to solve by introducing that function:
http://dpdk.org/ml/archives/dev/2015-May/017633.html
And according to the user feedback, it does help:
http://dpdk.org/ml/archives/dev/2016-February/033203.html

Konstantin

>
> - Panu -
>
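For reference, a minimal sketch of the wrapper described above: one rte_mempool_get_bulk() call followed by per-mbuf initialization. This is illustrative only, not the exact code from the patch - the real version unrolls the reset loop by 4, and the refcount initialization here is an assumption based on how single-mbuf allocation behaves.

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch: dequeue 'count' mbufs from the pool in one mempool operation,
 * then initialize each one, instead of doing 'count' individual gets. */
static inline int
pktmbuf_alloc_bulk_sketch(struct rte_mempool *pool,
			  struct rte_mbuf **mbufs, unsigned int count)
{
	unsigned int i;
	int rc;

	rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
	if (rc < 0)
		return rc;

	for (i = 0; i < count; i++) {
		rte_mbuf_refcnt_set(mbufs[i], 1);  /* assumed, see note above */
		rte_pktmbuf_reset(mbufs[i]);
	}
	return 0;
}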