Hi Panu,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> Sent: Wednesday, February 24, 2016 12:12 PM
> To: Xie, Huawei; Olivier MATZ; dev at dpdk.org
> Cc: dprovan at bivio.net
> Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
>
> On 02/23/2016 07:35 AM, Xie, Huawei wrote:
> > On 2/22/2016 10:52 PM, Xie, Huawei wrote:
> >> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
> >>> Hi,
> >>>
> >>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
> >>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
> >>>> the library ABI and should not be listed in the version map.
> >>>>
> >>>> I assume it's inline for performance reasons, but then you lose the
> >>>> benefits of dynamic linking, such as the ability to fix bugs and/or
> >>>> improve it by just updating the library. Since the point of having a
> >>>> bulk API is to improve performance by reducing the number of calls
> >>>> required, does it really have to be inline? As in, have you actually
> >>>> measured the difference between inline and non-inline and decided it's
> >>>> worth all the downsides?
> >>> Agree with Panu. It would be interesting to compare the performance
> >>> between inline and non-inline to decide whether to inline it or not.
> >> Will update after I have gathered more data. Inline could show an obvious
> >> performance difference in some cases.
> >
> > Panu and Olivier:
> > I wrote a simple benchmark. It runs 10M rounds; in each round 8 mbufs are
> > allocated through the bulk API and then freed.
> > These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
> > 2.70GHz, CPU isolated, timer interrupt disabled, RCU offloaded).
> > Btw, I have removed some exceptional data points, whose frequency was
> > about 1/10. Sometimes the observed user usage suddenly disappeared;
> > no clue what happened.
> >
> > With 8 mbufs allocated, there is about a 6% performance increase using inline.
> [...]
> >
> > With 16 mbufs allocated, we could still observe an obvious performance
> > difference, though only 1%-2%.
> >
> [...]
> >
> > With 32/64 mbufs allocated, the deviation of the data itself would hide
> > the performance difference.
> > So we prefer using inline for performance.
>
> At least I was more after real-world performance in a real-world
> use-case rather than CPU cycles in a microbenchmark; we know function
> calls have a cost, but the benefits tend to outweigh the cons.
>
> Inline functions have their place and they're far less evil in
> project-internal use, but in a library's public API they are BAD and
> should be ... well, not banned, because there are exceptions to every
> rule, but highly discouraged.
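(The benchmark code itself is not shown in the thread; as a rough illustration only, a test of the shape described above might look like the sketch below. The loop counts come from the description; the function name, setup and reporting are assumptions.)

#include <stdio.h>
#include <inttypes.h>
#include <rte_cycles.h>
#include <rte_mbuf.h>

#define ROUNDS (10 * 1000 * 1000)   /* 10M rounds, as described above */
#define BULK   8                    /* 8 mbufs per round */

/* Allocate BULK mbufs through the bulk API and free them again,
 * ROUNDS times, counting TSC cycles for the whole run. */
static void
bench_alloc_bulk(struct rte_mempool *mp)
{
	struct rte_mbuf *mbufs[BULK];
	uint64_t start, cycles;
	unsigned int i, j;

	start = rte_rdtsc();
	for (i = 0; i < ROUNDS; i++) {
		if (rte_pktmbuf_alloc_bulk(mp, mbufs, BULK) != 0)
			continue;
		for (j = 0; j < BULK; j++)
			rte_pktmbuf_free(mbufs[j]);
	}
	cycles = rte_rdtsc() - start;
	printf("%d rounds of alloc_bulk(%d)/free: %" PRIu64 " cycles\n",
	       ROUNDS, BULK, cycles);
}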
Why is that?
As you can see, right now we have all mbuf alloc/free routines as static inline,
and I think we would like to keep it like that.
So why should that particular function be different?
After all, that function is nothing more than a wrapper around
rte_mempool_get_bulk() plus a loop of rte_pktmbuf_reset() unrolled by 4.
So unless the mempool get/put API changes, I can hardly see how there could be
any ABI breakage in the future.

About the 'real world' performance gain - it was a 'real world' performance
problem that we tried to solve by introducing that function:
http://dpdk.org/ml/archives/dev/2015-May/017633.html
And according to the user feedback, it does help:
http://dpdk.org/ml/archives/dev/2016-February/033203.html

Konstantin

>
> - Panu -
>
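For reference, a minimal sketch of the wrapper described above: one rte_mempool_get_bulk() call followed by per-mbuf initialization. This is illustrative only, not the exact code from the patch - the real version unrolls the reset loop by 4, and the refcount initialization here is an assumption based on how single-mbuf allocation behaves.

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch: dequeue 'count' mbufs from the pool in one mempool operation,
 * then initialize each one, instead of doing 'count' individual gets. */
static inline int
pktmbuf_alloc_bulk_sketch(struct rte_mempool *pool,
			  struct rte_mbuf **mbufs, unsigned int count)
{
	unsigned int i;
	int rc;

	rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
	if (rc < 0)
		return rc;

	for (i = 0; i < count; i++) {
		rte_mbuf_refcnt_set(mbufs[i], 1);  /* assumed, see note above */
		rte_pktmbuf_reset(mbufs[i]);
	}
	return 0;
}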