On 28 Aug 2017 11:43 PM, "Pranith Kumar" <bobby.pr...@gmail.com> wrote:

On Mon, Aug 28, 2017 at 1:47 PM, Richard Henderson
<richard.hender...@linaro.org> wrote:
> On 08/27/2017 08:53 PM, Pranith Kumar wrote:
>> Using heaptrack, I found that quite a few of our temporary allocations
>> are coming from allocating work items. Instead of doing this
>> continuously, we can cache the allocated items and reuse them instead
>> of freeing them.
>>
>> This reduces the number of allocations by 25% (200000 -> 150000 for
>> ARM64 boot+shutdown test).
>>
>> Signed-off-by: Pranith Kumar <bobby.pr...@gmail.com>
>
> Why does this list need to record a "last" element?
> It would seem a simple lifo would be sufficient.
>
> (You would also be able to manage the list via cmpxchg without a separate lock,
> but perhaps the difference between the two isn't measurable.)
>

Yes, seems like a better design choice. Will fix in next iteration.


More recent glibc will also have an efficient per-thread allocator, and
though I haven't yet benchmarked the newer glibc malloc, GSlice is slower
than at least both tcmalloc and jemalloc. Perhaps you could instead make
work items statically allocated?

Thanks,

Paolo

Thanks,
--
Pranith
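
For reference, the LIFO-with-cmpxchg idea discussed above could look roughly
like the sketch below. This is not the code from the patch: WorkItem,
free_list, work_item_get() and work_item_put() are hypothetical names, and
the sketch assumes C11 atomics and a single thread taking items from the
cache. With several threads popping at once, the pop side would need
protection against the ABA problem (or a lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical work item; the real structure has different fields. */
typedef struct WorkItem {
    struct WorkItem *next;
    void (*func)(void *);
    void *data;
} WorkItem;

/* Head of the LIFO cache of unused items (starts out empty). */
static _Atomic(WorkItem *) free_list;

/* Return an item to the cache: push it on the head with compare-and-swap. */
static void work_item_put(WorkItem *wi)
{
    assert(wi != NULL);
    WorkItem *old = atomic_load(&free_list);
    do {
        wi->next = old;   /* a failed CAS reloads 'old', so relink and retry */
    } while (!atomic_compare_exchange_weak(&free_list, &old, wi));
}

/*
 * Take an item from the cache, or allocate a fresh one if the cache is empty.
 * Assumption: only one thread calls this at a time; concurrent poppers
 * would make the read of old->next unsafe (ABA).
 */
static WorkItem *work_item_get(void)
{
    WorkItem *old = atomic_load(&free_list);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&free_list, &old, old->next)) {
        /* the failed CAS stored the current head in 'old'; try again */
    }
    return old != NULL ? old : malloc(sizeof(WorkItem));
}
```

Only a head pointer is needed, so there is no "last" element to track, and
the most recently released item is the first one handed out again.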