On Fri, Feb 10, 2023 at 7:30 AM Fengnan Chang
<changfeng...@bytedance.com> wrote:
>
> Here is a simple test case:
> "
> uint64_t entry_time, time;
> size_t size = 4096;
> unsigned align = 4096;
> for (int j = 0; j < 10; j++) {
>     entry_time = rte_get_timer_cycles();
>     for (int i = 0; i < 2000; i++) {
>         rte_malloc(NULL, size, align);
>     }
>     time = (rte_get_timer_cycles()-entry_time) * 1000000 /
>         rte_get_timer_hz();
>     printf("total open time %lu avg time %lu\n", time, time/2000);
> }
> "
>
> The time taken by a single rte_malloc may become worse as the number
> of mallocs increases. In my env, the first round avg time is 15us, the
> second is 44us, the third is 77us, the fourth is 168us...
>
> The reason is that, during allocation, malloc_elem_alloc may split
> new_elem if there is too much free space after it, and insert the
> trailer into the free list. When allocating 4k with 4k alignment, the
> trailer is very likely to be inserted into free_head[2] again, which
> makes free_head[2] longer. The next 4k allocation searches free_head[2]
> from the beginning, so as the number of mallocs increases, searching
> free_head[2] takes more time and performance gets worse.
> The same problem also occurs when allocating 64k with 64k alignment,
> but allocating 4k with 64-byte alignment doesn't have this problem.
>
> Fix this by adjusting the free_head list size ranges: make free_head[3]
> hold elements bigger than or equal to 4k, and free_head[4] hold elements
> bigger than or equal to 16k.
> In terms of probabilities, when allocating 4k or 16k, the probability of
> finding a suitable elem in a larger size list is greater than in a
> smaller size list.
>
> Signed-off-by: Fengnan Chang <changfeng...@bytedance.com>
Acked-by: Morten Brørup <m...@smartsharesystems.com>
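
For reference, the effect of moving the list boundaries can be modelled with a small sketch. This is not the DPDK source: the function names, the bin count of 13 and the 256-byte first boundary with a factor of 4 between bins are assumptions made for illustration, and element header overhead is ignored. It only shows how changing a boundary from "size <= limit" to "size < limit" moves a 4k request from list 2 to list 3, and a 16k request from list 3 to list 4.

```c
#include <stddef.h>

#define NUM_BINS     13    /* assumed number of free lists */
#define MIN_BIN_SIZE 256u  /* assumed first boundary; each bin is 4x wider */

/*
 * Old-style ranges (hypothetical helper): bin i holds sizes in
 * (256 * 4^(i-1), 256 * 4^i], so exactly 4096 falls in bin 2,
 * together with elements as small as 1025 bytes.
 */
static size_t bin_index_upper_inclusive(size_t size)
{
    size_t idx = 0;
    size_t limit = MIN_BIN_SIZE;

    while (size > limit && idx < NUM_BINS - 1) {
        limit *= 4;
        idx++;
    }
    return idx;
}

/*
 * New-style ranges (hypothetical helper): bin i holds sizes in
 * [256 * 4^(i-1), 256 * 4^i), so exactly 4096 falls in bin 3,
 * where every element is at least 4096 bytes.
 */
static size_t bin_index_lower_inclusive(size_t size)
{
    size_t idx = 0;
    size_t limit = MIN_BIN_SIZE;

    while (size >= limit && idx < NUM_BINS - 1) {
        limit *= 4;
        idx++;
    }
    return idx;
}
```

Under these assumptions, a 4096-byte request starts its search in a list where every element is already large enough, instead of walking past sub-4k trailers that can never satisfy it. Sizes that are not exactly on a boundary (4095, for example) map to the same list in both versions.
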

The change looks simple enough.
I see an improvement with the (verbatim) malloc_perf_autotest unit tests
for 1M allocations too.

Let's take this change now and see how it goes with -rc1 testing.

Applied, thanks.

-- 
David Marchand