On Fri, Feb 10, 2023 at 7:30 AM Fengnan Chang
<changfeng...@bytedance.com> wrote:
>
> Here is a simple test case:
> "
> uint64_t entry_time, time;
> size_t size = 4096;
> unsigned align = 4096;
> for (int j = 0; j < 10; j++) {
>         entry_time = rte_get_timer_cycles();
>         for (int i = 0; i < 2000; i++) {
>                 rte_malloc(NULL, size, align);
>         }
>         time = (rte_get_timer_cycles()-entry_time) * 1000000 /
>                 rte_get_timer_hz();
>         printf("total open time %lu avg time %lu\n", time, time/2000);
> }
> "
>
> The cost of a single rte_malloc call may become worse as the number of
> allocations increases. In my env, the avg time is 15us in the first round,
> 44us in the second, 77us in the third, 168us in the fourth...
>
> The reason is that, during allocation, malloc_elem_alloc may split new_elem
> if there is too much free space after new_elem, and insert the trailer
> into the freelist. When allocating 4k with align 4k, the trailer is very
> likely to be inserted into free_head[2] again, which makes free_head[2]
> longer. The next 4k allocation searches free_head[2] from the beginning,
> so as the number of allocations increases, searching free_head[2] takes
> more time and performance becomes worse.
> The same problem also occurs when allocating 64k with align 64k, but
> allocating 4k with align 64 does not have this problem.
>
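
A toy model of the growth described above, as a rough sketch only. It assumes
that every allocation leaves exactly one trailer in the list and that none of
the trailers already in the list can satisfy the next request, so each search
walks the whole list. The function name total_elements_scanned is hypothetical
and is not part of DPDK.

```c
#include <assert.h>

/*
 * Toy model (assumption, not DPDK code): count how many list elements are
 * inspected in total over n_allocs allocations when each allocation
 *   1. walks the whole list without finding a fit, and
 *   2. leaves one more unusable trailer behind in the same list.
 */
static unsigned long total_elements_scanned(unsigned int n_allocs)
{
	unsigned long list_len = 0;
	unsigned long scanned = 0;
	unsigned int i;

	for (i = 0; i < n_allocs; i++) {
		scanned += list_len; /* full walk of the current list */
		list_len++;          /* the split adds one more trailer */
	}
	return scanned;
}

/* Small self-check: 0 + 1 + 2 + 3 elements inspected over 4 allocations. */
static void total_elements_scanned_selfcheck(void)
{
	assert(total_elements_scanned(4) == 6);
}
```

Under these assumptions the total work grows quadratically with the number of
allocations, which would match the per-round slowdown reported above.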
> Fix this by adjusting the size ranges of the free_head lists, so that
> free_head[3] holds elements of 4k or larger and free_head[4] holds
> elements of 16k or larger.
> In terms of probabilities, when allocating 4k or 16k, a suitable elem is
> more likely to be found in a list of larger sizes than in a list of
> smaller sizes.
>
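
To illustrate the range change, here is a minimal sketch of the two mappings
from element size to list index. It assumes five lists with boundaries at
2^8, 2^10, 2^12 and 2^14 bytes; only the 4k and 16k boundaries come from the
description above, the rest are assumptions. The helper names
index_upper_inclusive and index_lower_inclusive are hypothetical, and the real
index computation in DPDK may be written differently.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed list boundaries in bytes: 2^8, 2^10, 2^12 (4k), 2^14 (16k). */
static const size_t bounds[] = { 1u << 8, 1u << 10, 1u << 12, 1u << 14 };
#define NUM_BOUNDS (sizeof(bounds) / sizeof(bounds[0]))

/* Before: list i holds sizes in (bounds[i-1], bounds[i]], so 4k lands in 2. */
static size_t index_upper_inclusive(size_t size)
{
	size_t i;

	for (i = 0; i < NUM_BOUNDS; i++)
		if (size <= bounds[i])
			return i;
	return NUM_BOUNDS; /* last list takes everything larger */
}

/* After: list i holds sizes in [bounds[i-1], bounds[i]), so 4k lands in 3. */
static size_t index_lower_inclusive(size_t size)
{
	size_t i;

	for (i = 0; i < NUM_BOUNDS; i++)
		if (size < bounds[i])
			return i;
	return NUM_BOUNDS; /* last list takes everything larger */
}

/* Small self-check for the 4k case discussed in the commit message. */
static void freelist_index_selfcheck(void)
{
	assert(index_upper_inclusive(4096) == 2);
	assert(index_lower_inclusive(4096) == 3);
}
```

With the second mapping, a 4k request starts its search in a list whose
elements are all at least 4k, instead of in a list whose largest possible
element is exactly 4k.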
> Signed-off-by: Fengnan Chang <changfeng...@bytedance.com>
Acked-by: Morten Brørup <m...@smartsharesystems.com>

The change looks simple enough.
I see an improvement with the (verbatim) malloc_perf_autotest unit
tests for 1M allocations too.

Let's take this change now and see how it goes with -rc1 testing.


Applied, thanks.

-- 
David Marchand
