Here is a simple test case: " uint64_t entry_time, time; size_t size = 4096; unsigned align = 4096; for (int j = 0; j < 10; j++) { entry_time = rte_get_timer_cycles(); for (int i = 0; i < 2000; i++) { rte_malloc(NULL, size, align); } time = (rte_get_timer_cycles()-entry_time) * 1000000 / rte_get_timer_hz(); printf("total open time %lu avg time %lu\n", time, time/2000); } "
Single rte_malloc cost time may becomes wrose as the number of malloc increases, In my env, first round avg time is 15us, second is 44us, third is 77us, fourth is 168us... The reason is,in the malloc process, malloc_elem_alloc may split new_elem if there have too much free space after new_elem, and insert the trailer into freelist. When alloc 4k with align 4k, the trailer very likely insert to free_head[2] again, it makes free_head[2] longer. when alloc 4k again, it will search free_head[2] from begin, with the number of malloc increases, search free_head[2] need more time, so the performance will become worse. Same problem will also occurs in alloc 64k with align 64k, but if alloc 4k with align 64, doesn't have this problem. Fix this by adjust free_head list size range, make free_head[3] hold elements which bigger or equal 4k, free_head[4] hold elements which bigger or equal 16k. In terms of probabilities, when alloc 4k or 16k, the probability of finding a suitable elem from a larger size list is greater than from a smaller size list. Signed-off-by: Fengnan Chang <changfeng...@bytedance.com> --- lib/eal/common/malloc_elem.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c index 83f05497cc..35a2313d04 100644 --- a/lib/eal/common/malloc_elem.c +++ b/lib/eal/common/malloc_elem.c @@ -367,11 +367,11 @@ prev_elem_is_adjacent(struct malloc_elem *elem) * containing larger elements. * * Example element size ranges for a heap with five free lists: - * heap->free_head[0] - (0 , 2^8] - * heap->free_head[1] - (2^8 , 2^10] - * heap->free_head[2] - (2^10 ,2^12] - * heap->free_head[3] - (2^12, 2^14] - * heap->free_head[4] - (2^14, MAX_SIZE] + * heap->free_head[0] - (0 , 2^8) + * heap->free_head[1] - [2^8 , 2^10) + * heap->free_head[2] - [2^10 ,2^12) + * heap->free_head[3] - [2^12, 2^14) + * heap->free_head[4] - [2^14, MAX_SIZE] */ size_t malloc_elem_free_list_index(size_t size) @@ -385,8 +385,8 @@ malloc_elem_free_list_index(size_t size) if (size <= (1UL << MALLOC_MINSIZE_LOG2)) return 0; - /* Find next power of 2 >= size. */ - log2 = sizeof(size) * 8 - __builtin_clzl(size - 1); + /* Find next power of 2 > size. */ + log2 = sizeof(size) * 8 - __builtin_clzl(size); /* Compute freelist index, based on log2(size). */ index = (log2 - MALLOC_MINSIZE_LOG2 + MALLOC_LOG2_INCREMENT - 1) / -- 2.37.0 (Apple Git-136)