Hi everybody,

I created a fix for it which will hit the mailing list soon, but I considered it important to send this mail ahead. All of this analysis has no place in the patch description, but it helps to understand why and what was going on. The follow-up patch will have the title "lpm/lpm6: fix missing free of rules_tbl and lpm".

I ran into issues with the lpm6 autotest failing for me. Looking at it I saw all kinds of these:

  Error at line 679:
  ERROR: LPM Test tests6[i]: FAIL
  LPM: LPM memory allocation failed
  [...]

It turned out that 2500M of memory would have been enough, but that couldn't be the solution.

With some debugging it eventually boiled down to find_suitable_element(heap, size, flags, align, bound) not finding any space, while it did find space for an allocation of the same size earlier.

Note: Along the way I found the use-after-free for which I submitted a patch this morning.

I expected a leak, but valgrind wasn't too helpful. That was expected though, as I guess this is more of an internal leak/fragmentation in the structures than a real leak.

Thinking of a leak/fragmentation, I broke up the loop in test_lpm6 and ran the tests in segments:
- 1-end: failing at 13 and following, as reported
- 13-end: working
- skipping some ... (you get the idea)
A bit like bisecting :-)

It turned out that idx 2 (=> test2) was very important, but not the only source of the issue. This particular test does iterative allocation and free with a slightly changed config (a bit smaller) each time. It always failed at the 22nd allocation via rte_lpm6_create, and all later ones failed as well.

It really just is this inner loop:

    for (i = 0; i < 100; i++) {
        config.max_rules = MAX_RULES - 100 + i;
        printf("INFO: %s - allocating for %d rules (%d/100)\n",
               __func__, config.max_rules, i);
        lpm = rte_lpm6_create(__func__, SOCKET_ID_ANY, &config);
        TEST_LPM_ASSERT(lpm != NULL);
        rte_lpm6_free(lpm);
    }

But while we see "LPM: LPM memory allocation failed", the assertion that follows doesn't trigger.
NOTE: that is what was fixed by my patch this morning.

The failing alloc is for the rules table:
  rte_zmalloc_socket -> rte_malloc_socket -> malloc_heap_alloc -> find_suitable_element
with sizes usually at or close to "18000000". That is ~17MB per allocation; as it fails at alloc 22, a leak would amount to ~374M for these alone.

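For anyone who wants to redo that ballpark arithmetic, here is a rough sketch. It assumes that every rules table of 18000000 bytes is leaked in full and that 22 of them are involved; the helper names are made up for illustration and are not part of DPDK. Computed on the exact byte count the total comes out slightly above the rounded 17 * 22 figure.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Size of one rules-table allocation as seen in the failing calls. */
#define RULES_TBL_BYTES UINT64_C(18000000)
#define BYTES_PER_MIB   (UINT64_C(1024) * UINT64_C(1024))

/* Hypothetical helper: bytes lost if each of `iterations` allocations leaks. */
static uint64_t leaked_bytes(uint64_t alloc_bytes, uint64_t iterations)
{
    /* Refuse inputs whose product would overflow 64 bits. */
    assert(iterations == 0 || alloc_bytes <= UINT64_MAX / iterations);
    return alloc_bytes * iterations;
}

/* Hypothetical helper: convert bytes to whole MiB, rounding down. */
static uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes / BYTES_PER_MIB;
}

/* Hypothetical helper: print the estimate for a number of leaked tables. */
static void print_leak_estimate(uint64_t iterations)
{
    uint64_t total = leaked_bytes(RULES_TBL_BYTES, iterations);

    printf("%" PRIu64 " allocations of %" PRIu64 " bytes: ~%" PRIu64 " MiB\n",
           iterations, RULES_TBL_BYTES, bytes_to_mib(total));
}
```

Calling print_leak_estimate(22) from a small main() gives the estimate for the point of failure: a single table is 17 MiB rounded down, and 22 of them add up to 377 MiB.
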
So as a ballpark estimate, it makes sense to assume a leak or a fragmenting consumption. Reporting heap->alloc_count in find_suitable_element proved that it was exhausting the pool. One can see that alloc_count is always increasing.

Then I realized that while the assignment that eventually fails is this:
  lpm->rules_tbl = (struct rte_lpm6_rule *)rte_zmalloc_socket(NULL,
          (size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);
there is no free for that pointer, ever:
  grep -Hrn -C 3 rules_tbl * | grep free

So I found that in rte_lpm6_free:
- lpm might not be freed if it didn't find a te (the tailq entry)
- lpm->rules_tbl was never freed

As I said, a patch will follow soon.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd