Dear Jacek,

Use of the clib memory allocator is mainly historical. It’s elegant in a couple of ways - including built-in leak-finding - but it has been known to backfire in terms of performance. Individual mheaps are limited to 4 GB in a [typical] 32-bit vector-length image.

Note that the idiosyncratic mheap API functions - “tell me how long this object really is” and “allocate N bytes aligned to a boundary at a certain offset” - are used all over the place (the offset-alignment trick is sketched below).
I wouldn’t mind replacing it - so long as we don’t create a hard dependency on the dpdk - but before we go there: tell me a bit about the scenario at hand. What are we repeatedly allocating / freeing? That’s almost never necessary... Can you easily share the offending backtrace?

Thanks… Dave

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Jacek Siuda
Sent: Tuesday, September 5, 2017 9:08 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] mheap performance

Hi,

I'm conducting a tunnel test using the VPP (vnet) MAP feature with the following parameters: ea_bits_len=0, psid_offset=16, psid=length, and a single rule for each domain; total number of tunnels: 300,000; total number of control messages: 600,000.

My problem is with simply adding tunnels. After more than ~150k-200k additions, performance drops significantly: the first 100k tunnels are added in ~3 s (with an asynchronous C client), the next 100k in another ~5 s, but the last 100k take ~37 s - ~45 s in total. The Python clients perform even worse: 32 minutes(!) for 300k tunnels with the synchronous (blocking) version, and ~95 s with the asynchronous one. The Python clients are expected to perform somewhat worse according to the VPP docs, but I was worried by the non-linear cost of adding a single tunnel, which is visible even with the C client.

While investigating with perf, I found the culprit: the memory allocation done for the IP address by the rule-addition request. The memory is allocated by clib, which uses the mheap library (~98% of CPU consumption). I looked into mheap, and it seems rather complicated for allocating a short object. As a quick experiment I replaced the clib allocation (in vnet/map/ only) with DPDK's rte_malloc() and achieved far better performance: 300k tunnels in ~5-6 s with the same C client, and, respectively, ~70 s and ~30-40 s with the synchronous and asynchronous Python clients. I also haven't noticed any negative impact on packet throughput with my experimental allocator.

So, here are my questions:
1) Has anyone else reported performance penalties from the mheap library? I've searched the list archive and could not find any related questions.
2) Why was the mheap library chosen for clib? Are there performance benefits in some scenarios?
3) Are there any (long- or short-term) plans to replace clib's memory management with some other library?
4) If I wanted to upstream my solution, how should I approach making memory allocation customizable so that the community would accept it? Installable function pointers defaulting to clib? (A rough sketch of that idea follows this message.)

Best Regards,
Jacek Siuda.
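On question 4, "installable function pointers defaulting to clib" could look roughly like the following minimal sketch. Only clib_mem_alloc()/clib_mem_free() (vppinfra) and rte_malloc()/rte_free() (DPDK) are real APIs here; the my_alloc/my_free hooks and map_use_dpdk_allocator() are hypothetical names for illustration:

#include <vppinfra/mem.h> /* clib_mem_alloc / clib_mem_free */
#include <rte_malloc.h>   /* rte_malloc / rte_free */

typedef void *(*my_alloc_fn_t) (uword nbytes);
typedef void (*my_free_fn_t) (void *p);

static void *
clib_alloc_wrapper (uword nbytes)
{
  return clib_mem_alloc (nbytes);
}

static void
clib_free_wrapper (void *p)
{
  clib_mem_free (p);
}

/* The hooks default to the existing clib allocator... */
static my_alloc_fn_t my_alloc = clib_alloc_wrapper;
static my_free_fn_t my_free = clib_free_wrapper;

/* ...and can be swapped for the DPDK allocator. Objects must be freed by
   the same allocator that produced them, so the pair should only be
   switched at startup, never while allocations are live. */
static void *
dpdk_alloc_wrapper (uword nbytes)
{
  return rte_malloc ("map-rules", nbytes, 0 /* default alignment */);
}

static void
dpdk_free_wrapper (void *p)
{
  rte_free (p);
}

void
map_use_dpdk_allocator (void)
{
  my_alloc = dpdk_alloc_wrapper;
  my_free = dpdk_free_wrapper;
}

Keeping the indirection behind clib's existing entry points would also address the dependency concern above: the rte_malloc()-backed hooks would only be installed when DPDK is actually present, so clib itself gains no hard DPDK dependency.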
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev