Hi,

+1 based on the benchmark results.

Questions:

1. Do we need to keep jemalloc support, for example for
   compatibility? Can we drop jemalloc support to decrease
   maintenance cost?

2. Is it OK to add support for system mimalloc? For now we always
   use the vendored mimalloc:
   https://github.com/apache/arrow/blob/399408cb273c47f490f65cdad95bc184a652826c/cpp/cmake_modules/ThirdpartyToolchain.cmake#L2197-L2251

FYI: In general, I want to use system libraries as much as
possible. But we can't use system jemalloc for bindings because
most system jemalloc builds don't support dlopen():
https://github.com/apache/arrow/issues/32530

If mimalloc doesn't have this restriction, I would prefer mimalloc.


Thanks,
-- 
kou

In <c6e2b73e-b343-490e-b6a7-3d1379519...@python.org>
  "[Discuss][C++] Switch to mimalloc by default?" on Wed, 5 Jun 2024 17:18:36 +0200,
  Antoine Pitrou <anto...@python.org> wrote:

>
> Hello,
>
> Arrow C++ features a MemoryPool abstraction that allows using
> different allocators interchangeably. Several MemoryPool
> implementations are provided with Arrow C++ (though one can also build
> their own):
>
> - a jemalloc-based implementation, currently the default on Linux
> - a mimalloc-based implementation, currently the default on macOS and
>   Windows
> - an implementation that defers to the system's standard allocator
>   (using the malloc() and free() calls), available as a fallback and for
>   experimentation
>
> While jemalloc is the current default on Linux, our continuous
> benchmarking infrastructure actually enables mimalloc
> instead. Therefore, I've made a draft PR that switches our
> benchmarking to jemalloc, so as to measure any concrete differences
> between the two:
> https://github.com/apache/arrow/pull/41205
>
> The results show that there is a large number of performance drops
> with large effect sizes on the C++ microbenchmarks. There is also a
> smaller number of C++ microbenchmarks with improved performance
> results. A summary report with links to detailed results can be found
> here:
> https://github.com/apache/arrow/runs/25745674261
>
>
> With this in mind, I would like to propose that we switch the default
> to mimalloc for all platforms. This would have several desirable
> effects:
>
> - less variability between platforms
> - mimalloc generally has a nicer, more consistent API and is easier to
>   work with (in particular, jemalloc's configuration scheme is slightly
>   abstruse)
> - potentially better performance, or at least not significantly worse,
>   than the status quo
>
> We would have to keep at least one CI job with jemalloc enabled, to
> make sure we're not regressing in that regard.
>
> What do you think?
>
> Regards
>
> Antoine.