On 10 Jun 2025, at 16:22, Minsoo Choo <minsoochoo0...@proton.me> wrote:
>
> snmalloc by Microsoft - David's suggestion. Yet mimalloc also provides some
> security features ("guard pages, randomized allocation, encrypted free lists,
> etc. to protect against various heap vulnerabilities") as well. Still in 0.x
> stage.
>
> snmalloc and mimalloc look great for me, but my only concern is that snmalloc
> is still in 0.x stage.

It’s being used in production in multiple quite different use cases. The main reason it isn’t 1.0 is that we may change some of the internal APIs. This doesn’t matter for use as a global allocator, because those APIs are stable (defined by C, C++, or Rust, not things that we can change).

snmalloc also has a number of security features (the bounds checking alone addresses 10% of critical CVEs in the data set that we looked at and probably has a bigger security impact than anything else). The hardening features are described here:

https://github.com/microsoft/snmalloc/tree/main/docs/security

It’s also the allocator that we’ve used with a lot of the CHERI work, so when the CheriBSD things are upstreamed it will be easy to adopt. Supporting CHERI with mimalloc would be quite tricky.

On 10 Jun 2025, at 13:47, Poul-Henning Kamp <p...@phk.freebsd.dk> wrote:
>
> David Chisnall writes:
>
>> I've replaced jemalloc with snmalloc (to which I am a contributor)
>
> It looks interesting, but rather than all of us having to do our
> own homework, can you give us the "elevator pitch" for it ?

Beyond the things I wrote in my previous mail:

Snmalloc is a sizeclass allocator designed for concurrent workloads. All allocations are performed by a thread-local allocator, so there is no locking required on the fast path for allocation. Frees from the same thread are handled locally. Frees from remote threads are batched (by default, up to 1 MiB of freed objects) and then sent back to the original allocator using a lightweight message queue (single atomic exchange to send a set of allocations, forward-progress guarantees). This means that allocating on one thread and freeing on another (which is the pathological case for thread-caching allocators, but a common thing in multithreaded codebases) is really fast.

The platform abstractions are cleanly separated (we try very hard to avoid #ifdef in most of the code).
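
As an aside on the message queue mentioned above, here is a minimal sketch of how a whole batch of freed objects can be handed to another thread with a single atomic exchange. This is not snmalloc's actual code: `FreeObject`, `RemoteQueue`, `enqueue_batch` and `drain` are hypothetical names chosen for illustration, and the batching threshold is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical intrusive link: a freed object doubles as a list node.
struct FreeObject {
    std::atomic<FreeObject*> next{nullptr};
};

// Hypothetical multi-producer, single-consumer queue owned by one allocator.
class RemoteQueue {
    FreeObject stub;                       // initial sentinel node
    std::atomic<FreeObject*> back{&stub};  // shared tail, swapped by producers
    FreeObject* front{&stub};              // touched only by the owning thread

public:
    // Producer side: publish an already-linked batch first -> ... -> last.
    void enqueue_batch(FreeObject* first, FreeObject* last) {
        assert(first != nullptr && last != nullptr);
        last->next.store(nullptr, std::memory_order_relaxed);
        // The single atomic exchange: claim the tail position for the batch.
        FreeObject* prev = back.exchange(last, std::memory_order_acq_rel);
        // Link the old tail to the batch; this makes it visible to the owner.
        prev->next.store(first, std::memory_order_release);
    }

    // Consumer side (owning thread only): pass each reachable object to fn.
    // The newest object stays behind as the sentinel until more arrive.
    template <typename Fn>
    std::size_t drain(Fn&& fn) {
        std::size_t handed_back = 0;
        for (;;) {
            FreeObject* next = front->next.load(std::memory_order_acquire);
            if (next == nullptr) {
                break;  // nothing further is linked in yet
            }
            FreeObject* done = front;
            front = next;
            if (done != &stub) {  // the initial stub is not a real object
                fn(done);
                ++handed_back;
            }
        }
        return handed_back;
    }
};
```

In this sketch a producer never waits for another producer or for the owner: it performs one exchange and one store. The owner simply stops draining when it reaches an object whose `next` has not been linked yet, and picks up the rest on a later call.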

Here is the FreeBSD platform layer:

https://github.com/microsoft/snmalloc/blob/main/src/snmalloc/pal/pal_freebsd.h

Architecture abstractions are *mostly* orthogonal to platforms, and snmalloc supports all of the architectures that FreeBSD supports. Here’s the full set (note: CHERI is a mixin rather than a separate architecture):

https://github.com/microsoft/snmalloc/tree/main/src/snmalloc/aal

For producer-consumer workloads, we’ve seen speedups of 50% relative to thread-caching allocators like jemalloc. For single-threaded workloads or workloads where allocation and deallocation happen on the same thread, we’ve seen smaller speedups.

David