On 10 Jun 2025, at 16:22, Minsoo Choo <minsoochoo0...@proton.me> wrote:
> 
> snmalloc by Microsoft - David's suggestion. Yet mimalloc also provides some 
> security features ("guard pages, randomized allocation, encrypted free lists, 
> etc. to protect against various heap vulnerabilities") as well. Still in 0.x 
> stage.
> 
> snmalloc and mimalloc look great for me, but my only concern is that snmalloc 
> is still in 0.x stage.

It’s being used in production in multiple quite different use cases.  The main 
reason it isn’t 1.0 is that we may change some of the internal APIs.  This 
doesn’t matter for use as a global allocator, because those APIs are stable 
(defined by C, C++, or Rust, not things that we can change).

snmalloc also has a number of security features (the bounds checking alone 
addresses 10% of critical CVEs in the data set that we looked at and probably 
has a bigger security impact than anything else).  The hardening things are 
discussed here:

https://github.com/microsoft/snmalloc/tree/main/docs/security

It’s also the allocator that we’ve used with a lot of the CHERI work, so when 
the CheriBSD things are upstreamed it’s easy to adopt.  Supporting CHERI with 
mimalloc would be quite tricky.

On 10 Jun 2025, at 13:47, Poul-Henning Kamp <p...@phk.freebsd.dk> wrote:
> 
> David Chisnall writes:
> 
>> I've replaced jemalloc with snmalloc (to which I am a contributor)
> 
> It looks interesting, but rather than all of us having to do our
> own homework, can you give us the "elevator pitch" for it ?

Beyond the things I wrote in my previous mail:

Snmalloc is a sizeclass allocator designed for concurrent workloads.  All 
allocations are performed by a thread-local allocator, so there is no locking 
required on the fast path for allocation.  Frees from the same thread are 
handled locally.  Frees from remote threads are batched (by default, up to 1 
MiB of freed objects) and then sent back to the original allocator using a 
lightweight message queue (single atomic exchange to send a set of allocations, 
forward-progress guarantees).  This means that allocating on one thread and 
freeing on another (which is the pathological case for thread-caching 
allocators, but a common thing in multithreaded codebases) is really fast.
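
To make the "single atomic exchange" point concrete, here is a minimal sketch 
of a multi-producer, single-consumer queue that accepts a pre-linked batch of 
freed objects.  It is an illustration only, not snmalloc's actual code: the 
names FreedObject, RemoteQueue, enqueue_batch and drain are hypothetical, and 
the real implementation differs in many details (batching thresholds, 
hardening of the links, and so on).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical: a freed object whose storage is reused as a queue link.
struct FreedObject {
    std::atomic<FreedObject*> next{nullptr};
};

// Hypothetical multi-producer, single-consumer queue owned by one allocator.
class RemoteQueue {
    FreedObject stub;                       // placeholder so the queue is never empty
    std::atomic<FreedObject*> back{&stub};  // remote threads swap this
    FreedObject* front{&stub};              // touched only by the owning thread

public:
    // Remote thread: send a pre-linked batch first -> ... -> last.
    // The only contended operation is the single exchange on `back`.
    void enqueue_batch(FreedObject* first, FreedObject* last) {
        assert(first != nullptr && last != nullptr);
        last->next.store(nullptr, std::memory_order_relaxed);
        FreedObject* prev = back.exchange(last, std::memory_order_acq_rel);
        // Publish the batch by linking it behind the previous tail.
        prev->next.store(first, std::memory_order_release);
    }

    // Owning thread: hand every object that already has a successor to
    // `reclaim` and return how many were reclaimed.  The most recently
    // sent object stays behind as the new placeholder until a later
    // batch arrives.
    template <typename F>
    std::size_t drain(F&& reclaim) {
        std::size_t n = 0;
        FreedObject* next = front->next.load(std::memory_order_acquire);
        while (next != nullptr) {
            FreedObject* old = front;
            front = next;
            if (old != &stub) {
                reclaim(old);
                ++n;
            }
            next = front->next.load(std::memory_order_acquire);
        }
        return n;
    }
};
```

In this sketch a sender never loops or retries: it performs one exchange and 
one store regardless of batch size, which is why batching frees before 
sending keeps the cross-thread cost low.  The owner never takes a lock 
either; it simply follows the links that have been published so far.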

The platform abstractions are cleanly separated (we try very hard to avoid 
#ifdef in most of the code).  Here is the FreeBSD platform layer:

https://github.com/microsoft/snmalloc/blob/main/src/snmalloc/pal/pal_freebsd.h

Architecture abstractions are *mostly* orthogonal to platforms, and snmalloc 
supports all of the architectures that FreeBSD supports.  Here’s the full set 
(note: CHERI is a mixin rather than a separate architecture):

https://github.com/microsoft/snmalloc/tree/main/src/snmalloc/aal

For producer-consumer workloads, we’ve seen speedups of 50% relative to 
thread-caching allocators like jemalloc.  For single-threaded workloads or 
workloads where allocation and deallocation happen on the same thread, we’ve 
seen smaller speedups.

David
