I'm writing my own memory allocator here and have come up with a very
simple design.  I'm wondering if there's anything useful I've missed; I
know the basic ideas behind OpenBSD and glibc allocators but I don't
have their architecture in front of me.

The goal is to do a minimal allocator (i.e. very little code) that gives
good security, performance, and resource management (I hate things
bloating to 3 times bigger than needed).

Mine's got a few features:

 - It doesn't detect double-free()s like glibc; but it detects invalid
   free() addresses, including ones that were previously run through
   free().

 - It detects heap corruption via XOR canaries and a djb2 hash
   (allocation control -> djb2 -> has ^ canary -> xcanary).

 - I have decided on a micro-tier design with high aligned macro-tier
   allocations; simply put, allocations wasting less than 3.125% of
   physical memory will be done via mmap() and placed at the highest
   address in the segment such that a read or write 1 byte beyond the
   end of the allocation will cause a segfault

   + I may simulate this by allocating 1 excess page and marking it
     PROT_NONE since I do not know if Linux intentionally guard-pages
     allocations (I'm not on OpenBSD); this would also be rather cross
     platform

 - Micro-tiers are subject to Allocation Gravity.  Micro-tier
   allocations go into 256KiB mmap() segments.  The most full segment
   that an allocation can be placed in is always used.  This prevents
   mostly empty segments from getting new allocations, encouraging their
   death (once the last allocation is freed from a segment, the segment
   is munmap()ed), preventing the allocation of MORE segments, and
   overall getting more memory back to the system.

 - A primitive thread design is in place which isolates locking
   conditions for thread allocations as much as simply possible.  This
   basically involves each thread working on what is effectively a
   separate allocator, with a bridge between them for free() and
   realloc() to search through.

   + When a thread terminates, if it still has local allocations, its
     thread group is orphaned.  When a thread creates an allocation
     without having a thread group, it uses an available orphan if
     possible.

   + If there are a lot of orphans it may be advantageous to combine
     them into the oldest un-orphaned thread such that the different
     Micro-tiers will not individually get allocations, thus supporting
     eventual death of spare Micro-tiers.  Orphan patterns are trivial
     to track; this won't cause any problem.

I'm also considering other features:

 - The simple addition of prescribble/postscribble option.
   Prescribbling would fill memory with 0xaa when allocated; while
   postscribbling would fill it with 0x55 after free().  This would
   cause failure when making bad assumptions about freshly malloc()'d
   memory or when accessing free()d segments.

 - Post-xcanary.  Enforce allocations to be considered of length
   (control_data + data + xcanary), such that the xcanary in the
   control_data and the xcanary after the last byte would be the same.
   Writes one beyond the end of micro-tier allocated pages would destroy
   the canary and get detected next time a malloc(), realloc(), or
   free() interacted with that allocation.

The design I've chosen is a micro-tier design based on glibc's handling
of large allocations.  glibc allocates 128K or larger allocations using
mmap(); 128KiB+1B wastes 4095 bytes (on 4K page systems), roughly 1/32
of the allocated memory.  As long as only 1/32 of the physical memory is
wasted, allocating with mmap() should be sane.  Note that the number
chosen is rather arbitrary and can be tightened or loosened to taste;
OpenBSD for example would be this design at 1/2 (anything over a page
gets mmap() in OBSD, and can only waste 50%).

The inspiration from OpenBSD for guard pages leads me to the
high-aligned allocations in macro-tier pages, allowing me to
artificially create segfaults where normal allocators wouldn't be able
to enforce proper behavior.  Failing this immediate retribution in
micro-tier allocations, which pack multiple allocations into a 256K
mmap() segment, the post-xcanary idea will probably allow for detection
of off-by-one at a later date.

Did I miss anything important?

--
John Moser <[EMAIL PROTECTED]>

[demime 1.01d removed an attachment of type application/pgp-signature which had 
a name of signature.asc]

Reply via email to