[EMAIL PROTECTED] wrote:
> >Offtopic: Which still reminds me to write an email about the fact that
> >the Solaris kernel is very, very malloc()-happy (which is unnecessary in
> >many cases now that C99 semantics are allowed).
> 
> You are wrong.

Thank you... ;-((
See below...

> We *malloc* not because of shortcomings in C but because
> we don't have any room on the stack to put stuff.

I know that. And I have already done some research on it (the only
reason I haven't finished the prototype patch is that I have to work
for food on other, non-Solaris-related projects, which means I only
have 1-2 hours per day for the interesting bits, and those are
currently spent on the ksh93-integration project...). That research
turned up multiple options, including:

- Increase the stack size if possible (for example by using 64k pages on
SPARC (excluding UltraSPARC-3 and older versions of SPARC64), maybe
limiting this to 64-bit threads (for several complicated reasons))

- Create a C macro |#define KMEM_TMP_ALLOC()| which expands to the
following procedure:
1. Measure the stack size and the space currently available on the
stack. The first 512 bytes (of an 8k default stack; a 64k default stack
would offer much more room) are available to allocations via C99
constructs. This value is a tunable; setting it to |0| disables the
stack allocations. If this step fails, the size is simply set to |0|.

2. If [1] couldn't allocate space, a special "temporary space" allocator
is called, which fetches memory from per-process preallocated 4M pages
(allocated from processor-local memory and split into 8 slices (|8| is
a tunable, too) with separate mutexes; the slice index is simply a
hash over the thread id, which means that the same thread usually asks
for the same slice). Temporary space is usually (with exceptions per
flags passed to |KMEM_TMP_ALLOC()|) allocated in a linear manner, so a
different allocator algorithm will be used here (controllable via flags
and stealing some of the ideas from AmigaOS (no flamewar, please) :-) ).
Additional benefits are that accesses are (usually) to local memory
and that, by default, 4M pages are used (reducing the tax on the TLB).

3. If [2] fails (for example because no space is left in the
per-processor temp. memory space, the request size exceeds the maximum
slice size, or the flags tell us to skip [2]), we fall back to the
normal allocator (and no, I don't want to add a 3rd-level
"scratchspace" like the one used in SUPER-UX (erm, OK... NUMA machines
may benefit from that... but then I'd better rely on the normal kernel
allocator to deal with that...)).
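
To make the three steps above a bit more concrete, here is a rough
user-space sketch of the fallback order. It is not the prototype patch:
all names (|tmp_fits_on_stack|, |tmp_slice_alloc|, |TMP_F_SKIP_SLICE|,
...) are made up for illustration, the hash is a plain modulo, the
per-slice mutexes and the release path are left out, and |malloc()|
stands in for the normal kernel allocator. The real stack allocation in
[1] would have to happen inside the macro (in the caller's frame), so
the sketch only models the decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TMP_NSLICES      8u                    /* slice count (a tunable) */
#define TMP_REGION_SIZE  (4u * 1024u * 1024u)  /* one 4M page */
#define TMP_SLICE_SIZE   (TMP_REGION_SIZE / TMP_NSLICES)
#define TMP_F_SKIP_SLICE 0x1u                  /* hypothetical flag: skip step 2 */

enum tmp_tier { TMP_TIER_SLICE, TMP_TIER_HEAP };

/* Step 1 budget; setting it to 0 disables stack allocations. */
static size_t tmp_stack_budget = 512;

/* Stand-in for the preallocated 4M page, split into slices. */
static _Alignas(16) unsigned char tmp_region[TMP_NSLICES][TMP_SLICE_SIZE];
static size_t tmp_used[TMP_NSLICES];           /* bump offset per slice */

/* Step 1 (decision only): does the request fit the stack budget? */
static int tmp_fits_on_stack(size_t size, size_t stack_avail)
{
    return tmp_stack_budget != 0 &&
           size <= tmp_stack_budget &&
           size <= stack_avail;
}

/* Assumed hash: the same thread id always maps to the same slice. */
static unsigned tmp_slice_index(uintptr_t thread_id)
{
    return (unsigned)(thread_id % TMP_NSLICES);
}

/* Step 2: linear (bump) allocation from the thread's slice.
 * Locking is omitted; a real version would take the slice mutex. */
static void *tmp_slice_alloc(uintptr_t thread_id, size_t size)
{
    unsigned i = tmp_slice_index(thread_id);
    size_t aligned;

    if (size > TMP_SLICE_SIZE)
        return NULL;                           /* exceeds maximum slice size */
    aligned = (size + 15u) & ~(size_t)15u;     /* keep 16-byte alignment */
    if (aligned > TMP_SLICE_SIZE - tmp_used[i])
        return NULL;                           /* no space left in slice */
    tmp_used[i] += aligned;
    return &tmp_region[i][tmp_used[i] - aligned];
}

/* Steps 2 and 3: try the slice, then fall back to the normal allocator. */
static void *tmp_alloc_offstack(uintptr_t thread_id, size_t size,
                                unsigned flags, enum tmp_tier *tier)
{
    void *p = NULL;

    assert(tier != NULL);
    if (!(flags & TMP_F_SKIP_SLICE))
        p = tmp_slice_alloc(thread_id, size);
    if (p != NULL) {
        *tier = TMP_TIER_SLICE;
        return p;
    }
    *tier = TMP_TIER_HEAP;
    return malloc(size);                       /* stand-in for kmem_alloc() */
}
```

A caller would first check |tmp_fits_on_stack()| and, only if that says
no, call |tmp_alloc_offstack()| and remember the returned tier so the
memory can be given back to the right place later.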

I know the normal allocator is fast and that it is heavily optimized
for scalability. But it's even better if such calls, or even any function
calls for temporary memory, could be avoided (note that many of the
allocations are far below 128 bytes; sometimes the number of bytes
shuffled around by function calls & co. is much bigger than the
allocation itself, making this a little bit ridiculous... ;-( ).
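
As a small illustration of the point about tiny allocations, the
following sketch keeps buffers of up to 128 bytes on the stack via a C99
variable-length array and only calls the allocator for larger requests.
The function |reverse_copy| and the |SMALL_TMP_MAX| limit are invented
for this example and are not taken from any Solaris code; VLAs are also
optional in C11, so this assumes a compiler that supports them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_TMP_MAX 128   /* assumed cut-off for "small" temporaries */

/* Copy n bytes from src to dst in reverse order. dst may alias src.
 * Returns 0 on success, -1 if the fallback allocation fails. */
static int reverse_copy(char *dst, const char *src, size_t n)
{
    /* A VLA must have a size greater than zero, so use 1 when the
     * stack buffer is not needed (n == 0 or n too large). */
    size_t on_stack = (n != 0 && n <= SMALL_TMP_MAX) ? n : 1;
    unsigned char stack_buf[on_stack];
    unsigned char *tmp = stack_buf;
    void *heap = NULL;
    size_t i;

    assert(dst != NULL && src != NULL);
    if (n > SMALL_TMP_MAX) {
        heap = malloc(n);               /* only large requests pay for this */
        if (heap == NULL)
            return -1;
        tmp = heap;
    }
    memcpy(tmp, src, n);                /* snapshot, so dst may alias src */
    for (i = 0; i < n; i++)
        dst[i] = (char)tmp[n - 1 - i];
    free(heap);                         /* free(NULL) is a no-op */
    return 0;
}
```

For the small case no allocator call happens at all; the temporary
simply disappears when the function returns.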

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
