On Tue, Nov 12, 2013 at 01:41:24PM +0100, Marc Glisse wrote:
> On Tue, 12 Nov 2013, Ondřej Bílka wrote:
> 
> >>I am trying to get something to actually work and be accepted in
> >>gcc. That may mean being conservative.
> >
> >That also may mean that you will cover only cases where it is not needed.
> >
> >A malloc will have a small per-thread cache for small requests that does
> >not need any locking. A performance difference will be quite small and
> >there may be a define which causes inlining constant size mallocs.
> >
> >Sizes from 256 bytes are interesting case.
> 
> I have to disagree here. When the allocated size is large enough,
> the cost of malloc+free often becomes small compared to whatever
> work you are doing in that array. It is when the size is very small
> that speeding up malloc+free is essential. And you are
> underestimating the cost of those small allocations.
>
No, just aware that these are important and there will be optimizations
that convert these. For example:

#define malloc (s) ({ \
  static pool p; \
  if (__builtin_constant_p (s) { \
    alloc_from_pool(&p); \
  else \
    malloc (s); \
})

How will you find small constant allocations with this in place?

Also you malloc has tradeoff between time and memory usage
and where larger allocations needlessly cause bigger fragmentation.
 
> I started on this because of an application that spends more than
> half of its time in malloc+free and where (almost) no allocation is
> larger than 100 bytes. Changing the code to not use malloc/free but
> other allocation strategies is very complicated because it would
> break abstraction layers. I used various workarounds that proved
> rather effective, but I would have loved for that to be unnecessary.
> 
See my memory pool that uses custom free functionality where you need
only change malloc, free is handled automaticaly.

> >
> >And what is logic of limiting sizes?
> 
> Main stack is rather small by default. And it is nice if other
> variables on the stack are not spread over too many memory pages.

No, question was what is logic that you will use to decide if allocation
is stack or heap?

Also if we take not spreading stack argument to logical conclusion a
best course of action in adding a separate stack and placing arrays/alloca
over 1024 bytes there.

This is theory, I tried few benchmarks how using large alloca on
different stack affects performance but I did not measure a difference.

> 
> >Note that a leaf function have higher priority for stack that
> >nonleaf ones as when you do stack allocation early it may kill lot
> >of leaf allocations because of stack concerns.
> 
> Sorry, I have trouble understanding your sentence. Do you just mean
> that if there is a threshold it should be higher in leaf functions?
> 
Yes, functions closer to leaf function should have higher tresholds.

Generally in leaf functions you can safely use 65536 bytes as most libc
function assume this stack so caller would get same problems by calling
that function.

> >>And using a decl instead of alloca means that even if
> >>malloc+free was in a loop, not deallocating won't make me grow the
> >>stack use linearly in the number of iterations of the loop.
> >>
> >Actually this looks like a orthogonal optimalization of memory reuse,
> 
> Replacing malloc+free with alloca (without save/restore of the
> stack) would be a pessimization.
> 
I did not mentioned alloca.

> I understand the topic of optimizing allocations is very large, and
> powerful techniques could be developed for it. However, those bugs
> have been open for years and nobody has bothered going further that
> removing free(malloc(n)). So I think we should go with the small
> case that has known usecases. And when you come up with a more
> general framework that makes this small case irrelevant, I will
> gladly welcome it and we can easily remove the small case.
> 
As this is hard in gcc then doing these without gcc should be easier.

In several cases (constant size) we do not need gcc involvement as we
colud get information ourself. In other cases a best behaviour could be
determined only by running a custom benchmarking program which will tell
which variant should go to which malloc.

Then there are parts where coordination is necessary, one is determining
if stack allocation is possible. A posible way would be first turn a
eligible malloc calls to

malloc_stack(size, color)

as hint to allocator. I added a color parameter to handle partial
overlap, if you do a coloring with edge when allocations partialy
overlap then you can assign to each color class a stack and proceed as
normal.

Reply via email to