On Tue, Nov 12, 2013 at 01:16:14AM +0100, Marc Glisse wrote:
> On Mon, 11 Nov 2013, Ondřej Bílka wrote:
> 
> >On Sun, Nov 10, 2013 at 04:27:00PM +0100, Marc Glisse wrote:
> >>Hello,
> >>
> >>I am posting this patch to get some feedback on the approach. The
> >>goal is to replace malloc+free with a stack allocation (a decl
> >>actually) when the size is a small constant.
> >
> >Why constrain yourself to small sizes?
> 
> I am trying to get something to actually work and be accepted in
> gcc. That may mean being conservative.
>
That may also mean that you cover only the cases where it is not needed.

A malloc implementation will typically have a small per-thread cache for
small requests that does not need any locking. The performance difference
will be quite small, and there may be a define which causes constant-size
mallocs to be inlined.

Sizes from 256 bytes up are the interesting case.

> >Below is a simple implementation which creates a separate stack for
> >that (for simplicity and because it does not need to find bounds on
> >thread stack.)
> 
> One difficulty with using a stack is that lifetimes cannot partially
> overlap, whereas with malloc+free they can.

Which you need to solve anyway if you want to do the conversion. You need
to pick a set of malloc+free areas whose lifetimes nest properly and that
is best by the given criteria (for example, maximal expected memory usage).
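
To illustrate the overlap problem, here is a minimal sketch of a separate
LIFO area. It is not the implementation from my earlier mail; the names
region_alloc and region_free and the 4096-byte size are made up for this
example. A block can only be popped when it is the most recent one, so
freeing an older block while a newer one is still live has to be refused.

```c
#include <stddef.h>

#define REGION_SIZE 4096  /* hypothetical capacity, chosen for illustration */

static _Alignas(max_align_t) unsigned char region[REGION_SIZE];
static size_t region_top;  /* offset of the first unused byte */

/* Round size up to the strictest alignment; returns 0 on overflow. */
static size_t round_up(size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) / align * align;
    return rounded < size ? 0 : rounded;
}

/* Push a block on the separate stack, or return NULL if it does not fit. */
void *region_alloc(size_t size)
{
    size_t rounded = round_up(size);
    if (rounded == 0 || rounded > REGION_SIZE - region_top)
        return NULL;
    void *p = region + region_top;
    region_top += rounded;
    return p;
}

/* Pop a block. Returns 1 on success and 0 when p is not the topmost
 * block, which is the partially overlapping lifetime case. */
int region_free(void *p, size_t size)
{
    size_t rounded = round_up(size);
    if (p == NULL || rounded == 0 || rounded > region_top)
        return 0;
    if ((unsigned char *)p + rounded != region + region_top)
        return 0;
    region_top -= rounded;
    return 1;
}
```

With a = region_alloc(42) followed by b = region_alloc(42), calling
region_free(a, 42) first is refused, while freeing b and then a works.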

> Using the main stack has
> the advantage that I don't have to think of deallocation, it happens
> automatically.

And what is the logic of limiting sizes? Note that leaf functions have
higher priority for stack than nonleaf ones: when you do a stack
allocation early, it may kill a lot of leaf allocations because of stack
size concerns.

> And using a decl instead of alloca means that even if
> malloc+free was in a loop, not deallocating won't make me grow the
> stack use linearly in the number of iterations of the loop.
> 
Actually this looks like an orthogonal optimization, memory reuse:
you can transform

x = malloc(42);
free(x);
y = malloc(42);

to

x = malloc(42);
y = x;

It might be feasible to teach PRE to transform loops with repeated
allocation into an initial allocation plus reuse.
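
As a sketch of what such a loop transformation would amount to at the
source level (the function names sum_naive and sum_hoisted are invented
for this example, and this says nothing about how PRE would do it
internally), compare a loop that allocates per iteration with one that
allocates once and reuses the buffer:

```c
#include <stdlib.h>
#include <string.h>

/* Before: one malloc+free pair of constant size per iteration. */
size_t sum_naive(size_t iterations)
{
    size_t total = 0;
    for (size_t i = 0; i < iterations; i++) {
        unsigned char *buf = malloc(42);
        if (buf == NULL)
            return total;
        memset(buf, 1, 42);
        total += buf[i % 42];
        free(buf);
    }
    return total;
}

/* After: the allocation is hoisted out of the loop and reused. */
size_t sum_hoisted(size_t iterations)
{
    size_t total = 0;
    unsigned char *buf = malloc(42);
    if (buf == NULL)
        return 0;
    for (size_t i = 0; i < iterations; i++) {
        memset(buf, 1, 42);
        total += buf[i % 42];
    }
    free(buf);
    return total;
}
```

The two differ when malloc fails or when the loop runs zero times (the
hoisted version still allocates once), so a real transformation would
have to account for that.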

> >With that extension it would be possible to mark pointer so its free
> >would be a nop.
> 
> That would help indeed, but it does require a libc that I haven't
> seen yet, and it may cause trouble if people try to LD_PRELOAD a
> different malloc implementation.
> 
This depends on whether we can get the information that malloc did a
stack allocation (or a constant-size allocation, which could be
transformed into a different call).

As far as a custom free is concerned, there are more use cases. If it is
worth the complications, there is the possibility of prepending to
LD_PRELOAD a custom free that works regardless of the allocator used.
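
A rough sketch of the decision such a custom free would make, assuming
the compiler-placed blocks live in one known address range. The names
private_area, in_private_area and custom_free are hypothetical. A real
preload library would export the function under the name free and look
up the next definition with dlsym(RTLD_NEXT, "free"); that part is left
out here and the sketch simply calls the libc free.

```c
#include <stdint.h>
#include <stdlib.h>

#define PRIVATE_SIZE 4096  /* hypothetical size of the private area */

/* Stand-in for the memory that holds converted allocations. */
static unsigned char private_area[PRIVATE_SIZE];

/* Does p point into the private area? */
static int in_private_area(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)private_area;
    return addr >= base && addr - base < PRIVATE_SIZE;
}

/* Free that is a nop for private pointers and forwards the rest.
 * Returns 1 if the pointer was handed to the allocator, 0 otherwise. */
int custom_free(void *p)
{
    if (p == NULL || in_private_area(p))
        return 0;
    free(p);
    return 1;
}
```

Because the check only looks at the address, it does not depend on which
malloc implementation produced the other pointers.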
