On Tue, Nov 12, 2013 at 01:16:14AM +0100, Marc Glisse wrote:
> On Mon, 11 Nov 2013, Ondřej Bílka wrote:
>
> >On Sun, Nov 10, 2013 at 04:27:00PM +0100, Marc Glisse wrote:
> >>Hello,
> >>
> >>I am posting this patch to get some feedback on the approach. The
> >>goal is to replace malloc+free with a stack allocation (a decl
> >>actually) when the size is a small constant.
> >
> >Why constrain yourself to small sizes?
>
> I am trying to get something to actually work and be accepted in
> gcc. That may mean being conservative.
>
That also may mean that you will cover only the cases where it is not needed.
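For concreteness, here is a hand-written before/after sketch of the transformation under discussion: a small, constant-size scratch buffer that never escapes the function, first on the heap and then as a local array (a decl). The function names and SCRATCH_LEN are hypothetical, and the "after" function only shows what the result would conceptually look like, not what the patch actually emits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical small constant size: the case the patch targets. */
#define SCRATCH_LEN 16

/* Before: a constant-size heap buffer that never escapes and is
   freed before the function returns. */
int sum_squares_heap(int n)
{
    assert(n >= 0 && n <= SCRATCH_LEN);
    int *tmp = malloc(SCRATCH_LEN * sizeof *tmp);
    if (tmp == NULL)
        abort();
    for (int i = 0; i < n; i++)
        tmp[i] = i * i;
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += tmp[i];
    free(tmp);
    return sum;
}

/* After: the malloc+free pair replaced by a local array (a decl);
   its storage is released automatically when the function returns. */
int sum_squares_decl(int n)
{
    assert(n >= 0 && n <= SCRATCH_LEN);
    int tmp[SCRATCH_LEN];
    for (int i = 0; i < n; i++)
        tmp[i] = i * i;
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += tmp[i];
    return sum;
}
```

Both functions compute the same result; the second one has no allocator call and no failure path, which is where any gain would come from.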
A malloc will have a small per-thread cache for small requests that does not need any locking, so the performance difference will be quite small, and there could be a define that causes constant-size mallocs to be inlined. Sizes from 256 bytes up are the interesting case.

> >Below is a simple implementation which creates a separate stack for
> >that (for simplicity and because it does not need to find bounds on
> >thread stack.)
>
> One difficulty with using a stack is that lifetimes cannot partially
> overlap, whereas with malloc+free they can.

Which you need to solve anyway if you want to do the conversion. You need to pick the properly overlapping malloc+free regions that are best according to a given criterion (for example, maximal expected memory usage).

> Using the main stack has
> the advantage that I don't have to think of deallocation, it happens
> automatically.

And what is the logic of limiting sizes? Note that leaf functions have higher priority for the stack than non-leaf ones: doing a stack allocation early may kill a lot of leaf allocations because of stack-size concerns.

> And using a decl instead of alloca means that even if
> malloc+free was in a loop, not deallocating won't make me grow the
> stack use linearly in the number of iterations of the loop.
>
Actually this looks like an orthogonal optimization of memory reuse. You can transform

    x = malloc(42); free(x); y = malloc(42);

to

    x = malloc(42); y = x;

It might be feasible to teach PRE to transform loops with repeated allocation into an initial allocation and reuse.

> >With that extension it would be possible to mark pointer so its free
> >would be a nop.
>
> That would help indeed, but it does require a libc that I haven't
> seen yet, and it may cause trouble if people try to LD_PRELOAD a
> different malloc implementation.
>
This depends on whether we could get information that malloc did a stack allocation (or a constant-size allocation that could be transformed into a different call). As far as a custom free is concerned, there are more use cases.
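The separate-stack idea quoted above, together with the suggestion that free could recognize such pointers and do nothing, might look roughly like the following. This is not the implementation referred to in the thread (it is not reproduced here); it is a minimal single-threaded sketch in which side_alloc, side_release, side_owns and both size limits are hypothetical. It also makes the limitation raised in the quoted reply concrete: release has to be LIFO, so lifetimes that only partially overlap cannot be expressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dedicated stack, separate from the thread stack, so no
   thread-stack bounds have to be found. Single-threaded sketch only. */
#define SIDE_STACK_SIZE 4096
#define SIDE_MAX_LIVE 64

static _Alignas(16) unsigned char side_stack[SIDE_STACK_SIZE];
static size_t side_top;                    /* bytes currently in use */
static size_t side_marks[SIDE_MAX_LIVE];   /* start offset of each live block */
static size_t side_depth;                  /* number of live blocks */

/* Bump allocation. Returns NULL when the request does not fit; a caller
   would then fall back to malloc. */
void *side_alloc(size_t size)
{
    if (size == 0 || size > SIDE_STACK_SIZE || side_depth == SIDE_MAX_LIVE)
        return NULL;
    size_t aligned = (size + 15) & ~(size_t) 15;   /* keep 16-byte alignment */
    if (aligned > SIDE_STACK_SIZE - side_top)
        return NULL;
    side_marks[side_depth++] = side_top;
    void *p = side_stack + side_top;
    side_top += aligned;
    return p;
}

/* Address-range test: the kind of "marked pointer" check that would let
   free() treat the call as a no-op instead of passing p to the heap. */
int side_owns(const void *p)
{
    uintptr_t a = (uintptr_t) p;
    uintptr_t lo = (uintptr_t) side_stack;
    return a >= lo && a < lo + SIDE_STACK_SIZE;
}

/* Pops p only if it is the most recently allocated live block and
   returns 1; returns 0 for an out-of-order release, because a stack
   cannot represent lifetimes that only partially overlap. */
int side_release(void *p)
{
    assert(side_owns(p));
    if (side_depth == 0
        || (unsigned char *) p != side_stack + side_marks[side_depth - 1])
        return 0;
    side_top = side_marks[--side_depth];
    return 1;
}
```

Allocating a and then b, and releasing a first, is legal with malloc+free but is refused here; a conversion therefore has to choose which malloc+free pairs are safe to move onto a stack.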
If it is worth the complication, there is also the possibility of prepending a library with custom free logic to LD_PRELOAD, which works regardless of the allocator used.
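The memory-reuse transformation suggested earlier in this reply (turning x = malloc(42); free(x); y = malloc(42); into a single allocation) has a loop form, which is what PRE would have to produce. The sketch below is written by hand and only shows the intended result; the function names are hypothetical and the 42-byte size is borrowed from the example in the reply.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_LEN 42  /* constant size, as in the malloc(42) example */

/* Before: a fresh allocation in every iteration, freed at its end. */
size_t count_nonzero_before(const unsigned char *data, size_t n_chunks)
{
    assert(data != NULL || n_chunks == 0);
    size_t count = 0;
    for (size_t c = 0; c < n_chunks; c++) {
        unsigned char *buf = malloc(BUF_LEN);
        if (buf == NULL)
            abort();
        memcpy(buf, data + c * BUF_LEN, BUF_LEN);
        for (size_t i = 0; i < BUF_LEN; i++)
            count += buf[i] != 0;
        free(buf);
    }
    return count;
}

/* After: one allocation hoisted before the loop, reused by every
   iteration, and freed once afterwards. */
size_t count_nonzero_after(const unsigned char *data, size_t n_chunks)
{
    assert(data != NULL || n_chunks == 0);
    size_t count = 0;
    unsigned char *buf = malloc(BUF_LEN);
    if (buf == NULL)
        abort();
    for (size_t c = 0; c < n_chunks; c++) {
        memcpy(buf, data + c * BUF_LEN, BUF_LEN);
        for (size_t i = 0; i < BUF_LEN; i++)
            count += buf[i] != 0;
    }
    free(buf);
    return count;
}
```

One detail a real transformation would have to handle: when the loop runs zero times, the hoisted version still calls malloc, whereas the original did not.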