On Tue, Nov 12, 2013 at 05:01:31PM +0100, Marc Glisse wrote:
> On Tue, 12 Nov 2013, Ondřej Bílka wrote:
>
> >On Tue, Nov 12, 2013 at 01:41:24PM +0100, Marc Glisse wrote:
> >>On Tue, 12 Nov 2013, Ondřej Bílka wrote:
> >>
> >>>>I am trying to get something to actually work and be accepted in
> >>>>gcc. That may mean being conservative.
> >>>
> >>>That also may mean that you will cover only cases where it is not needed.
> >>>
> >>>A malloc will have a small per-thread cache for small requests that does
> >>>not need any locking. A performance difference will be quite small and
> >>>there may be a define which causes inlining constant size mallocs.
> >>>
> >>>Sizes from 256 bytes are interesting case.
> >>
> >>I have to disagree here. When the allocated size is large enough,
> >>the cost of malloc+free often becomes small compared to whatever
> >>work you are doing in that array. It is when the size is very small
> >>that speeding up malloc+free is essential. And you are
> >>underestimating the cost of those small allocations.
> >>
> >No, just aware that these are important and there will be optimizations
> >that convert these. For example:
> >
> >#define malloc (s) ({ \
> > static pool p; \
> > if (__builtin_constant_p (s) { \
> > alloc_from_pool(&p); \
> > else \
> > malloc (s); \
> >})
>
> Seems to be missing some bits.
>
It is an example; its purpose is to show an idea, not to be complete.
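For illustration only, a more filled-in sketch of the same idea might look like the following. It assumes GNU C (statement expressions and __builtin_constant_p). The names POOL_MALLOC, POOL_SLOT and POOL_SLOTS, the layout of pool, and the extra size argument to alloc_from_pool are all hypothetical and chosen just for this sketch; they are not taken from the quoted macro or from any existing library. Freeing is not handled here at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool: a bump allocator over a small static arena.
   Slot size and slot count are arbitrary values picked for illustration. */
enum { POOL_SLOT = 64, POOL_SLOTS = 16 };

typedef struct pool {
    _Alignas(max_align_t) unsigned char arena[POOL_SLOT * POOL_SLOTS];
    size_t used;                /* slots handed out so far */
} pool;

/* Return the next unused slot, or NULL once the pool is exhausted.
   Slots are never returned to the pool in this sketch. */
static void *alloc_from_pool(pool *p, size_t size)
{
    assert(size <= POOL_SLOT);
    if (p->used == POOL_SLOTS)
        return NULL;
    return p->arena + (size_t)POOL_SLOT * p->used++;
}

/* GNU C only. Each expansion site gets its own static pool; requests
   whose size is not a small compile-time constant go to malloc. */
#define POOL_MALLOC(s) __extension__ ({                     \
    static pool pool_;                                      \
    void *ret_;                                             \
    if (__builtin_constant_p(s) && (s) <= POOL_SLOT)        \
        ret_ = alloc_from_pool(&pool_, (s));                \
    else                                                    \
        ret_ = malloc(s);                                   \
    ret_;                                                   \
})
```

A pointer obtained from the pool branch must not be passed to free() in this sketch, so it only shows the constant-size dispatch, not a usable replacement for malloc.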
> >How will you find small constant allocations with this in place?
>
> I won't. If your code is already optimized, the compiler has nothing
> left to do, that's fine. (not that I am convinced your optimization
> works that well)
>
What if it decreases the running time of all constant-size allocations
by 6%? Converting to stack allocation would eliminate the overhead, but
the eliminated sites contributed to 5% of runtime.

> >>I started on this because of an application that spends more than
> >>half of its time in malloc+free and where (almost) no allocation is
> >>larger than 100 bytes. Changing the code to not use malloc/free but
> >>other allocation strategies is very complicated because it would
> >>break abstraction layers. I used various workarounds that proved
> >>rather effective, but I would have loved for that to be unnecessary.
> >
> >See my memory pool that uses custom free functionality where you need
> >only change malloc, free is handled automatically.
>
> Do you mean the incomplete macro above, or your STACK_ALLOC macro
> from the other post? (don't know how that one works either, "size"
> appears out of nowhere in STACK_FREE)
>
That is also an example where the actual logic could be supplied later;
it should be __stack_new instead of size.

I am not talking about stack conversion but about a memory pool; a
proof of concept is here:

https://www.sourceware.org/ml/libc-alpha/2013-11/msg00258.html

> As I already said, I know how to write efficient code, but that's
> hard on the abstraction layers (before inlining, you have to go at
> least 20 layers up in the CFG to find a common ancestor for malloc
> and free), and I'd be happy if the compiler could help a bit in easy
> cases.
>
This is more about using allocation libraries that are flexible enough.

> >Then there are parts where coordination is necessary, one is determining
> >if stack allocation is possible. A possible way would be first turn a
> >eligible malloc calls to
> >
> >malloc_stack(size, color)
> >
> >as hint to allocator.
> >I added a color parameter to handle partial
> >overlap, if you do a coloring with edge when allocations partially
> >overlap then you can assign to each color class a stack and proceed as
> >normal.
>
> That would be great, yes. I'll be looking forward to your patches.
>
> (note that the limits of alias analysis mean that gcc often has no
> idea which free goes with which malloc).
>
Wait, you have a free with the same SSA_NAME as the malloc, and alias
analysis cannot tell which malloc corresponds to that free?

> --
> Marc Glisse

-- 

We've picked COBOL as the language of choice.