On Tue, 12 Nov 2013, Ondřej Bílka wrote:

On Tue, Nov 12, 2013 at 01:41:24PM +0100, Marc Glisse wrote:
On Tue, 12 Nov 2013, Ondřej Bílka wrote:

I am trying to get something to actually work and be accepted in
gcc. That may mean being conservative.

That also may mean that you will cover only cases where it is not needed.

A malloc will have a small per-thread cache for small requests that does
not need any locking. A performance difference will be quite small and
there may be a define which causes inlining constant size mallocs.

Sizes from 256 bytes are interesting case.

I have to disagree here. When the allocated size is large enough,
the cost of malloc+free often becomes small compared to whatever
work you are doing in that array. It is when the size is very small
that speeding up malloc+free is essential. And you are
underestimating the cost of those small allocations.

No, just aware that these are important and there will be optimizations
that convert these. For example:

#define malloc (s) ({ \
 static pool p; \
 if (__builtin_constant_p (s) { \
   alloc_from_pool(&p); \
 else \
   malloc (s); \
})

Seems to be missing some bits.

How will you find small constant allocations with this in place?

I won't. If your code is already optimized, the compiler has nothing left to do, that's fine. (not that I am convinced your optimization works that well)

I started on this because of an application that spends more than
half of its time in malloc+free and where (almost) no allocation is
larger than 100 bytes. Changing the code to not use malloc/free but
other allocation strategies is very complicated because it would
break abstraction layers. I used various workarounds that proved
rather effective, but I would have loved for that to be unnecessary.

See my memory pool that uses custom free functionality where you need
only change malloc, free is handled automaticaly.

Do you mean the incomplete macro above, or your STACK_ALLOC macro from the other post? (don't know how that one works either, "size" appears out of nowhere in STACK_FREE)

As I already said, I know how to write efficient code, but that's hard on the abstraction layers (before inlining, you have to go at least 20 layers up in the CFG to find a common ancestor for malloc and free), and I'd be happy if the compiler could help a bit in easy cases.

And what is logic of limiting sizes?

Main stack is rather small by default. And it is nice if other
variables on the stack are not spread over too many memory pages.

No, question was what is logic that you will use to decide if allocation
is stack or heap?

Also if we take not spreading stack argument to logical conclusion a
best course of action in adding a separate stack and placing arrays/alloca
over 1024 bytes there.

Then you also need to handle deallocations in there instead of relying on the automatic ones in the main stack. But yes, that's possible.

As this is hard in gcc then doing these without gcc should be easier.

Yeah, we shouldn't bother writing an optimizing compiler, what were we thinking ;-)

In several cases (constant size) we do not need gcc involvement as we
colud get information ourself.

In several cases, we don't need to bother manually optimizing our code, gcc could notice that things are simple and optimize them itself.

Then there are parts where coordination is necessary, one is determining
if stack allocation is possible. A posible way would be first turn a
eligible malloc calls to

malloc_stack(size, color)

as hint to allocator. I added a color parameter to handle partial
overlap, if you do a coloring with edge when allocations partialy
overlap then you can assign to each color class a stack and proceed as
normal.

That would be great, yes. I'll be looking forward to your patches.

(note that the limits of alias analysis mean that gcc often has no idea which free goes with which malloc).

--
Marc Glisse

Reply via email to