Mitchell N Charity wrote:

   Summary: it's slower :-(

:(

Yep


Calculating the position of the flags in the pool in pobject_lives() and free_unused_pobjects() costs more time than the smaller cache footprint gains. Two reasons, IMHO: the position has to be calculated twice, and the cache is under more stress from other things.

Hmm... the first reason, a second bit of pointer arithmetic, seems
surprising, cycles being sooo much cheaper than cache misses.

Here are the relevant bits:

# define pool(o) ((struct Small_Object_Arena *) (PTR2UINTVAL(o) & POOL_MASK))
# define DOD_FLAGS(o) \
((POOL_FLAG_TYPE *)pool(o)->flags) \
[((char*)(o) - (char*)(pool(o)->start_objects)) / pool(o)->object_size]

(object_size is copied from pool, not currently there)
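To make the masking trick above concrete, here is a minimal standalone sketch, not the Parrot code. It assumes one power-of-two sized, size-aligned arena that holds its own header, then the packed flags, then the objects. All names (ARENA_SIZE, ARENA_MASK, N_OBJECTS, arena_of, flag_of, arena_new, arena_demo) and the sizes are hypothetical, picked only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizes, for illustration only. */
#define ARENA_SIZE ((size_t)4096)                  /* must be a power of two */
#define ARENA_MASK (~((uintptr_t)ARENA_SIZE - 1))  /* clears the low bits    */
#define N_OBJECTS  64

typedef unsigned char flag_t;

struct arena {
    flag_t *flags;         /* one flag per object, packed together      */
    char   *start_objects; /* address of the first object in this arena */
    size_t  object_size;   /* copied into the arena header              */
};

/* Mask the low bits off an object address to find its arena header.
 * This only works because every arena has the same size and alignment. */
static struct arena *arena_of(const void *o)
{
    return (struct arena *)((uintptr_t)o & ARENA_MASK);
}

/* Flag slot for an object: index = byte offset / object size. */
static flag_t *flag_of(const void *o)
{
    struct arena *a = arena_of(o);  /* computed once, unlike the macro */
    size_t idx = (size_t)((const char *)o - a->start_objects) / a->object_size;
    return &a->flags[idx];
}

/* Allocate one size-aligned arena: header, then flags, then objects. */
static struct arena *arena_new(size_t object_size)
{
    /* Round the object area's offset up to 16 bytes for alignment. */
    size_t offset = (sizeof(struct arena) + N_OBJECTS + 15) & ~(size_t)15;
    struct arena *a;

    if (object_size == 0 || offset + object_size * N_OBJECTS > ARENA_SIZE)
        return NULL;
    a = aligned_alloc(ARENA_SIZE, ARENA_SIZE);  /* C11 */
    if (a == NULL)
        return NULL;
    memset(a, 0, ARENA_SIZE);
    a->flags = (flag_t *)(a + 1);
    a->start_objects = (char *)a + offset;
    a->object_size = object_size;
    return a;
}

/* Returns 0 if the third object maps back to its arena and to flag 2. */
static int arena_demo(void)
{
    struct arena *a = arena_new(32);
    char *third;
    int ok;

    if (a == NULL)
        return -1;
    third = a->start_objects + 2 * a->object_size;
    ok = arena_of(third) == a && flag_of(third) == &a->flags[2];
    *flag_of(third) |= 1;  /* mark the object, touching only the flag array */
    ok = ok && a->flags[2] == 1 && a->flags[1] == 0;
    free(a);
    return ok ? 0 : 1;
}
```

In this sketch, marking an object touches only the arena header and the flag array, never the object body. The cost is that every arena must share the one size and alignment that ARENA_MASK encodes.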

This is a general version that plugs in as a replacement for PObj_get_FLAGS(o), but it was called only once per function. I think the real problem is not the cycles spent on pointer arithmetic; there are different problems:
- we can't use explicit pool pointers: handling flags directly is faster
(getting the pool pointer has the same cache impact)
- when there are no explicit pool pointers, something like the above has to calculate the pool position, which needs a fixed-size POOL_MASK, i.e. a fixed pool size.
- with fixed-size pools (buffers & PMCs) all alike, a List, List_chunk, Hash, String and so on all get the same pool size, though they may be used just once, leading to huge buffer and bufferlike pools too.
- e.g. stress.pasm needs 500K PMCs; fastest is to grow pools huge, to some megabytes of memory, or finally ~200,000 PMCs per pool->arena.
- e.g. life.pasm needs only ~50 strings per cycle, but needs really fast recycling of these, so the pool size should not be much bigger than the demand (which holds for all programs).
- with fixed-size pools, I see no possibility of dealing with these two extreme demands.
- I did also try not to add_free all objects immediately and to reduce arena->used, so that free_unused_pobjects is faster, but this needs a DOD run beforehand. We don't know in which header_pool the shortage is.
And when one pool holds ~10^6 objects and the other pools ~nothing, a DOD run to allocate more for the rarely used pool is too expensive.

stress.pasm with fixed-size pools spends its time in free_unused_pobjects() because there are too many (dead, or rather never-alive) objects around.

... So I
modified the tpmc test with a second calc.

The test is for one fixed-size pool with one kind of header. We have pools for objects of sizeof Buffer, List_chunk, List, Hash, PMC and probably more, which may have very different header counts, from 0 to 1e6 or more. All have to be somehow equally fast. We can trade a little bit to favor one kind of header, but not all.
We can't allocate a huge fixed-size pool arena for the worst case: all the other pools and memory consumption would suffer.


I don't suppose it is still touching the PMC bodies for any reason?

No. But wading through the root set, umpteen header pools, marking stacks and so on needs cache space.


Puzzled,

So was I. Tests looked really fine.

BTW, if you (or anyone) want a patch, just mail me.


Mitchell
leo
