Michael L Maraist:
# On Sunday 04 November 2001 02:39 pm, Dan Sugalski wrote:
# > At 08:32 PM 11/4/2001 +0100, Benoit Cerrina wrote:
# > > > There will be a mechanism to register PMCs with the interpreter to
# > > > note they're pointed to by something that the interpreter can't
# > > > reach. (For example, a structure in your extension code, or via a
# > > > pointer stashed in the depths of a buffer object, or referenced by
# > > > another interpreter.) This "foreign access" registry is considered
# > > > part of an interpreter's root set.
# > >
# > > If this is the case, how do you want to move the PMC? I thought you
# > > wanted a copying collector.
# >
# > While the PMC structures themselves don't move (no real need--they're
# > of fixed size so you can't fragment your allocation pool, though it
# > makes generational collection easier to some extent), the data pointed
# > to by the PMC can. That's the bit that moves.
# >
#
# Ok, so far, here's what I see:
#
# There are two main memory segments.  The first is a traditional
# alloc/free region.  Within this area are arenas (for fixed-size memory
# objects).  This region must be efficient under MT and hopefully not too
# wasteful or fragmenting.  It is mostly for core operation, and the
# non-arena allocations are to be minimized; memory leaks are critical as
# usual.  The utilization patterns should be analyzable, well known, and
# compensated for (with respect to fragmentation, thread contention,
# etc.).  My vmem derivative might still be valuable here, but I suppose
# we need to let the core's needs/characteristics flesh out further.
#
# Then there's a GC region, which is primarily the allocation space of
# the interpreted app and which obviously can't be trusted with respect
# to memory leaks or usage (fragmentation) patterns.  PMCs and strings
# are handles into this GC region, though the handles are to be stored
# in core memory (above).

My understanding is that we will pretty much only allocate PMCs out of
the arena, and that any buffers are allocated out of the GC region.  (I
could be wrong, of course...)
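
If so, the picture is something like this (a rough sketch; the names
are mine, not actual Parrot structures):

    #include <stddef.h>

    /* The fixed-size PMC handle lives in an arena and never moves;
     * only the variable-size data it points at is relocated by the
     * copying collector. */
    typedef struct PMC {
        void *vtable;   /* per-type dispatch table */
        int   flags;
        void *data;     /* usually a pointer into the movable GC region */
    } PMC;

    /* When the collector moves a buffer, it only has to update the
     * pointer inside the stationary handle: */
    void gc_move_buffer(PMC *p, void *new_location) {
        p->data = new_location;   /* the handle itself stays put */
    }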

# Question: Are there other types of references?  I can't think of any.
#
# The GC algorithm (being a separate thread, or called incrementally
# when memory is low but not yet starved) needs to quickly access this
# GC heap's values, which I believe is Dan's reason for requiring a
# maximum of two levels of indirection.  I suppose it just runs down the
# known PMC lists, including the PMC and string register sets, the
# stacks, and the stashes for each interpreter.  You do, however, refer
# to a "foreign access" region, and my first impression is that it's the
# wrong way of going about it.
#
# First of all, how are arrays of arrays of arrays handled?
#
# // create a buffer object of 50 PMCs
# PMC_t p1 = new_array(50);
#
# for (i = 0; i < 50; i++)
#     p1->data[i] = new_array(50);
#
# In order to comply with the max depth, the array creation will have to
# register each sub-array entry in the foreign access region, or am I
# missing something?

I think the comment about 'two levels deep' just meant that he's not
going to look at arrays of arrays of arrays of arrays of buffers.  I
think PMCs do recursive things.  Once again, however, I could be wrong.
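
If that's right, the marker just recurses through child PMCs rather
than registering every element.  A hypothetical sketch (all names and
flags here are invented):

    #include <stddef.h>

    enum { PMC_MARKED = 1, PMC_IS_ARRAY = 2 };

    typedef struct PMC {
        int          flags;
        size_t       num_kids;   /* element count, for array PMCs */
        struct PMC **kids;       /* element vector, for array PMCs */
    } PMC;

    void gc_mark_pmc(PMC *p) {
        if (p == NULL || (p->flags & PMC_MARKED))
            return;              /* already visited, so cycles are safe */
        p->flags |= PMC_MARKED;
        if (p->flags & PMC_IS_ARRAY)
            for (size_t i = 0; i < p->num_kids; i++)
                gc_mark_pmc(p->kids[i]);
    }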

# First of all, this will mean that the foreign access data structure
# will grow VERY large when PMC arrays/hashes are prevalent.  What's
# worse, this data structure is stored within the core, which means
# additional burden on core memory fragmentation / contention.

No...foreign access (if I understand correctly, once again) is just a
way of saying 'hey, I'm pointing at this', a bit like incrementing a
refcount.
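
In other words, something like this sketch (invented names; it reuses
gc_mark_pmc from the sketch above).  The registry is just one more root
set the marker walks, not a mirror of the whole data structure:

    typedef struct Interp {
        PMC   **foreign;     /* PMCs registered for "foreign access" */
        size_t  n_foreign;
    } Interp;

    void foreign_access_register(Interp *i, PMC *p);   /* "I point at p" */
    void foreign_access_unregister(Interp *i, PMC *p); /* "I let go of p" */

    /* During the mark phase, the registry is treated like any other
     * root: */
    void gc_mark_foreign_roots(Interp *i) {
        for (size_t k = 0; k < i->n_foreign; k++)
            gc_mark_pmc(i->foreign[k]);
    }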

# Additionally, what happens when an array is shared by two threads (and
# thus two interpreters)?  Whose foreign access region is it stored in?
# My guess is that, to avoid premature freeing, BOTH.  So now a work
# queue used by a 30-thread producer/consumer app is allocating and
# freeing LOTS of core memory on each enqueue / dispatch.  Again, with
# the details this fuzzy, I'm probably going off half-cocked; but I did
# qualify that this was my initial impression.

I think there's going to be a virtual shared interpreter that all shared
objects belong to.

# My suggestion is to not use a foreign references section; or if we do,
# to not utilize it for deep data-structure nesting, and instead
# incorporate a doubly linked list within PMCs and strings...  Thus
# whenever you allocate a PMC or string, you attach it to the chain of
# allocated handles.  Whenever the PMC is freed, you detach it.  The GC
# then has the laughably simple task of navigating this linked list,
# which spans all threads.  This can incorporate mark-and-sweep or
# copying or whatever.  By adding 8 or 16 bytes to the size of a PMC /
# string, you avoid many memory-related problems.  Not to mention the
# fact that we are free of the concern of depth away from the
# interpreter root.

Still, that's overhead we would like to avoid.  Remember, Parrot has to
work on a Palm--that means a laughably small stack (for some reason 96
KB pops into my head) and somewhere between 2 and 16 MB of memory shared
as RAM *and* permanent storage.  We need all the space we can get in
that kind of environment.
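
For concreteness, the cost is visible in a sketch like this (names are
mine): two extra pointers in every handle.

    typedef struct PMC {
        /* ... vtable, flags, data ... */
        struct PMC *chain_next;   /* +4 or +8 bytes */
        struct PMC *chain_prev;   /* +4 or +8 bytes */
    } PMC;

    static PMC *alloc_chain;      /* every live handle, on one chain */

    void chain_attach(PMC *p) {   /* on allocation */
        p->chain_prev = NULL;
        p->chain_next = alloc_chain;
        if (alloc_chain)
            alloc_chain->chain_prev = p;
        alloc_chain = p;
    }

    void chain_detach(PMC *p) {   /* on free */
        if (p->chain_prev) p->chain_prev->chain_next = p->chain_next;
        else               alloc_chain = p->chain_next;
        if (p->chain_next) p->chain_next->chain_prev = p->chain_prev;
    }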

# Beyond this, I think I see some problems with not having PMCs
# relocatable.  While compacting the object stores that are readily
# resized can be very valuable, the only type of memory leak this avoids
# is fragmentation-related.  The PMCs themselves still need to be tested
# against memory leaks.  Now, I'm still in favor of some form of
# reference counting; I think that in the most common case, only one
# data structure will reference a PMC, and thus when it goes away, it
# should immediately cause the deallocation of the associated object
# space (sacrificing a pittance of run-time CPU so that the GC and free
# memory are relaxed).  But I hear that we're not relying on an integer
# for reference counting (as with perl5), and instead are mostly
# dependent on the GC.  Well, if we use a copying GC but never move the
# PMC, then how are we freeing these PMCs?  It would be possible to
# additionally utilize mark-and-sweep, but then we'd have three separate
# algorithms: simplified reference-counting deallocation (when a
# multi-referenced flag is not set), object-store compaction (to
# minimize the variable-sized heap stored in the copying GC's heap), and
# mark-and-sweep for when we're really memory-starved and don't trust
# that objects have been freed.
#
# If at least the last two parts of the algorithm are used, then the GC
# would periodically navigate the list.  On the first pass, it would
# find stale handles (those freed by simplified reference counting) and
# detach them from the list.  Note: if a non-GC thread detached the PMC
# from the chain, it would derail the GC.  Global locking would be
# required, which is a bad thing (tm).  The first pass also would reset
# the mark-and-sweep flags.
#
# The second pass performs the marking.  I believe this one requires
# navigation of all the interpreter data structures (stacks, register
# sets, etc.).  In fact, it seems to me that since only the references
# within the allocation chain need be visited in the mark stage, perhaps
# a second linked list could be provided for just those references --
# perhaps an additional 8/16 bytes within each PMC (or within the
# reference-specific data structure).  This would cut down on the mark
# stage's overhead.

Once again, we need to work on ridiculously low-memory architectures.
This is not Perl 5--we *can't* just solve every problem by throwing more
memory at it.

# Lastly, the PMC allocation chain is navigated.  Those PMCs / strings
# not "marked" are freed.  Those that are marked are "copied" into the
# mirror heap so as to consolidate free memory.
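
That sweep, continuing the chain sketch from earlier, might look like
this (copy_to_tospace, free_pmc, and the flags are all invented):

    void gc_sweep_chain(void) {
        PMC *p = alloc_chain;
        while (p) {
            PMC *next = p->chain_next;    /* grab before detach/free */
            if (p->flags & PMC_MARKED) {
                p->flags &= ~PMC_MARKED;  /* reset for the next cycle */
                p->data = copy_to_tospace(p->data);  /* handle stays put */
            } else {
                chain_detach(p);
                free_pmc(p);              /* handle back to the arena */
            }
            p = next;
        }
    }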
#
# Even with this, I see some issues:
#
# Firstly, allocating a new PMC requires special care so as not to
# derail a GC navigating the PMC allocation list in another thread
# (since we're avoiding the use of lock contention).

Why not just have a chain for each thread?

# Secondly, PMCs are allocated by attaching themselves to the head of
# the allocation chain.  This could require a global lock on this one
# tiny part.  While relatively fast, it might be better for each
# interpreter to maintain a separate chain (and thus head pointer).
# This requires that the GC navigate each chain separately, which adds
# nominal overhead.  Note that sharing of PMCs between interpreters
# doesn't affect this, since only the PMC creator and the GC need to
# worry about the allocation chains.
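
Per-interpreter heads would make allocation lock-free, something like
this (a separate sketch; the field names and arena allocator are
invented):

    typedef struct Interp {
        PMC *chain_head;      /* this interpreter's allocation chain */
        /* ... registers, stacks, etc. ... */
    } Interp;

    PMC *pmc_new(Interp *interp) {
        PMC *p = arena_alloc_pmc();   /* assumed arena allocator */
        p->chain_prev = NULL;
        p->chain_next = interp->chain_head;
        if (interp->chain_head)
            interp->chain_head->chain_prev = p;
        interp->chain_head = p;       /* only this thread ever attaches */
        return p;
    }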
#
# Thirdly, since partial ref-counting doesn't free the PMC handle, it's
# possible that algorithms that allocate and free millions of PMCs will
# still grow the OS heap enormously.  I really want to avoid this
# phenomenon, but it would involve locking the allocation chain.
# Perhaps, taking a cue from vmem, regional locks can be performed.  The
# GC and the worker thread compete for these tiny regions, which have
# nominal lock times; spin-locks should suffice.  Namely, split the
# allocation chain into "magazines" of m items, then produce a linked
# list of these magazines.  Locks occur on the magazines.  This adds
# little or no overhead to the worker thread, but does add some overhead
# to the GC (performing lots of [un]locking).
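
A sketch of the magazine idea (the spinlock type and primitives are
assumed, and the size is arbitrary):

    #define MAGAZINE_SIZE 64

    typedef struct Magazine {
        spinlock_t       lock;    /* contention limited to one region */
        PMC             *slots[MAGAZINE_SIZE];
        size_t           used;
        struct Magazine *next;
    } Magazine;

    /* The GC walks magazine by magazine, holding only one lock at a
     * time, so the worker thread is never blocked for long: */
    void gc_scan_magazines(Magazine *m) {
        for (; m; m = m->next) {
            spin_lock(&m->lock);
            for (size_t i = 0; i < m->used; i++)
                gc_mark_pmc(m->slots[i]);
            spin_unlock(&m->lock);
        }
    }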
#
# Fourthly, to allow new PMC allocations to occur while the GC navigates
# the list, it will have to temporarily detach all the lists from their
# head pointers (unless regional locking is used).  When the GC is done,
# it can reattach the compacted lists to their respective heads.  If
# regional locking isn't used, a simple, briefly held lock around the
# head pointer will do.
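
That is, the GC "steals" a whole chain by swapping out the head pointer
under a brief lock, then walks it unlocked while new allocations start
a fresh chain (the lock primitives and chain_lock field are assumed):

    PMC *gc_steal_chain(Interp *interp) {
        PMC *stolen;
        lock(&interp->chain_lock);    /* held only for the swap */
        stolen = interp->chain_head;
        interp->chain_head = NULL;    /* allocations begin a new chain */
        unlock(&interp->chain_lock);
        return stolen;                /* GC walks this at its leisure */
    }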
#
# Fifthly, it would really be nice to separate portions of the GC heap
# between threads.  One thing I learned from the vmem architecture is
# the danger of cache sharing.  If you carve out an 8-byte region from
# the heap for one thread, then the next 8 bytes for another thread
# (both running on separate CPUs), then they're going to share a cache
# line, and thus cripple performance.  I've witnessed this even on my
# dual-proc Celeron.  While it's acceptable to let "some" objects be
# shared, a raw stack-carver would guarantee the demise of performance
# on multi-CPU architectures in an MT environment.  One way to avoid
# this is to use separate heaps for each thread.  This all but
# eliminates locking contention, while solving the cache-sharing
# problem.  One obvious downside is that you require more initial
# memory.  By producing linked lists of heap chunks (of size 128K)
# instead of having monolithically growing heaps, the compacting can
# occur in stages, and thus dramatically minimize the unused portions of
# memory (by never having more than a single 128K heap chunk that's
# empty).  Though I'm sure these details are addressed by existing
# state-of-the-art GCs.
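
A sketch of the per-thread chunk list (names and the chunk size are
invented; error handling omitted):

    #include <stdlib.h>

    #define CHUNK_SIZE (128 * 1024)

    typedef struct HeapChunk {
        char             *free_ptr;   /* next free byte in this chunk */
        char             *end;        /* one past the last byte */
        struct HeapChunk *next;
        char              mem[CHUNK_SIZE];
    } HeapChunk;

    /* Each thread carves from its own chunk list, so two threads never
     * share a cache line, and empty chunks can be released in stages. */
    void *thread_heap_alloc(HeapChunk **head, size_t n) {
        HeapChunk *c = *head;
        if (!c || c->free_ptr + n > c->end) {  /* full: chain a new one */
            c = malloc(sizeof(HeapChunk));
            c->free_ptr = c->mem;
            c->end      = c->mem + CHUNK_SIZE;
            c->next     = *head;
            *head       = c;
        }
        void *result = c->free_ptr;
        c->free_ptr += n;
        return result;
    }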
#
# Lastly, with the increase of complexity, it would be interesting to
# weigh the pros and cons of a purely reference-counted PMC / string.
# The non-PMC object data would take advantage of the periodic copying
# GC, but the object-handle management would be completely maintained by
# the "set_p_xxx" and stack op-codes.  Each entry in a register counts
# as a reference.  While slightly more run-time maintenance, we avoid
# periodic O(2n) linked-list lookups (for millions of PMCs).  Since not
# all PMCs or strings even refer to things in the GC heap, this is an
# effective compromise, just as a point of comparison.  True, we still
# have the potential for leaks, but that is avoidable even in perl5.6.
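
The register ops would carry the counting, roughly like this (the op
and field names are invented):

    /* set_p_p: copy a PMC register.  Each register slot counts as one
     * reference, so the ops adjust the count as slots change hands. */
    void set_p_p(Interp *interp, int dst, int src) {
        PMC *oldp = interp->pmc_reg[dst];
        PMC *newp = interp->pmc_reg[src];
        if (newp)
            newp->refcount++;             /* dst slot now references it */
        interp->pmc_reg[dst] = newp;
        if (oldp && --oldp->refcount == 0)
            free_pmc(oldp);               /* immediate, deterministic */
    }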
#
# In summary, the lingering issues are:
# a) PMC memory management is complex and non-ideal whether PMCs are
# relocatable or not, and whether they're GC'd or ref-counted (meaning
# someone's going to spend CPU time somewhere).  GC gives the core
# interpreter fewer points to worry about, but necessitates more
# asynchronous workloads.
#
# b) "Foreign access" management seems problematic.

--Brent Dax
[EMAIL PROTECTED]
Configure pumpking for Perl 6

When I take action, I’m not going to fire a $2 million missile at a $10
empty tent and hit a camel in the butt.
    --Dubya
