On Dec-31, Leopold Toetsch wrote:
> Steve Fink wrote:
> 
> >... So I decided to summarize the various approaches in
> >hopes that something will jump out at someone. 
> 
> 
> Great document, thanks for all the work of summarizing this.
> 
> 
> >  (2) point out what's wrong with my "variant 5: generational
> >      stack",
> 
> 
> I think, it moves the problems just around with a lot of overhead. E.g. 
> cloning a PerlArray of 10^6 entries would need 1000 generations (when 
> 1000 PMCs are allocated in one bunch), which would need a ever growing 
> generation stack (using mem too) and leading to a probably non linear 
> slow down in DOD runs.

I don't understand. The outer clone belongs to generation N. Then if
you called clone() individually on each PMC within the array, then by
default you wouldn't create any more generations at all. If you were
worried that you might accumulate too much garbage while doing all
those clones (which is certainly reasonable), then you could put each
sub-clone into its own generation, but the stack would still only get
2 deep (each sub-clone would be surrounded by a generation push/pop).
I don't understand how batching up the PMC allocations changes that
analysis. You could certainly subdivide the clone into as many
generations as you want, but they'd still only ever be 2 deep.

I was thinking that for the most part only complete opcodes would get
their own generation, and only recursive calls to the interpreter to
execute other chunks of PASM would normally push onto the generational
stack. So by default, PerlArray.clone would all fit into one
generation unless the PMCs within the PerlArray implement their clone
vtable entries with PASM code.

> I think, we will need a mixture of:
> 
> >In this case, the solution is simple: resize the array, if necessary,
> >before creating the PMC. 
> 
> 
> i.e. reorder code where possible, and anchor early,
> and ...

I'm starting to think you're right.

> >=head2 Variant 3: clear during DOD
> >
> >The neonate flag is cleared during DOD when an object is encountered
> >during the recursive root set traversal. (Leopold Toetsch's trick of
> >setting the live_FLAG during creation is equivalent to this variation,
> >I think.)
> >
> > + Simple
> > + Fast DOD (DOD already manipulates the flags)
> > - If there are multiple DOD runs before the object is anchored or
> >   dies, it will be prematurely freed
> 
> IMHO it should be possible to guarantee, that there are no consecutive 
> DOD runs for certain operations. The major problem is cloning of 
> aggregates which could take arbitrary resources (though the aggregates 
> normally know how big they are).

I'm not convinced that's ever possible. What if an object overrides
its clone()? Then when an aggregate containing it is cloned, it could
do pretty much anything...

Wait. Dumb question: is clone a shallow or deep copy?

> By rearraning code and disabling DOD during clone (when needed), it 
> should be possible to avoid walking the processor stack/regs alltogether.

Yes, I'm hoping there's some solution where things that have full
knowledge of what might happen can make use of it and gain speed in
the common case, while there's still a safe mechanism for things that
are too difficult or impossible to predict in advance. So we can start
with the easy approach, and then optimize them into the more detailed
approach if needed. Or something. My hands are getting tired from all
the waving around.

Reply via email to