On Dec-31, Leopold Toetsch wrote: > Steve Fink wrote: > > >... So I decided to summarize the various approaches in > >hopes that something will jump out at someone. > > > Great document, thanks for all the work of summarizing this. > > > > (2) point out what's wrong with my "variant 5: generational > > stack", > > > I think, it moves the problems just around with a lot of overhead. E.g. > cloning a PerlArray of 10^6 entries would need 1000 generations (when > 1000 PMCs are allocated in one bunch), which would need a ever growing > generation stack (using mem too) and leading to a probably non linear > slow down in DOD runs.
I don't understand. The outer clone belongs to generation N. Then if you called clone() individually on each PMC within the array, then by default you wouldn't create any more generations at all. If you were worried that you might accumulate too much garbage while doing all those clones (which is certainly reasonable), then you could put each sub-clone into its own generation, but the stack would still only get 2 deep (each sub-clone would be surrounded by a generation push/pop). I don't understand how batching up the PMC allocations changes that analysis. You could certainly subdivide the clone into as many generations as you want, but they'd still only ever be 2 deep. I was thinking that for the most part only complete opcodes would get their own generation, and only recursive calls to the interpreter to execute other chunks of PASM would normally push onto the generational stack. So by default, PerlArray.clone would all fit into one generation unless the PMCs within the PerlArray implement their clone vtable entries with PASM code. > I think, we will need a mixture of: > > >In this case, the solution is simple: resize the array, if necessary, > >before creating the PMC. > > > i.e. reorder code where possible, and anchor early, > and ... I'm starting to think you're right. > >=head2 Variant 3: clear during DOD > > > >The neonate flag is cleared during DOD when an object is encountered > >during the recursive root set traversal. (Leopold Toetsch's trick of > >setting the live_FLAG during creation is equivalent to this variation, > >I think.) > > > > + Simple > > + Fast DOD (DOD already manipulates the flags) > > - If there are multiple DOD runs before the object is anchored or > > dies, it will be prematurely freed > > IMHO it should be possible to guarantee, that there are no consecutive > DOD runs for certain operations. The major problem is cloning of > aggregates which could take arbitrary resources (though the aggregates > normally know how big they are). I'm not convinced that's ever possible. What if an object overrides its clone()? Then when an aggregate containing it is cloned, it could do pretty much anything... Wait. Dumb question: is clone a shallow or deep copy? > By rearraning code and disabling DOD during clone (when needed), it > should be possible to avoid walking the processor stack/regs alltogether. Yes, I'm hoping there's some solution where things that have full knowledge of what might happen can make use of it and gain speed in the common case, while there's still a safe mechanism for things that are too difficult or impossible to predict in advance. So we can start with the easy approach, and then optimize them into the more detailed approach if needed. Or something. My hands are getting tired from all the waving around.