On Oct 21, 2003, at 5:53 AM, Dan Sugalski wrote:

Note that I do *not* want to have multiple object traversal systems in
parrot! We have one for DOD, and proposals have ranged upwards from there.
No. That is *not* happening--the chance for error is significant, the
side-effects of the error annoying and tough to track down for complex
cases (akin to the trouble with tracking down GC issues), and just not
necessary. (Perhaps desirable for speed/space reasons, but desirable
isn't necessary)

DOD's mark() routine has different requirements than a general traverse() for freeze(), chill(), clone(), and destruction ordering. Using just mark() for all of them will produce the very side effects that you want to avoid.

The only thing that mark does that the general traversal doesn't, in the
abstract, is flip the object's live flag. Everything else is an
optimization of code which we can, if we need, discard.

I don't believe that is quite true. There are a couple of important differences between traversal-for-GC and traversal-for-serialization, which will be a challenge to reconcile in the one-true-traversal:


1) Serialization traversals need to "take note" of logical int and float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they can be serialized, but for GC you only need to worry about GC-able objects. It's difficult to come up with a reasonable callback which can take either int, float, or PObj arguments.

2) It's reasonable for an object to have a pointer to some sort of cache object, which is not logically part of the object, and shouldn't be serialized along with it. This needs to be traversed for GC purposes, but needs to not be traversed for serialization. (Situations such as this--physical but not logical membership--are the origin of the "mutable" keyword in C++.)

3) Traversal for GC needs to do loop detection, but can just stop going down a particular branch of the object graph once it encounters an object it's seen before. Serialization traversals would need to have a way, upon encountering an object seen before, to include in the serialization stream an indication that the current object has already been serialized, and enough information to enable deserialization code to go find it and recreate the loop. The only options I see here are for serialization to involve the allocation of unbounded additional memory, or to expand the PObj structure to include a slot for a UUID which can be used as a back-reference in a stream, or to have serialization break loops (so that deserialized structures never have loops).
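To make the three points above concrete, here is a rough sketch in C of what a single traversal would have to carry in order to serve both callers. Everything in it is hypothetical: Node is a stand-in for a PObj rather than the real structure, and Slot, visit_fn, traverse(), gc_mark() and serialize() are names invented for illustration, not existing Parrot API. The callback takes a tagged union (point 1), each slot carries a logical/physical flag (point 2), and the serializer keeps a side table of objects already seen so it can emit a back-reference (point 3). A fixed-size table is used here only to keep the sketch short; a real one would grow without bound, which is exactly the memory cost described in point 3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PObj; not Parrot's real structure. */
typedef struct Node {
    long         ival;   /* logical integer slot                 */
    struct Node *child;  /* logical member                       */
    struct Node *cache;  /* physical-only member (e.g. a cache)  */
    int          live;   /* GC live flag                         */
} Node;

typedef enum { SLOT_INT, SLOT_OBJ } SlotKind;

/* One callback argument that can describe either kind of slot. */
typedef struct {
    SlotKind kind;
    int      logical;    /* 0 = physical only, skip when serializing */
    union { long i; Node *obj; } u;
} Slot;

/* For SLOT_OBJ, a nonzero return asks the traversal to descend. */
typedef int (*visit_fn)(const Slot *slot, void *state);

static void traverse(Node *n, visit_fn visit, void *state);

static void visit_obj(Node *target, int logical, visit_fn visit, void *state)
{
    Slot s;
    if (target == NULL) return;
    s.kind = SLOT_OBJ; s.logical = logical; s.u.obj = target;
    if (visit(&s, state)) traverse(target, visit, state);
}

/* The one traversal: reports every slot and lets the visitor decide. */
static void traverse(Node *n, visit_fn visit, void *state)
{
    Slot s;
    s.kind = SLOT_INT; s.logical = 1; s.u.i = n->ival;
    (void)visit(&s, state);
    visit_obj(n->child, 1, visit, state);
    visit_obj(n->cache, 0, visit, state);
}

/* GC visitor: ignores ints, follows all pointers, stops at live objects. */
static int gc_visit(const Slot *s, void *state)
{
    (void)state;
    if (s->kind != SLOT_OBJ || s->u.obj->live) return 0;
    s->u.obj->live = 1;
    return 1;
}

#define MAX_SEEN 64
typedef struct {
    Node  *seen[MAX_SEEN];  /* side table; index doubles as back-reference id */
    size_t nseen, ints, backrefs;
} SerState;

/* Serialization visitor: skips physical-only slots, counts what it
 * would write, and records a back-reference for objects already seen. */
static int ser_visit(const Slot *s, void *state)
{
    SerState *st = state;
    size_t i;
    if (!s->logical) return 0;
    if (s->kind == SLOT_INT) { st->ints++; return 0; }
    for (i = 0; i < st->nseen; i++)
        if (st->seen[i] == s->u.obj) { st->backrefs++; return 0; }
    assert(st->nseen < MAX_SEEN);
    st->seen[st->nseen++] = s->u.obj;
    return 1;
}

static void gc_mark(Node *root) { visit_obj(root, 1, gc_visit, NULL); }

static void serialize(Node *root, SerState *st)
{
    st->nseen = st->ints = st->backrefs = 0;
    visit_obj(root, 1, ser_visit, st);
}
```

Even in this toy form, the GC visitor pays for a tagged-union argument and a flag it never looks at, and the serializer needs per-run state that the GC has no use for. That is the sort of mismatch I have in mind.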

I'm not 100% convinced that a single approach can't handle both applications, but it's looking as though their requirements are different enough that it may not work well.

Three other questions/concerns/comments/issues:

1) I assume that ultimately a user-space iterator would end up calling the traversal code, right? If so, you can't reasonably mandate that only one traversal be in progress at one time. That would be the canonical way to compare two ordered collections--get an iterator for each, and compare element-by-element.

2) I don't see it as a huge problem that serialization code could end up creating additional objects if called from a destroy() method. (Though yes, it would be a problem for GC infrastructure code to do so.) I say that for two reasons. (a) destroy() methods can really do anything they want, and if that task involves allocating additional memory, that just makes it a risk to perform that task in a destroy() method--it may fail due to out-of-memory conditions. I think that Java design experts tend to argue against doing things like serialization in finalization methods. It sounds elegant, but it's problematic. One reason for this is that you tend to want to serialize structures as a whole, not piece-by-piece as they are garbage-collected. (b) A DOD run may be triggered by an out-of-headers condition, but that doesn't mean that an additional chunk of memory for headers can't be allocated. If it can't be, then this is no more problematic than it would be in other user code--think of the case where I have some big tree of objects I want to make some sort of copy of, with the intention of then letting go of the original when I'm done. I'll be freeing up headers at the end of that process, but if I run out of memory part-way-through, then I'm just stuck.

3) I assume that not every object is assumed to be serializable? For instance, an object representing a filehandle can't really be serialized in a useful way. So I'm not sure of what sort of "fidelity" is required of a generic serialization method--that is, how similar a deserialized structure is guaranteed to be to the original.
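Going back to point 1 of this second list, here is a small sketch in C of why only one traversal at a time is too strict for user-space iterators. The Iter type and the iter_begin(), iter_next() and collections_equal() functions are made up for illustration and are not Parrot API. The point is only that each iterator carries its own cursor, so two of them can be in progress at once while the collections are compared element-by-element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator: all traversal state lives in the iterator
 * itself, so any number of them can be in progress at once. */
typedef struct {
    const long *items;
    size_t      len;
    size_t      pos;
} Iter;

static Iter iter_begin(const long *items, size_t len)
{
    Iter it;
    it.items = items; it.len = len; it.pos = 0;
    return it;
}

/* Stores the next element in *out and returns 1, or returns 0 at the end. */
static int iter_next(Iter *it, long *out)
{
    assert(it != NULL && out != NULL);
    if (it->pos >= it->len) return 0;
    *out = it->items[it->pos++];
    return 1;
}

/* Advances both iterators in lockstep; equal only if every pair of
 * elements matches and both collections end together. */
static int collections_equal(Iter a, Iter b)
{
    long x = 0, y = 0;
    for (;;) {
        int has_a = iter_next(&a, &x);
        int has_b = iter_next(&b, &y);
        if (!has_a || !has_b) return has_a == has_b;
        if (x != y) return 0;
    }
}
```

If the traversal state lived in one global place instead of in each Iter, the second iter_next() call in the loop would clobber the first one's position.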


JEff



