Gordon Henriksen <[EMAIL PROTECTED]> wrote:
... Best example: morph. morph must die.
Morph is necessary. But please note: morph changes the vtable of the PMC to point to the new data types table. It has nothing to do with a typed union.
I overstated when I said that morph must die. morph could live IF:
• the UnionVal struct were rearranged
• bounds were placed upon how far a morph could... well, morph
It doesn't matter if an int field could read half of a double or v.v.; it won't crash the program. Only pointers matter.
To allow PMC classes to guarantee segfault-free operation, morph and cooperating PMC classes must conform to the following rule. Other classes would require locking.
With this vocabulary:
variable: A memory location which is reachable (i.e., not garbage). [*]
pointer: The address of a variable.
pointer variable: A variable which contains a pointer.
access: For a pointer p, any dereference of p—*p, p->field, or p[i]—whether for the purposes of reading or writing to that variable.
And considering:
any specific pointer variable ("ptr"), and
all accesses which parrot might perform[**] on any pointer ever stored in ptr ("A") [***], and
any proposed assignment to ptr
Then:
If any A which once accessed a pointer variable would now access a non-pointer variable,
Then the proposed assignment MUST NOT be performed.
This is a relaxed type-stability definition. (Relaxed: It provides type stability only for pointer variables, not for data variables. It does not discriminate the types of pointers, only that the data structures they directly reference have the same layout of pointers. Also, a loophole allows non-pointer variables to become pointer variables, but not the reverse.)
These rules ensure that dereferencing a pointer will not segfault. They also ensure that it is safe to dereference a pointer obtained from a union according to the union's discriminator, regardless of when, in which order, or how often parrot read the pointer or the discriminator.[***] I think they're actually the loosest possible set of rules to do this.
[*] Two union members are the same variable.
[**] This is in the variable ptr specifically, not merely in the same field of a similar struct. That is, having an immutable discriminator which selects s.u.v or s.u.i from struct { union { void * v; int i; } u } s is valid. A mutable discriminator is also valid—so long as the interpretation of pointer fields does not change.
[***] But only if the architecture prevents shearing in pointer reads and writes.
From another perspective this is to say:
Every pointer variable must forever remain a pointer.
Union discriminators must not change such that a pointer will no longer be treated as a pointer, or will be treated as a pointer to a structure with a different layout.
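To make the rule concrete, here is a minimal sketch of the check a morph would have to pass. Everything in it is an assumption for illustration: class_layout, pointer_slots and morph_allowed are hypothetical names, not anything in parrot today. It also covers only the first level of the rule (which slots of the PMC hold pointers); it does not check that the structures those pointers reference keep the same pointer layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a PMC class: one bit per word-sized slot
 * of the PMC body, set when the class stores a pointer in that slot.
 * This is an illustration, not a real parrot structure. */
typedef struct {
    const char *name;
    uint32_t    pointer_slots;
} class_layout;

/* A morph is acceptable when every slot the old class treats as a
 * pointer is still treated as a pointer by the new class. The reverse
 * direction (data slot becoming a pointer slot) is the loophole the
 * rule permits. */
static bool morph_allowed(const class_layout *from, const class_layout *to)
{
    assert(from != NULL && to != NULL);
    return (from->pointer_slots & ~to->pointer_slots) == 0;
}
```

With layouts like these, a class whose slot 0 is a pointer could never morph into one that keeps an integer there, while the opposite morph would pass.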
The first step in conforming to these rules is guaranteeing that a perlscalar couldn't morph into an intlist or some other complete nonsense. So the default for PMCs should be to prohibit morphing. Also, morphable classes will have a hard time using struct_val without violating the above rules. But for this price, parrot could get lock-free, guaranteed crash-proof readers for common data types. But note that pmc->cache.pmc_val can be used freely! So if exotic scalars wrap their data structures in a PMC *cough*perlobject*cough*managedstruct*ahem*, then those PMCs can be part of a cluster of morphable PMC classes without violating these rules.
Next, the scalar hierarchy (where morphing strikes me as most important) could be adjusted to provide the requisite guarantees, such as: perlstring's vtable methods would never look for its struct parrot_string_t * in the same memory location that a perlnum vtable method might be storing half of a floatval. Right now, that sort of guarantee is not made, and so ALL shared PMCs REALLY DO require locking. That's bad, and it's solvable.
Specifically, UnionVal, with its present set of fields, would have to become something more like this:
    struct UnionVal {
        struct parrot_string_t * string_val;
        DPOINTER* struct_val;
        PMC* pmc_val;
        void *b_bufstart;
        union {
            INTVAL _int_val;
            size_t _buflen;
            FLOATVAL _num_val;
        } _data_vals;
    };
If no scalar types use struct_val or pmc_val or b_bufstart, then those fields can go inside the union.
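As a rough sketch of what the rearrangement buys, consider a cut-down version with one pointer slot and one data union. The typedefs and the pmc_sketch struct are stand-ins, not parrot's real definitions, and the type tag stands in for the vtable pointer. Initialising string_val to a shared empty string rather than NULL is my own assumption, so that a reader acting on a stale type always has something valid to dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for parrot's types; the real definitions differ. */
typedef long   INTVAL;
typedef double FLOATVAL;
struct parrot_string_t { const char *bytes; size_t len; };

struct UnionValSketch {
    struct parrot_string_t *string_val;  /* pointer slot: only ever a pointer */
    union {
        INTVAL   _int_val;
        FLOATVAL _num_val;
    } _data_vals;                        /* data slot: never a pointer */
};

enum sketch_type { SK_NUM, SK_STRING };

struct pmc_sketch {
    enum sketch_type      type;          /* stands in for the vtable pointer */
    struct UnionValSketch cache;
};

/* Assumption: pointer slots start out pointing at a valid shared object. */
static struct parrot_string_t sketch_empty_string = { "", 0 };

static void sketch_init_num(struct pmc_sketch *p, FLOATVAL v)
{
    assert(p != NULL);
    p->cache.string_val = &sketch_empty_string;
    p->cache._data_vals._num_val = v;
    p->type = SK_NUM;
}

/* Morphing writes a pointer into the pointer slot and leaves the data
 * union alone, so no pointer is ever overlaid with number bits. */
static void sketch_morph_to_string(struct pmc_sketch *p,
                                   struct parrot_string_t *s)
{
    assert(p != NULL && s != NULL);
    p->cache.string_val = s;
    p->type = SK_STRING;
}

/* A string method that runs against a PMC which is (or has just become)
 * a number still dereferences a valid pointer. */
static size_t sketch_string_length(const struct pmc_sketch *p)
{
    return p->cache.string_val->len;
}
```

The answer a racing reader gets may be surprising (length 0 for a number), but the dereference itself is safe.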
Unconstrained morphing is the only technology that *in all cases* *completely* prevents the crash-proof guarantee for lock-free access to shared PMCs. Without changes to this, we're stuck with implicit PMC locking and what looks like an unusable threading implementation.
This is only the beginning! For example, if parrot can provide type stability, mutable strings can be crash-proofed from multithreaded access. "Wha?!"
1. Add to the buffer structure an immutable capacity field. Or use the memory allocator's length field; same thing.
2. All routines which access the string's memory buffer should copy the string's byte length and buffer pointer to local variables before processing them, so that C ensures the algorithm has a stable copy of the variables. (The actual fields should be declared volatile.)
3. All routines which access the string's memory buffer should now bound their accesses by selecting as the upper bound the lesser of those two variables. Or: Throw an exception if the capacity is less than the byte length.
4. It must be guaranteed that a buffer contraction will change the buffer pointer. (i.e., realloc() mustn't just put the top of the buffer back in the free-list.)
This smidgen of extra work will ensure that a string iterator never walks off the end of the buffer, should the pointer to the buffer and the string's length be read from inconsistent states of the same object. Surprising results? Yes. Crashes? No!
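The four steps might look roughly like this. The names (sketch_buffer, sketch_string, and the functions) are hypothetical, malloc stands in for parrot's allocator, and the old buffer is simply abandoned on shrink because this sketch has no garbage collector to reclaim it once readers are done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Step 1: the buffer carries a capacity that is written once, at
 * allocation, and never changed afterwards. */
typedef struct {
    size_t capacity;
    char   data[];
} sketch_buffer;

/* Step 2: the mutable fields are volatile so each read really happens. */
typedef struct {
    sketch_buffer *volatile buf;
    volatile size_t         bytelen;
} sketch_string;

static sketch_buffer *sketch_buffer_new(size_t capacity)
{
    sketch_buffer *b = malloc(sizeof *b + capacity);
    if (b != NULL)
        b->capacity = capacity;
    return b;
}

/* Steps 2 and 3: copy both fields to locals, then bound the walk by the
 * lesser of the byte length and the buffer's own capacity. */
static size_t sketch_count_byte(const sketch_string *s, char c)
{
    sketch_buffer *buf   = s->buf;
    size_t         len   = s->bytelen;
    size_t         limit = len < buf->capacity ? len : buf->capacity;
    size_t         n     = 0;

    for (size_t i = 0; i < limit; i++)
        if (buf->data[i] == c)
            n++;
    return n;
}

/* Step 4: a contraction always installs a fresh, smaller buffer, so a
 * stale length can never be paired with a shrunken old buffer. */
static int sketch_shrink(sketch_string *s, size_t new_capacity)
{
    sketch_buffer *nb = sketch_buffer_new(new_capacity);
    if (nb == NULL)
        return -1;
    size_t keep = s->bytelen < new_capacity ? s->bytelen : new_capacity;
    memcpy(nb->data, s->buf->data, keep);
    s->buf     = nb;    /* old buffer is left alone: readers may hold it */
    s->bytelen = keep;
    return 0;
}
```

A reader that picks up a new, smaller buffer together with an old, larger byte length stops at the new capacity instead of running past the end.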
On 32-bit platforms with atomic 64-bit reads and writes (or 64-bit platforms with atomic 128-bit reads and writes), the above can actually be avoided if the buffer and byte length are stored adjacently and always written and read as a pair.
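For illustration only, here is how the paired read and write could be spelled with C11 atomics, which are a modern stand-in and not something parrot has. The sketch assumes the 32-bit case: a 32-bit buffer handle and a 32-bit byte length packed into one 64-bit word. sketch_pair and both functions are hypothetical names, and whether the operations are truly lock-free depends on the platform.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* High 32 bits: buffer handle (a 32-bit pointer or index).
 * Low 32 bits: byte length. */
typedef struct {
    _Atomic uint64_t packed;
} sketch_pair;

/* Writers always publish both halves in a single atomic store. */
static void sketch_pair_store(sketch_pair *p, uint32_t handle, uint32_t bytelen)
{
    assert(p != NULL);
    atomic_store(&p->packed, ((uint64_t)handle << 32) | bytelen);
}

/* Readers take both halves from a single atomic load, so the handle and
 * the length always come from the same write. */
static void sketch_pair_load(sketch_pair *p, uint32_t *handle, uint32_t *bytelen)
{
    assert(p != NULL && handle != NULL && bytelen != NULL);
    uint64_t v = atomic_load(&p->packed);
    *handle  = (uint32_t)(v >> 32);
    *bytelen = (uint32_t)v;
}
```

Because a reader can never see a handle from one write and a length from another, the capacity check becomes unnecessary in this arrangement.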
But this sort of lock avoidance cleverness can't happen unless parrot can guarantee type stability in at least limited circumstances.
—
Gordon Henriksen [EMAIL PROTECTED]