On Tue, May 22, 2007 at 08:20:19AM -0500, Patrick R. Michaud wrote: > On Tue, May 22, 2007 at 01:25:33PM +0100, Nicholas Clark wrote: > > > > And how often does the type of a PMC change, such that its internal > > data layout changes? In Perl 5 this morphing happens everywhere, > > but in Parrot?
In fact, this is probably a really good spot for me to review what I've been coming across with PMCs and assignment in general. Returning to the Perl 6 example I gave earlier: my @a = (1, 2, 3); my $b := @a[2]; @a[2] = foo(); The simplified code that I gave for the last assignment operation looked like: ## @a[2] = foo(); $P4 = 'foo'() # $P4 could be any type find_lex $P5, '@a' # look up @a set $P6, $P5[2] # get reference to @a[2] assign $P6, $P4 # $P6 needs to morph # to whatever type $P4 is Now let's remove the simplifications so that we can see what is really having to take place. First, the C<assign> opcode on arbitrary types doesn't do morphing. However, assigning to an .Undef object will cause it to morph to the target type. So, what PAST-pm does in its code generation for assignment is to first morph the target object into an .Undef, and then perform the assignment: ## @a[2] = foo(); $P4 = 'foo'() # $P4 could be any type find_lex $P5, '@a' # look up @a set $P6, $P5[2] # get reference to @a[2] morph $P6, .Undef # morph @a[2] to an Undef assign $P6, $P4 # and assign $P4 to that Undef Thanks to Matt Diephouse for letting me know about this approach. Now then, this assumes that every type knows how to morph itself into an .Undef and that .Undef can handle assignment from any type. For many PMC classes this isn't (or hasn't been) the case; from time to time we stumble across another type that doesn't know how to morph itself to .Undef or for which .Undef cannot handle the assignment. When this happens either Matt or I have gone in and updated the PMC code to allow these conversions -- the prime example is .Sub, but there have been a few others. There's more. The PIR code above for assignment also assumes that @a[2] already exists, which might not be the case. If @a[2] doesn't exist, then $P6 comes back as NULL, and we can't morph a NULL into .Undef. So the code ends up looking like: ## @a[2] = foo(); $P4 = 'foo'() # $P4 could be any type find_lex $P5, '@a' # look up @a set $P6, $P5[2] # get reference to @a[2] unless_null $P6, do_morph # if exists, morph it new $P6, .Undef # create a new object set $P5[2], $P6 # bind @a[2] to the new object goto do_assign # now do the assignment do_morph: morph $P6, .Undef # morph @a[2] to an Undef do_assign: assign $P6, $P4 # and assign $P4 to that Undef And, of course, the code generally ends up looking slightly different if we're talking about assigning to lexical or global variables instead of a keyed object. For example, in the above assignment, if '@a' doesn't already exist we need to vivify it also: ## @a[2] = foo(); $P4 = 'foo'() # $P4 could be any type find_lex $P5, '@a' # look up @a unless_null $P5, assign_1 # does @a exist? $P5 = new .ResizablePMCArray # create an object for @a store_lex '@a', $P5 # store it assign_1: set $P6, $P5[2] # get reference to @a[2] unless_null $P6, do_morph # does @a[2] exist? new $P6, .Undef # no, create a new object set $P5[2], $P6 # bind @a[2] to the new object goto do_assign # now do the assignment do_morph: morph $P6, .Undef # morph @a[2] to an Undef do_assign: assign $P6, $P4 # and assign $P4 to that Undef This is what the code tends to look like for each assignment operation coming out of PAST-pm. Not very pretty. On irc:#parrot I've speculated that it might be really useful to have opcodes or functions that encapsulate the above into something a bit smaller (and hopefully written in C to be faster). I'm thinking that we're missing an "assign_keyed" opcode, although it should probably be called something besides "assign_*" because the vrsion I want would create and morph targets as needed (the current "assign" opcode doesn't do morphing unless the target object explicitly enables it). But, sticking with the slightly-off "assign_keyed" name for now, we'd have ## @a[2] = foo(); $P4 = 'foo'() # $P4 could be any type find_lex $P5, '@a' # look up @a unless_null $P5, assign_1 # does @a exist? $P5 = new .ResizablePMCArray # create an object for @a store_lex '@a', $P5 # store it assign_1: assign_keyed $P5, 2, $P4 # perform assignment Or perhaps that last statement should be written assign $P5[2], $P4 with my earlier caveat that for *this* version of the assign opcode, I want it to create and/morph the target object as needed to match the type of $P4. For package-scoped and global variables we'd just use assign_keyed on the namespace objects instead of having to be concerned with the separate set_global, set_hll_global, or set_root_global opcodes: ## our $y; $y = foo(); $P0 = 'foo'() $P1 = get_namespace assign_keyed $P1['$y'], $P0 Again, this is much nicer than the current: ## our $y; $y = foo(); $P0 = 'foo'() get_global $P1, '$y' unless_null $P1, assign_1 clone $P1, $P0 set_global $P1, '$y' goto done assign_1: morph $P1, .Undef assign $P1, $P0 done: Also, we get a win because we can get the namespace just once at the beginning of any sub that needs it, instead of a separate find_global/test existence/store_global for each assignment operation: ## our $x, $y, $z; $x = foo(); $y = bar(); $z = baz(); $P0 = get_namespace $P1 = 'foo'() assign_keyed $P0['$x'], $P1 $P2 = 'bar'() assign_keyed $P0['$y'], $P2 $P3 = 'baz'() assign_keyed $P0['$z'], $P3 We might also need to have an assign_lex opcode that provides the "create/morph target as needed" semantics that are currently missing. Again, my caveat that perhaps "assign" is the wrong word here, to avoid semantic confusion with the existing "assign" opcode that doesn't provide morphing of the targets [1]. Or perhaps we should just change the existing assign opcode to morph targets by default, in which case we don't need assign_lex. But I don't know the full ramifications of that sort of fundamental design change. Pm [1] Parrot does automatically morph between Integer, Float, and String types when using "assign", but this is part of the semantics of those types and not the assign opcode itself. Other types don't automatically morph via assign, thus we have to go through the .Undef type as described above.