On Dec 1, Brent Dax said:

>First of all, you sent us this already. :^)

Well, I sent it before I subscribed (and from a different account
entirely) and it hadn't shown up in the archives 12 hours after posting,
so I figured it was dixed.

># So what's copy-on-write? Basically, it's the use of a
># pointer until that use becomes unsafe.
>
>Internals hackers: Implementation-wise, I'm thinking we have a wrapper
>PMC and vtable, with the original PMC in SELF->data and some metadata in
>SELF->cache.struct_val; get operations just call the equivalent op in
>the wrapee's vtable, while set operations tell all the dependencies to
>copy their data, replace the wrapper with the wrapee, and call through
>the new vtable.

I have one problem with that scheme, and I'll tell you how I optimize it
in COW. But first.

That's kinda spooky, Brent. First, I understand you (with vague, but
appropriate, ideas of what PMC and SELF are). Second, the behavior is
almost exactly how I had imagined (I found that 'get' and 'len' were
especially boring on a dependent variable, while 'set' was the
work-horse). Third, you even used the term "dependencies"!

>(Japhy: that probably sounds like gibberish to you, but it's really
>internals-speak. :^) Perl 6 doesn't have magic. Instead, each PMC
>(the data structure behind all variables, scalar or plural) has a table
>of operations (called a vtable) attached to it; for example, there's a

Ooh, now I know what a PMC is. ;)

>bunch for get_integer (like get_integer_same and get_integer_bigint), a
>bunch for math operations, etc. Magical stuff is implemented by
>swapping out the vtable.)

Yeah, I know what a vtable is now -- I didn't when I was at Dan's talk at
TPC, but since looking at Perl's implementation of magic, I understand.
If someone had just said "dispatch table" to me, I'd've grokked it
immediately.

> -At least for the substring example, we're talking about mucking with a
>STRING's internals to get one to point into another's buffer, which is
>generally a Bad Thing.
>(Especially if there's some sort of start marker
>on the string, like the UTF-16 BOM...)

Employ another wrapper, or make use of the metadata mentioned earlier to
set a flag that emulates a wrapper:

    $x = substr($y, 10, 30);

When we get $x's value, not only do we pass 'get' along to its master
(or, since $x is $y's dependent, should $y be its benefactor? ;) ), but
we also consult satellite data stored in $x that tells us to return only
a substring. It kinda sounds like variables which are operators, if you
get my meaning. $x has instructions to do a substring.

In COW, I'll be implementing that via another layer of magic:

    COW_vtbl_substr = {
        COW_mg_getsubstr,
        COW_mg_setsubstr,
        COW_mg_lensubstr
    };

Which is implemented rather simply (I'm writing this just now, so if it
doesn't work, that's why!):

    int COW_mg_getsubstr (SV *sv, MG *mg) {
        /* we have a pointer to the STR in SvSTR(sv) already */
        /* mg->storage[0] holds offset */
        /* mg->storage[1] holds length */
        STR sub = SvSTR(sv) + mg->storage[0];
        v_strcpy(SvSTR(sv), sub, mg->storage[1]);
        return 1;
    }

I think that's right.

> -GC might be a problem too--will it detect that the memory is still in
>use, even if the pointer to it is pointing into the middle of the memory
>instead of the start? Maybe STRINGs should have a separate field for
>the start of the actual string.

If you try to GC the benefactor, you should act as though you're setting
it (to anything at all), so its dependents take over the data, and then
the GC can continue.

Ok, now for that problem and optimization.

    $x = "_" x 100_000;
    $a = $x;
    $b = $x;
    $c = $x;

$x is now copy-on-write, with three dependents.

    $x = "foo";

According to your model, we've just copied 300,000 bytes. Personally, I
find that unacceptable. Ben Tilly said it best on PerlMonks: if a
language is going to use copy-on-write, it should use it extensively.

In COW, when a variable has more than one dependent, then when it gets
written to, we...

  1. copy our contents to our first dependent
  2. move the rest of our dependents to that first dep's list of dependents
  3. update their master (to the first dependent)

I believe that holds water.

    $x = 'foo';

We copy the 100,000 byte string to $a, $a gets $b and $c as dependents,
and $b and $c accept $a as their master. That is one 100,000-byte copy
instead of three. Far less work.

You like?

-- japhy, perl hacker unordinaire!
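
For the archives, here is a minimal sketch in C of the three-step handoff described above. It is only an illustration under stated assumptions: Var, var_init, var_share, var_get, var_set and handoff_bytes are all made-up names, not Perl's SV/MG structures, not Parrot's PMC API, and not the actual COW module. It also assumes a dependent always hangs directly off the variable that owns the buffer, so the master chain is never more than one level deep.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a scalar; NOT Perl's SV or Parrot's PMC. */
typedef struct Var {
    char *buf;             /* owned buffer, or NULL while borrowing */
    size_t len;
    struct Var *master;    /* non-NULL while we are a dependent */
    struct Var *first_dep; /* head of our own list of dependents */
    struct Var *next_dep;  /* next sibling in our master's list */
} Var;

static size_t handoff_bytes = 0; /* bytes copied by handoffs (illustration only) */

static char *dup_bytes(const char *s, size_t n) {
    char *p = malloc(n + 1);
    assert(p != NULL);
    memcpy(p, s, n);
    p[n] = '\0';
    return p;
}

void var_init(Var *v, const char *s) {
    memset(v, 0, sizeof *v);
    v->len = strlen(s);
    v->buf = dup_bytes(s, v->len);
}

/* 'get' is the boring case: a dependent reads through its master. */
const char *var_get(const Var *v) {
    return v->master ? v->master->buf : v->buf;
}

/* $dst = $src: share rather than copy. dst must be a fresh variable. */
void var_share(Var *dst, Var *src) {
    Var *owner = src->master ? src->master : src;
    Var **tail = &owner->first_dep;
    memset(dst, 0, sizeof *dst);
    while (*tail) tail = &(*tail)->next_dep; /* append, so $a stays first */
    *tail = dst;
    dst->master = owner;
    dst->len = owner->len;
}

/* 'set' is the work-horse. */
void var_set(Var *v, const char *s) {
    if (v->master) {
        /* A dependent being written to just leaves its master's list. */
        Var **p = &v->master->first_dep;
        while (*p != v) p = &(*p)->next_dep;
        *p = v->next_dep;
        v->master = NULL;
        v->next_dep = NULL;
    } else if (v->first_dep) {
        Var *heir = v->first_dep;
        heir->buf = dup_bytes(v->buf, v->len); /* 1. copy to first dependent */
        handoff_bytes += v->len;
        heir->first_dep = heir->next_dep;      /* 2. heir inherits the rest */
        heir->next_dep = NULL;
        heir->master = NULL;
        for (Var *d = heir->first_dep; d; d = d->next_dep)
            d->master = heir;                  /* 3. update their master */
        v->first_dep = NULL;
    }
    free(v->buf); /* free(NULL) is a no-op for former dependents */
    v->len = strlen(s);
    v->buf = dup_bytes(s, v->len);
}
```

With x holding the 100,000-byte string and a, b, c created via var_share(&a, &x) and so on, a call to var_set(&x, "foo") should copy the big buffer once, into a, and leave b and c reading through a.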