mixed numeric and string SVs.
Has anyone given thought to how an SV can contain both a numeric value and a string value in Perl6?

Given the arbitrary number of numeric and string types that the vtable scheme of Perl6 supports, it will be unviable to have special types for all permutations (eg, utf8_nv, unicode32_iv, ascii_bigint, ad nauseam).

It seems to me the following options are possible:

1. We no longer save conversions, so
     $i="3"; $j+=$i for (...);
   does an aton() or similar each time round the loop.

2. Each SV has 2 vtable pointers - one for its numeric representation (if any), and one for its string representation (if any). Flexible, but may require an extra 4/8 bytes per SV.

3. We decree that all string to numeric conversions should return a particular numeric type (eg NV), and that all numeric to string conversions should similarly convert to a fixed string type (eg utf8). (Although I'm not sure that really helps.)

4. Err, that's it.

Any opinions?
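To put a number on the "extra 4/8 bytes" of option 2, here is a minimal C sketch comparing a one-vtable scalar with a two-vtable scalar. The `vtable` struct, its slots and the payload pointer are hypothetical stand-ins, not part of any actual Perl 6 design.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vtable: a few slots standing in for the full set of ops. */
typedef struct vtable {
    long (*get_int)(const void *payload);
    double (*get_num)(const void *payload);
    const char *(*get_str)(const void *payload);
} vtable;

/* Baseline: one vtable pointer plus a payload pointer per scalar. */
typedef struct sv_one_vt {
    const vtable *vt;
    void *payload;
} sv_one_vt;

/* Option 2: separate vtables for the numeric and string views.
   Either may be NULL if that representation does not exist yet. */
typedef struct sv_two_vt {
    const vtable *num_vt;
    const vtable *str_vt;
    void *payload;
} sv_two_vt;

/* Extra bytes per scalar that option 2 costs on this platform. */
size_t option2_overhead(void)
{
    assert(sizeof(sv_two_vt) >= sizeof(sv_one_vt));
    return sizeof(sv_two_vt) - sizeof(sv_one_vt);
}
```

On a typical 64-bit build this comes to 8 bytes (24 versus 16), and on a typical 32-bit build 4 bytes (12 versus 8), assuming all pointers are the same size and no unusual padding.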
Re: mixed numeric and string SVs.
On Wed, Dec 20, 2000 at 02:26:12PM +, David Mitchell wrote:
> Has anyone given thought to how an SV can contain both a numeric value
> and a string value in Perl6?
> Given the arbitrary number of numeric and string types that the vtable
> scheme of Perl6 supports, it will be unviable to have special types
> for all permutations (eg, utf8_nv, unicode32_iv, ascii_bigint, ad nauseam).
>
> It seems to me the following options are possible:
>
> 1. We no longer save conversions, so
>      $i="3"; $j+=$i for (...);
>    does an aton() or similar each time round the loop

I fear this would be a performance hit. I'm told TCL before version 8 was like this - everything's a string and converted to a number each and every time the number is needed.

> 2. Each SV has 2 vtable pointers - one for its numeric representation
> (if any), and one for its string representation (if any). Flexible, but
> may require an extra 4/8 bytes per SV.

It may not be terrible. How big is the average SV already anyway?

> 3. We decree that all string to numeric conversions should return
> a particular numeric type (eg NV), and that all numeric to string
> conversions should similarly convert to a fixed string type (eg utf8).
> (Although I'm not sure that really helps.)

Feels like a bad plan, as it can be that no single intrinsic type (ie one native to the compiler of the implementation language) is a superset of all the others (eg a platform with 64 bit integers, and the longest floating point type being 64 bit).

> 4. Err, that's it.

If vtables are held in a common pool (garbage collected?) with the flexibility to allow every scalar to potentially have its own, then I think there's a possible option 4. vtables have subsections, so all the numeric operations are in one subsection and all the string operations in another. At most you have (number of numeric types) * (number of string types) vtables, but in practice you just create a new table with the appropriate numeric & string subsections every time a numeric or string conversion produces a this-sort-of-string, that-sort-of-number pair that you've not seen before. So there's still only 1 vtable pointer per scalar.

This will slow things down if you attempt to add double-ascii to double-UTF8, as the

  if (vtable_a == vtable_b)

test won't be true, but it could be possible to store a token in each subsection to say what it was, so then you get

  if (vtable_a == vtable_b || vtable_a->num_type == vtable_b->num_type) {
      /* It's the same sort of number in each */
  } else {
      /* bah. generic logic needed */
  }

num_type is NULL (or a "string" token, or some other token distinct from every real numeric type) if the scalar isn't a number, which will prompt conversion to the best sort of number as need be. (So "3.1 + 5i" would do the right thing, presumably becoming complex floating point.)

And (like perl5) if you alter a numeric scalar as a string, it becomes just a string [so {(3.1 + 5i) . ''} is a string].

Nicholas Clark
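The pooled-vtable idea (option 4 above) can be sketched in C as a lazily filled table indexed by (numeric kind, string kind), together with the cheap "same sort of number" test. This is a minimal sketch under assumptions: the kind enums, struct names and two-level array are hypothetical, and the actual operation slots are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical kinds. NUM_NONE is the "not a number yet" token. */
typedef enum { NUM_NONE, NUM_IV, NUM_NV, NUM_KINDS } num_kind;
typedef enum { STR_ASCII, STR_UTF8, STR_KINDS } str_kind;

/* Subsections: real ones would also hold add, mul, concat, substr, ... */
typedef struct num_ops { num_kind num_type; } num_ops;
typedef struct str_ops { str_kind str_type; } str_ops;

/* A composed vtable: one numeric subsection plus one string subsection. */
typedef struct vtable {
    const num_ops *num;
    const str_ops *str;
} vtable;

static const num_ops NUM_SECTIONS[NUM_KINDS] = { {NUM_NONE}, {NUM_IV}, {NUM_NV} };
static const str_ops STR_SECTIONS[STR_KINDS] = { {STR_ASCII}, {STR_UTF8} };

/* Shared pool: at most NUM_KINDS * STR_KINDS tables, each built the first
   time that (number, string) pairing is seen and never freed. */
static vtable *pool[NUM_KINDS][STR_KINDS];

const vtable *vtable_for(num_kind n, str_kind s)
{
    assert(n < NUM_KINDS && s < STR_KINDS);
    if (pool[n][s] == NULL) {
        vtable *vt = malloc(sizeof *vt);
        assert(vt != NULL);
        vt->num = &NUM_SECTIONS[n];
        vt->str = &STR_SECTIONS[s];
        pool[n][s] = vt;
    }
    return pool[n][s];
}

/* Fast-path check before a numeric binop: identical tables, or the same
   kind of number in each (e.g. NV/ASCII + NV/UTF8). If either side is not
   a number yet, it must be converted first. */
int same_number_kind(const vtable *a, const vtable *b)
{
    if (a->num->num_type == NUM_NONE || b->num->num_type == NUM_NONE)
        return 0;
    return a == b || a->num->num_type == b->num->num_type;
}
```

Because each pairing is created once, pointer equality still works as the cheapest test, and the `num_type` comparison only runs when the string halves differ.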
Re: mixed numeric and string SVs.
> > 3. We decree that all string to numeric conversions should return
> > a particular numeric type (eg NV), and that all numeric to string
> > conversions should similarly convert to a fixed string type (eg utf8).
> > (Although I'm not sure that really helps.)
>
> Feels like a bad plan, as it can be that no single intrinsic type
> (ie one native to the compiler of the implementation language) is

struct {
    IV whatitis;
    union {
        IV iv;
        UV uv;
        NV nv;
    };
}

:-) Yeah, doesn't help for complex numbers, quaternions, octonions, or bignums...

> a superset of all the others
> (eg a platform with 64 bit integers, and the longest floating point type
> being 64 bit)

--
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
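The "64 bit integers, 64 bit floating point" case is easy to check. A minimal C sketch, assuming IV is `int64_t` and NV is an IEEE-754 double (53-bit significand); the function name is made up for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Does a 64-bit integer survive a round trip through double?
   Assumes IEEE-754 double, whose 53-bit significand cannot hold
   every integer above 2^53. */
int survives_nv(int64_t iv)
{
    assert(DBL_MANT_DIG == 53);
    double nv = (double)iv;
    /* 2^63 is outside int64_t; converting it back would be undefined. */
    if (nv >= 9223372036854775808.0)
        return 0;
    return (int64_t)nv == iv;
}
```

So under option 3 with a 64-bit NV, forcing every number through NV would silently turn 9007199254740993 (2^53 + 1) into 9007199254740992, which is why neither type is a superset of the other.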
Re: mixed numeric and string SVs.
On Wed, Dec 20, 2000 at 09:00:47AM -0600, Jarkko Hietaniemi wrote:
> struct {
>     IV whatitis;

More a perl5 question - why IV, not int? int might be smaller and "more natural" (your words). Eg why does looks_like_number return IV, not int? And why do various other bits of the perl API use IV?

Nicholas Clark
Re: mixed numeric and string SVs.
On Wed, Dec 20, 2000 at 03:06:06PM +, Nicholas Clark wrote:
> On Wed, Dec 20, 2000 at 09:00:47AM -0600, Jarkko Hietaniemi wrote:
>
> > struct {
> >     IV whatitis;
>
> more a perl5 question - why IV not int?
> int might be smaller and "more natural" (your words)

Those are K&R's words, not mine... and that's only an ideal, not always the real truth. E.g. in Digital UNIX a 64-bit long is very natural; a 32-bit int is a nice backward-compatibility concession.

> eg why does looks_like_number return IV not int? and various other bits
> of the perl API use IV?
>
> Nicholas Clark

--
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
Re: mixed numeric and string SVs.
On Wed, Dec 20, 2000 at 04:03:39PM +, David Mitchell wrote:
> > > 1. We no longer save conversions, so
> > >      $i="3"; $j+=$i for (...);
> > >    does an aton() or similar each time round the loop
> >
> > Well just the 1st time - then it is a number...
>
> Err, option (1) was explicitly suggesting we *don't* save the result
> of the conversion, so aton() *would* have to be called each time.
> (I didn't think this was sensible, I was just suggesting it
> for completeness...)

I think Nick is suggesting that we convert it and lose the string as a side effect. I may be wrong. This would give you

  $a="cheese"; printf "%d\n", $a; print "$a\n";

printing

  0
  0

because the %d would trigger a conversion to integer which then replaces the string. Not what is expected.

The only benefit this would bring is that both TomC and Ilya would agree on something - that this is not desirable behaviour (TomC because it's not backwards compatible, Ilya because you can alter a scalar's value as a side effect of accessing it, so what a scalar appears to contain becomes a function of its access history, not simply and solely what you assigned to it).

Nicholas Clark
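To make the "0 0" outcome concrete, here is a minimal C sketch of a scalar whose numeric conversion overwrites its string. The struct layout and function names are hypothetical, and `strtol()` stands in for Perl's own numification.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scalar holding EITHER a string OR an integer. */
typedef struct scalar {
    int is_int;
    long iv;
    char pv[32];
} scalar;

void set_str(scalar *s, const char *str)
{
    assert(s != NULL && str != NULL);
    s->is_int = 0;
    snprintf(s->pv, sizeof s->pv, "%s", str);
}

/* Destructive conversion: the integer replaces the string for good. */
long get_int_destructive(scalar *s)
{
    if (!s->is_int) {
        s->iv = strtol(s->pv, NULL, 10);  /* "cheese" -> 0 */
        s->is_int = 1;
    }
    return s->iv;
}

/* Stringifying after conversion can only rebuild text from the number. */
const char *get_str(scalar *s)
{
    if (s->is_int)
        snprintf(s->pv, sizeof s->pv, "%ld", s->iv);
    return s->pv;
}
```

The scalar's visible value now depends on whether anything has read it as a number, which is the access-history problem described above.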
Re: mixed numeric and string SVs.
> > It seems to me the following options are possible:
> >
> > 1. We no longer save conversions, so
> >      $i="3"; $j+=$i for (...);
> >    does an aton() or similar each time round the loop
>
> I fear this would be a performance hit. I'm told TCL pre version 8 was
> like this - everything's a string and converted to a number each and every
> time the number is needed.

Yes, I fear the same!

> > 2. Each SV has 2 vtable pointers - one for its numeric representation
> > (if any), and one for its string representation (if any). Flexible, but
> > may require an extra 4/8 bytes per SV.
>
> It may not be terrible. How big is the average SV already anyway?

True, but I've just realised a complication with my suggestion. If there are multiple vtable ptrs per SV, which type 'owns' the SV carcass, is responsible for destruction, and has permission to put its own stuff in the payload area etc? I think madness lies this way.

So here's a modified suggestion. Rather than having 2 vtable ptrs per scalar, we allow a string type to contain an optional pointer to another subsidiary SV containing its numeric value (and vice versa). Then, for example, the getint() method for a utf8 string type might look like:

  IV utf8_getint(SV *sv) {
      if (sv->subsidiary_numeric_sv == NULL) {
          sv->subsidiary_numeric_sv = Numeric->new(aton(sv->value));
      }
      return sv->subsidiary_numeric_sv->getint();
  }

(utf8 stringy methods that alter the string value of the SV are then responsible for either destroying the subsidiary numeric SV, making sure its value gets updated, or setting a flag warning that its value needs recalculating.)

Similarly, the stringy methods for numeric types are wrappers that optionally create a subsidiary string SV, then pass the call on to that object.
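The subsidiary-SV idea, including the invalidation step, can be sketched in runnable C. This is a minimal sketch under stated assumptions: the struct layout and function names are hypothetical, the subsidiary is reduced to a small struct holding a `long`, `strtol()` stands in for aton(), and the mutator takes the "destroy the subsidiary" route of the three listed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subsidiary numeric SV. */
typedef struct num_sv { long iv; } num_sv;

/* Hypothetical string SV with an optional subsidiary numeric SV. */
typedef struct str_sv {
    char *value;       /* the string representation (owned) */
    num_sv *numeric;   /* subsidiary numeric SV, or NULL */
    int conversions;   /* counts string->number conversions, for illustration */
} str_sv;

/* Numeric read: convert once, then reuse the subsidiary. */
long str_get_int(str_sv *sv)
{
    if (sv->numeric == NULL) {
        sv->numeric = malloc(sizeof *sv->numeric);
        assert(sv->numeric != NULL);
        sv->numeric->iv = strtol(sv->value, NULL, 10);  /* stands in for aton() */
        sv->conversions++;
    }
    return sv->numeric->iv;
}

/* String mutator: replace the text and destroy the now-stale number. */
void str_set(str_sv *sv, const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    assert(copy != NULL);
    strcpy(copy, s);
    free(sv->value);
    sv->value = copy;
    free(sv->numeric);
    sv->numeric = NULL;
}
```

With `$i="3"; $j+=$i` in a loop, the conversion happens once; assigning a new string forces exactly one more.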
Or to avoid the conditional each time, there could be 2 vtables for each type, containing 'with subsidiary' and 'without subsidiary' methods; the role of the latter being to create the subsidiary SV and update the type of the main SV to the 'with subsidiary' type.
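The 'with subsidiary' / 'without subsidiary' pair takes the NULL test off the hot path by swapping the SV's vtable pointer after the first conversion. A minimal sketch, assuming hypothetical names throughout and, for brevity, a cached field inside the SV in place of a separately allocated subsidiary SV:

```c
#include <assert.h>
#include <stdlib.h>

struct sv;

/* Hypothetical per-type vtable with a single numeric slot. */
typedef struct vtable {
    long (*get_int)(struct sv *);
} vtable;

typedef struct sv {
    const vtable *vt;
    const char *str;   /* string value */
    long cached_iv;    /* stands in for the subsidiary SV; valid only
                          while vt points at the 'with' table */
} sv;

static long get_int_without(sv *s);
static long get_int_with(sv *s);

static const vtable UTF8_WITHOUT_NUM = { get_int_without };
static const vtable UTF8_WITH_NUM    = { get_int_with };

/* 'without subsidiary': convert once, then retype the SV. */
static long get_int_without(sv *s)
{
    assert(s->str != NULL);
    s->cached_iv = strtol(s->str, NULL, 10);
    s->vt = &UTF8_WITH_NUM;
    return s->cached_iv;
}

/* 'with subsidiary': no test at all, just the cached number. */
static long get_int_with(sv *s)
{
    return s->cached_iv;
}
```

Callers always go through `s->vt->get_int(s)`; a string mutator would also need to reset `s->vt` to `&UTF8_WITHOUT_NUM` so the next numeric read reconverts.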
Re: Garbage collector slowness
Mark-Jason Dominus <[EMAIL PROTECTED]> writes:
>> "The new version must be better because our gazillion dollar marketing
>> campaign said so. (We didn't really *fix* anything.)
>
>The part I found interesting was the part about elimination of the message.

Printing messages can be surprisingly slow - if they go to unbuffered stderr which is an X window of some kind, they can end up waiting for an ACK from the X server, which may in turn have to wait for blanking and a move of a megapixel or two to do a scroll.

>
>Perceived slowness is also important.

--
Nick Ing-Simmons <[EMAIL PROTECTED]>
Via, but not speaking for: Texas Instruments Ltd.
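One standard C mitigation for slow unbuffered diagnostics is to give stderr a full buffer so that many small messages become fewer, larger writes. A minimal sketch using the standard `setvbuf()`; the helper name is hypothetical, and this is an illustration rather than anything the thread proposed.

```c
#include <assert.h>
#include <stdio.h>

/* Switch stderr (never fully buffered at startup, usually unbuffered) to
   full buffering with a caller-supplied buffer. Must be called before
   anything is written to stderr; buf must outlive every later use of
   the stream. Returns 0 on success. */
int buffer_stderr(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return setvbuf(stderr, buf, _IOFBF, size);
}
```

The trade-off is that messages may now appear late, or be lost if the process crashes, so code would typically call `fflush(stderr)` at points where the user must see the output.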
Re: mixed numeric and string SVs.
David Mitchell <[EMAIL PROTECTED]> writes:
>Has anyone given thought to how an SV can contain both a numeric value
>and a string value in Perl6?
>Given the arbitrary number of numeric and string types that the vtable
>scheme of Perl6 supports, it will be unviable to have special types
>for all permutations (eg, utf8_nv, unicode32_iv, ascii_bigint, ad nauseam).
>
>It seems to me the following options are possible:
>
>1. We no longer save conversions, so
>     $i="3"; $j+=$i for (...);
>   does an aton() or similar each time round the loop

Well just the 1st time - then it is a number...

>
>2. Each SV has 2 vtable pointers - one for its numeric representation
>(if any), and one for its string representation (if any). Flexible, but
>may require an extra 4/8 bytes per SV.

This is my favourite.

>
>3. We decree that all string to numeric conversions should return
>a particular numeric type (eg NV), and that all numeric to string
>conversions should similarly convert to a fixed string type (eg utf8).
>(Although I'm not sure that really helps.)

I can't see how that helps.

--
Nick Ing-Simmons <[EMAIL PROTECTED]>
Via, but not speaking for: Texas Instruments Ltd.
Re: mixed numeric and string SVs.
> >1. We no longer save conversions, so
> >     $i="3"; $j+=$i for (...);
> >   does an aton() or similar each time round the loop
>
> Well just the 1st time - then it is a number...

Err, option (1) was explicitly suggesting we *don't* save the result of the conversion, so aton() *would* have to be called each time. (I didn't think this was sensible, I was just suggesting it for completeness...)
Expressions and binding operator
perlop:
>Binary "=~" binds a scalar expression to a pattern match. [...] The
>right argument is a search pattern, substitution, or transliteration. [...]
>
>If the right argument is an expression rather than a search pattern,
>substitution, or transliteration, it is interpreted as a search pattern
>at run time.

Should this second paragraph still be true for Perl 6? I have at times wanted to do something of the form

  perl -lwe '$x = "x"; $y = "y"; $y =~ ($x eq "x" ? s/y/z/ : s/y/a/); print $y'

but I have not wanted to make the right argument an expression to be interpreted as a search pattern (since I have qr//).

--
Peter Scott
Pacific Systems Design Technologies
Re: String representation
David Mitchell <[EMAIL PROTECTED]> writes:
>The problem is "what are the (types of) the arguments passed
>
>I don't really see why types of args are (in general) a problem.

Hmm, you may be right at the level of your example, which may indeed be typical of pp_*(). Perhaps PerlIO is so bothersome because it is lower level. If all the args are SV * (or whatever perl6 calls it) then there is no big deal.

>pp_concat() {
>    SV *sv1, *sv2;
>    sv1 = POP; sv2 = POP;
>    sv1->concat(sv2);
>}
>
>the_type_of_sv1_concat(SV *sv1, SV *sv2) {
>    if (sv1->vtable == sv2->vtable) {
>        // both args are of this type:
>        // dive into the internals and do an efficient concat
>        sv1 = /* ... */;
>    } else {
>        generic_concat(sv1, sv2);
>    }
>}

The two "snags" with that are (and they may not be important):

1. One point of the vtable scheme was to avoid conditionals following memory fetches.
2. The else branch may be very common, so we just added a function call and a test to the "normal" case.

The snag is that there are common pairs, e.g. concat(utf8,ascii) / concat(ascii,utf8) or plus(NV,IV) / plus(IV,NV), where it is possible to get "smart" when one arg is a "special case" of the other.

>> True, but the messy details would now occur multiple times,
>> as soon as substr_utf8 exists then _ALL_ the other string ops
>> _must_ be overridden as well because nothing but string_utf8 "class"
>> knows what is going on.
>
>perhaps I'm being dim, but I don't really follow this. At the minimum,
>someone writes a generic substr function that works with any string types.
>Perhaps it achieves this by first converting all its args to UNICODE-32.

So we presume that all string types MUST (in standards-ese) support a ->toUNICODE32() method? And similarly numbers must be convertible to "complex long double" or whatever is the top of the built-in tree? (NV I guess - complex is overkill.) It is how we do the generic case that worries me.

>Not very efficient or desirable, but it gets you there.
>Then the implementor of the utf8 code writes a substr_utf8 function that only
>knows how to cope if all its args are utf8. If not, it just
>hands the call on to the generic sub.

I see that now: it is more flexible than what we "sort of have" to date, which is like:

  pp_concat() {
      SV *sv1, *sv2;
      sv1 = POP; sv2 = POP;
      if (sv1->vtable == sv2->vtable)
          sv1->concat(sv2);
      else
          generic_concat(sv1, sv2);
  }

Or perhaps:

  pp_concat() {
      SV *sv1, *sv2;
      sv1 = POP; sv2 = POP;
      if (sv1->knows_about(sv2->vtable))
          sv1->concat(sv2);
      else
          generic_concat(sv1, sv2);
  }

Both of which either prejudge what is allowed or put a cost on the front of the simple case.

>> >In fact, I would argue that in general most if not all the operations
>> >currently performed by pp_* should have vtable equivalents, both for
>> >numeric and string types (including unary ops, mutators, binops etc etc).
>>
>> Hmm - that is indeed a logical position.
>
>logical as in "consistent" or logical as in "sensible" ??? :-)

Consistent. Consistent is usually sensible though.

>
>I was under the impression that it was pretty much agreed for numeric
>types that each SV type would have its own set of binary ops (eg add, sub
>etc), so I wasn't aware I was proposing anything radical!

We have not solved (or I have not spotted) what you do when you get IV * NV. Something has to "know" to "upgrade" IV -> NV and call NV's '*' - which does not scale well.

>I can't see why you get a code explosion. In perl5 you get the explosion -
>every part of perl needs to know about every SV type, and introducing a new
>type or subtype involves hacking in just about every nook and cranny within
>perl.
>If there was a bug in the + operator, it would be apparent fairly quickly
>where it lies (eg int+int and num+num gives right result,
>int+num goes wrong; therefore the Int->add[NUM]() function is suspect.)

All true.
The explosion is that you have

  Int->add(Int)
  Int->add(Num)
  Int->add(Rational)
  Int->add(Monetary)
  Int->add(NumComplex)
  Int->add(IntComplex)
  Int->add(RationalComplex)
  Int->add(InternationalMonetary)
  Int->add(String);  # containing any of above
  Int->add(OverloadedObject)

Or if Int doesn't do that then generic_add() has to. Etc. In perl5 we have IV, (UV), NV, so the problem is bounded.

>> In other words - string ops on strings of uniform type, math ops on
>> well understood hierarchies etc. are all easy enough - it is the
>> combinations that get very messy very very quickly.
>
>I couldn't agree more - however, I think that issue is mostly orthogonal
>to whether most pp_ functions should have vtable equivalents. If the
>functionality is built directly into pp_XXX, you still have a combinatorial
>mess to cope with - hiving off into vtables *may* reduce the mess, or
>*mi
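One well-known way to handle the IV * NV case without writing every Int->add(X) pairing is a numeric tower (as in Scheme): give each numeric kind a rank, promote both operands to the higher rank, then dispatch to one same-kind operation. Below is a minimal C sketch with just IV and NV; the type and function names are hypothetical, not taken from perl5 or any Perl 6 design.

```c
#include <assert.h>

/* Hypothetical numeric tower: a lower rank always promotes to a higher one. */
typedef enum { K_IV = 0, K_NV = 1, K_COUNT } kind;

typedef struct num {
    kind k;
    union { long iv; double nv; } u;
} num;

/* Promotion along the tower (only IV -> NV exists in this sketch). */
static num promote(num x, kind target)
{
    assert(x.k <= target);
    if (x.k == K_IV && target == K_NV) {
        long iv = x.u.iv;
        x.u.nv = (double)iv;
        x.k = K_NV;
    }
    return x;
}

/* One multiply per kind, each assuming both operands share that kind. */
static num mul_iv(num a, num b)
{
    num r;
    r.k = K_IV;
    r.u.iv = a.u.iv * b.u.iv;  /* no overflow check in this sketch */
    return r;
}

static num mul_nv(num a, num b)
{
    num r;
    r.k = K_NV;
    r.u.nv = a.u.nv * b.u.nv;
    return r;
}

static num (*const MUL[K_COUNT])(num, num) = { mul_iv, mul_nv };

/* Generic binop: promote both sides to the higher rank, then dispatch
   on that single kind. Same-kind operands are returned unchanged by
   promote() and go straight to their own multiply. */
num num_mul(num a, num b)
{
    kind top = a.k > b.k ? a.k : b.k;
    return MUL[top](promote(a, top), promote(b, top));
}
```

This keeps the function count linear in the number of kinds, but only because IV and NV form a single total order; Monetary, String or OverloadedObject do not sit on one tower, which is the combinatorial problem described above.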
Re: Expressions and binding operator
On Wed, Dec 20, 2000 at 03:36:48PM -0800, Peter Scott wrote:
> Should this second paragraph still be true for Perl 6? I have at times
> wanted to do something of the form
>
> perl -lwe '$x = "x"; $y = "y"; $y =~ ($x eq "x" ? s/y/z/ : s/y/a/); print $y'
>
> but I have not wanted to make the right argument an expression to be
> interpreted as a search pattern (since I have qr//).

I presume that you don't find

  perl -lwe '$x = "x"; $y = "y"; $x eq "x" ? $y =~ s/y/z/ : $y =~ s/y/a/; print $y'

does what you need because you actually want to do something a lot more complex than a simple "$y =~" in your expression. Or do I guess wrong?

Nicholas Clark
Re: Expressions and binding operator
At 11:39 PM 12/20/00 +, Nicholas Clark wrote:
>On Wed, Dec 20, 2000 at 03:36:48PM -0800, Peter Scott wrote:
> > Should this second paragraph still be true for Perl 6? I have at times
> > wanted to do something of the form
> >
> > perl -lwe '$x = "x"; $y = "y"; $y =~ ($x eq "x" ? s/y/z/ : s/y/a/); print $y'
> >
> > but I have not wanted to make the right argument an expression to be
> > interpreted as a search pattern (since I have qr//).
>
>I presume that you don't find
>
>perl -lwe '$x = "x"; $y = "y"; $x eq "x" ? $y =~ s/y/z/ : $y =~ s/y/a/; print $y'
>
>does what you need because you actually want to do something a lot more
>complex than simple "$y =~" in your expression.
>Or do I guess wrong?

Oh, that certainly works, and I wouldn't have any complaint on grounds of brevity alone unless I wanted an lvalue expression like ($x ? $y : $z) instead of $y. But since this is a Perl 6 list, I'm making an inquiry on syntactical convenience grounds. What I wanted to write *feels* Perlish.

--
Peter Scott
Pacific Systems Design Technologies