On Fri, Apr 30, 2004 at 08:59:55AM -0700, Jeff Clites wrote:
: >I can't imagine that
: >we use a different data type, this would totally mess with Perl
: >compatibility.
:
: Not necessarily (or, that wasn't my intention). For Ponie, we can do
: this:

Anded or ored?

: 1) Just always implicitly assume "iso-8859-1" when creating strings
: which Perl5 would have interpreted as binary.

Well, that's what we initially tried to do in Perl 5, but it turned out
to break a lot of programs.  Whether Ponie wants to break those programs
is another matter.

: 2) To handle certain features of Perl5 semantics, we could set a flag,
: at the PerlString level, to indicate that it should have Perl5-ish
: semantics. (That depends on whether a string created in Perl5 code and
: passed to Perl6 code should act Perl5-ish or Perl6-ish there. That is,
: is its semantics set by its creation context or its use context.) See
: below for an example of a case I'm thinking where the semantics might
: differ:

I don't think we want to import Perl 5 semantics (or lack thereof) into
Perl 6.  Ponie could at least mark strings from a raw filehandle as
"presumed binary" for Perl 6, even if Ponie ignores the distinction
for the sake of backward compatibility.  But I'd rather break the
interfaces between Ponie and Perl 6 occasionally than preserve Perl 5's
inconsistent semantics in Perl 6.  Perhaps type declarations on the
Perl 6 end can keep things sane at the interface.

: >We must ensure that such a string is never upscaled to another string
: >representation. We can do all byte-wise operations on such a string,
: >but e.g. appending an utfX string or such should be an error.
:
: Although, Perl5 lets you append a "utf-8" string to a "binary" string.
: But the behavior is odd. For instance, consider this Perl5 behavior
: (not sure if it's a feature or a bug):

Well, a bug is just a feature you intend to get rid of.  :-)

: $a = chr(0xC8);
: $b = substr($a.chr(0x212b), 0, 1); # append a "utf-8" character, then pull it off
:
: print $a; # these print....
: print $b; # ...the same thing
:
: print lc($a); # these print...
: print lc($b); # ...different things
:
: if( $a eq $b ) { print "yes" } # this prints yes
:
: So, in Perl5, not only does the behavior of a (non-utf-8?) string
: change if it "touches" something utf-8-ish, but it does this despite
: "eq" telling us the strings are the same. (And, since lc() has no
: effect on $a, the implication is that the string is sort of
: half-ASCII-half-binary; that is, case mapping has no effect on
: characters > 127, which implies they are somehow "uninterpreted"?)
:
: But this behavior could be accommodated (if it's not a bug) at the
: PerlString level by special-casing the relevant operations for the
: Ponie case.

It's a feature we don't intend to propagate.  :-)

: >The main problem currently seems to be IO, where the best thing would
: >be to move the current hacks into a separate layer above the buffered
: >layer. An additional parameter for open (or layer manipulation
: >features) can select byte-wise IO.
:
: Yes, my intention there was for read-as-strings, you'd push a
: string-ification layer onto the stack. For byte-wise IO, you wouldn't.

Actually, if I recall, the :raw layer in Perl 5 ends up popping off the
default string layer.  But the effect is presumably the same.

Larry