On Fri, Apr 30, 2004 at 08:59:55AM -0700, Jeff Clites wrote:
: >I can't imagine that
: >we use a different data type, this would totally mess with Perl
: >compatibility.
: 
: Not necessarily (or, that wasn't my intention). For Ponie, we can do 
: this:

Anded or ored?

: 1) Just always implicitly assume "iso-8859-1" when creating strings 
: which Perl5 would have interpreted as binary.

Well, that's what we initially tried to do in Perl 5, but it turned
out to break a lot of programs.  Whether Ponie wants to break those
programs is another matter.

: 2) To handle certain features of Perl5 semantics, we could set a flag, 
: at the PerlString level, to indicate that it should have Perl5-ish 
: semantics. (That depends on whether a string created in Perl5 code and 
: passed to Perl6 code should act Perl5-ish or Perl6-ish there. That is, 
: are its semantics set by its creation context or its use context?) See 
: below for an example of a case where I think the semantics might 
: differ:

I don't think we want to import Perl 5 semantics (or lack thereof) into
Perl 6.  Ponie could at least mark strings from a raw filehandle as
"presumed binary" for Perl 6, even if Ponie ignores the distinction
for the sake of backward compatibility.  But I'd rather break the
interfaces between Ponie and Perl 6 occasionally than preserve Perl
5's inconsistent semantics in Perl 6.  Perhaps type declarations
on the Perl 6 end can keep things sane at the interface.

: >We must ensure that such a string is never upscaled to another string
: >representation. We can do all byte-wise operations on such a string, but
: >e.g. appending an utfX string or such should be an error.
: 
: Although, Perl5 lets you append a "utf-8" string to a "binary" string. 
: But the behavior is odd. For instance, consider this Perl5 behavior 
: (not sure if it's a feature or a bug):

Well, a bug is just a feature you intend to get rid of.  :-)

: $a = chr(0xC8);
: # append a "utf-8" character, then pull it off
: $b = substr($a.chr(0x212b), 0, 1);
: 
: print $a; # these print....
: print $b; # ...the same thing
: 
: print lc($a); # these print...
: print lc($b); # ...different things
: 
: if( $a eq $b ) { print "yes" } # this prints yes
: 
: So, in Perl5, not only does the behavior of a (non-utf-8?) string 
: change if it "touches" something utf-8-ish, but it does this despite 
: "eq" telling us the strings are the same. (And, since lc() has no 
: effect on $a, the implication is that the string is sort of 
: half-ASCII-half-binary; that is, case mapping has no effect on 
: characters > 127, which implies they are somehow "uninterpreted"?)
: 
: But this behavior could be accommodated (if it's not a bug) at the 
: PerlString level by special-casing the relevant operations for the 
: Ponie case.

It's a feature we don't intend to propagate.  :-)
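
For reference, the quoted snippet can be made self-contained roughly as follows. This is only a sketch: it assumes an ASCII-platform Perl 5.8.1 or later with no `use locale` and without the `unicode_strings` feature (so no `use v5.12` or later), and the names `$plain` and `$touched` are illustrative, not from the original message. It uses `utf8::is_utf8` to expose the internal flag that differs between the two strings even though `eq` calls them equal.

```perl
use strict;
use warnings;
# Deliberately no "use feature 'unicode_strings'" and no "use v5.12" or later,
# so the legacy Perl 5 behavior discussed above stays visible.

my $plain   = chr(0xC8);                            # one character, stored as one byte
my $touched = substr($plain . chr(0x212b), 0, 1);   # same character, taken from an upgraded string

# The two strings compare equal as character sequences...
printf "eq:        %s\n", ($plain eq $touched ? "yes" : "no");

# ...but their internal representations differ.
printf "utf8 flag: plain=%d touched=%d\n",
    (utf8::is_utf8($plain)   ? 1 : 0),
    (utf8::is_utf8($touched) ? 1 : 0);

# Case mapping consults the internal flag: only the upgraded string is lowercased.
printf "lc:        plain=U+%04X touched=U+%04X\n",
    ord(lc $plain), ord(lc $touched);
```

Under those assumptions, `lc` leaves `$plain` at U+00C8 but maps `$touched` to U+00E8, which is the difference the quoted example observes.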

: >The main problem currently seems to be IO, where the best thing would be
: >to move the current hacks into a separate layer above the buffered
: >layer. An additional parameter for open (or layer manipulation
: >features) can select byte-wise IO.
: 
: Yes, my intention there was for read-as-strings, you'd push a 
: string-ification layer onto the stack. For byte-wise IO, you wouldn't.

Actually, if I recall, the :raw layer in Perl 5 ends up popping off the
default string layer.  But the effect is presumably the same.
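
To illustrate the effect in Perl 5 terms, here is a minimal sketch that reads the same two bytes once through `:raw` and once through a decoding layer. It assumes a Perl 5.8 or later built with PerlIO (needed for in-memory filehandles and `:encoding`); the variable names are made up for the example, and it says nothing about how Parrot's layers would actually be implemented.

```perl
use strict;
use warnings;

my $bytes = "\xC3\x85";    # the two UTF-8 bytes that encode U+00C5

# Byte-wise view: no decoding layer, so each byte is read as one character.
open(my $raw, '<:raw', \$bytes) or die "cannot open in-memory handle: $!";
my $as_bytes = do { local $/; <$raw> };
close $raw;

# String view: a decoding layer turns the byte sequence into one character.
open(my $txt, '<:encoding(UTF-8)', \$bytes) or die "cannot open in-memory handle: $!";
my $as_chars = do { local $/; <$txt> };
close $txt;

printf "raw:     length %d\n", length $as_bytes;
printf "decoded: length %d, first char U+%04X\n", length $as_chars, ord $as_chars;
```

Whether the byte-wise case is reached by not pushing a string layer or by popping a default one, the reader of the handle sees the same thing: two bytes in one case, one character in the other.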

Larry
