On Apr 28, 2004, at 11:25 PM, Leopold Toetsch wrote:
Jeff Clites <[EMAIL PROTECTED]> wrote:
On Apr 28, 2004, at 4:57 AM, Bryan C. Warnock wrote:
Does (that which the masses normally refer to as) binary data fall inside or outside the scope of a string?
Some languages make this very clear by providing a separate data type to hold a "blob of bytes".
Back to Parrot, which isn't covered by the manifesto. But anyway we already need[1] "enum_stringrep_blob" or "_bytes".
Certainly, for the things you've listed under [1] there's no problem with using a separate data type.
I can't imagine that we use a different data type, this would totally mess with Perl compatibility.
Not necessarily (or, that wasn't my intention). For Ponie, we can do this:
1) Just always implicitly assume "iso-8859-1" when creating strings which Perl5 would have interpreted as binary.
2) To handle certain features of Perl5 semantics, we could set a flag, at the PerlString level, to indicate that it should have Perl5-ish semantics. (That depends on whether a string created in Perl5 code and passed to Perl6 code should act Perl5-ish or Perl6-ish there. That is, are its semantics set by its creation context or its use context?) See below for an example of a case I'm thinking of where the semantics might differ:
We must ensure that such a string is never upscaled to another string
representation. We can do all byte-wise operations on such a string, but
e.g. appending an utfX string or such should be an error.
Although, Perl5 lets you append a "utf-8" string to a "binary" string. But the behavior is odd. For instance, consider this Perl5 behavior (not sure if it's a feature or a bug):
$a = chr(0xC8);
$b = substr($a.chr(0x212b), 0, 1); # append a "utf-8" character, then pull it off
print $a;     # these print...
print $b;     # ...the same thing
print lc($a); # these print...
print lc($b); # ...different things
if( $a eq $b ) { print "yes" } # this prints yes
So, in Perl5, not only does the behavior of a (non-utf-8?) string change if it "touches" something utf-8-ish, but it does this despite "eq" telling us the strings are the same. (And, since lc() has no effect on $a, the implication is that the string is sort of half-ASCII-half-binary; that is, case mapping has no effect on characters > 127, which implies they are somehow "uninterpreted"?)
But this behavior could be accommodated (if it's not a bug) at the PerlString level by special-casing the relevant operations for the Ponie case.
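To make the "never upscaled" rule concrete, here is a minimal sketch in Python (not Parrot code). The class name Blob and its methods are hypothetical, made up for illustration only. It assumes byte-wise operations are always allowed, appending character data is an error, and the ISO-8859-1 reading from option 1 is available on request:

```python
class Blob:
    """Hypothetical byte-only string type (illustration only, not a Parrot API)."""

    def __init__(self, data):
        self.data = bytes(data)

    def __len__(self):
        return len(self.data)

    def substr(self, offset, length):
        # Byte-wise operations are always allowed.
        return Blob(self.data[offset:offset + length])

    def append(self, other):
        # Appending another blob is a plain byte concatenation.
        if isinstance(other, Blob):
            return Blob(self.data + other.data)
        # Appending character data would force an upscale, so refuse.
        raise TypeError("cannot append character data to a blob")

    def as_latin1(self):
        # Option 1 above: reinterpret each byte as the ISO-8859-1
        # character with the same number (lossless for all 256 values).
        return self.data.decode("iso-8859-1")
```

With this sketch, Blob(b"\xc8").append(Blob(b"\x01")) succeeds, while Blob(b"\xc8").append("\u212b") raises TypeError instead of silently changing the string's representation. The Perl5-compatible behavior would have to be special-cased on top of something like this.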
The main problem currently seems to be IO, where the best thing would be
to move the current hacks into a separate layer above the buffered
layer. An additional parameter for open (or layer manipulation
features) can select byte-wise IO.
Yes, my intention there was that for read-as-strings, you'd push a string-ification layer onto the stack. For byte-wise IO, you wouldn't.
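As an analogy only, Python's standard io module has the same shape, and a short sketch may help picture the layering. The helper names read_bytes and read_strings are hypothetical, and the in-memory BytesIO merely stands in for a buffered file layer; none of this is Parrot's IO API:

```python
import io


def read_bytes(stream):
    # Byte-wise IO: read straight from the byte layer, no extra layer pushed.
    return stream.read()


def read_strings(stream, encoding):
    # Read-as-strings: push a decoding layer on top of the byte layer.
    text_layer = io.TextIOWrapper(stream, encoding=encoding)
    return text_layer.read()


payload = b"caf\xc3\xa9\n"  # the UTF-8 bytes for "café" plus a newline

as_bytes = read_bytes(io.BytesIO(payload))
as_text = read_strings(io.BytesIO(payload), "utf-8")

print(len(as_bytes), len(as_text))
```

The same underlying bytes come back as 6 bytes in the first case and as 5 characters in the second. The only difference is whether the decoding layer was pushed.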
[1] - transparent IO, e.g. $ parrot md5sum.imc a.out
    - freeze/thaw
    - writing packfiles from PASM
JEff