On Sat, 2004-05-01 at 04:57, Jarkko Hietaniemi wrote:
> > If Jarkko
> > tells me you can do bitwise operations with unicode text now in Perl
> > 5, well... we'll support it there, too, though we shan't like it at
> > all.
>
> We can and I don't like it at all [...]
> None of it anything I want to propagate anywhere.

Please correct me if I'm wrong here, but I'm going to lay out my
understanding as a set of assertions:

* Parrot will be able to convert any encoding to any other encoding
  * though some conversions will result in an exception, that's
    still a defined behavior
* We've agreed that only raw binary 8-bit strings make sense for bit
  vector operations

So it seems to me that the "obvious" way to go is to have all
bit-string operations first convert to raw bytes (possibly throwing an
exception) and then proceed to do their work. This means that UTF-8
strings will be handled just fine, and (as I understand it) some subset
of Unicode-at-large will be handled as well. In other words, the burden
goes on the conversion functions, not on the bit ops.

It's not that it's going to be meaningful in the general case, but if
you have code like:

    sub foo() { return "\x01"+|"\x02" }

I would expect to get the bit-string "\x03" back, even though strings
may default to Unicode in Perl 6.

You could put this on the shoulders of the client language (by saying
that the operands must be pre-converted), but that seems to be contrary
to Parrot's usual MO.

Let me know. I'm happy to do it either way, and I'll look at modifying
the other bit-string operators if they don't conform to the decision.

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback
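
For illustration only, the convert-then-operate approach described above
can be sketched in Python rather than Parrot or Perl 6. Everything here
is an assumption made for the sketch: the names to_raw_bytes and
bitwise_or_strings are hypothetical, "raw 8-bit" is modeled as Latin-1
(so only code points 0-255 convert), and zero-padding the shorter
operand is borrowed from how Perl 5's string "|" behaves. It is not how
Parrot does or will implement its bit-string ops.

```python
def to_raw_bytes(s):
    """Convert a string to raw 8-bit bytes, or raise if that is impossible.

    Assumption: "raw bytes" is modeled as Latin-1, so any code point
    above 255 raises UnicodeEncodeError (the "defined behavior" case).
    """
    if isinstance(s, bytes):
        return s  # already raw, nothing to convert
    return s.encode("latin-1")


def bitwise_or_strings(a, b):
    """OR two strings byte by byte after converting both to raw bytes.

    The bit op itself never looks at encodings; all of that burden
    sits in to_raw_bytes.
    """
    ra, rb = to_raw_bytes(a), to_raw_bytes(b)
    # Pad the shorter operand with zero bytes on the right.
    n = max(len(ra), len(rb))
    ra, rb = ra.ljust(n, b"\x00"), rb.ljust(n, b"\x00")
    return bytes(x | y for x, y in zip(ra, rb))
```

With this sketch, bitwise_or_strings("\x01", "\x02") yields the single
byte 0x03, while an operand such as "\u263a" makes the conversion step
raise before any bit operation runs.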