> > So it seems to me that the "obvious" way to go is to have all
> > bit-string operations first convert to raw bytes (possibly throwing
> > an exception) and then proceed to do their work.

If these conversions croak when there are code points beyond \x{ff}, I'm
fine with it.  But trying to mix in \x{100} or higher just leads to silly
discontinuities (basically we would need to decide on a word width, and
I think that would be a silly move).
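Concretely, the semantics described above might look something like this
(a rough Perl 5 sketch, untested; bit_or is a made-up helper, not
Parrot's actual code):

    use strict;
    use warnings;
    use Carp;

    # Croak unless both operands are pure byte strings (no code points
    # beyond \x{ff}); then Perl's bitwise string OR does the bytewise work.
    sub bit_or {
        my ($x, $y) = @_;
        for ($x, $y) {
            croak "wide character in bit-string operation"
                if /[^\x00-\xff]/;
        }
        return $x | $y;    # bytewise OR; the shorter operand is zero-padded
    }

With this, bit_or("\x01", "\x02") returns "\x03" as in the quoted
example, while bit_or("\x{100}", "\x02") croaks.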
> This means that UTF-8 strings will be handled just fine, and (as I

Please don't mix encodings and code points.  That strings might be
serialized or stored as UTF-8 should have no consequence for bitops.

> understand it) some subset of Unicode-at-large will be handled as well.
> In other words, the burden goes on the conversion functions, not on the
> bit ops.
>
> It's not that it's going to be meaningful in the general case, but if

I'd rather have meaningful results.

> you have code like:
>
> sub foo() { return "\x01"+|"\x02" }

Please consider what happens when the operands have code points beyond
0xff.

> I would expect to get the bit-string "\x03" back even though strings
> may default to Unicode in Perl 6.

Of course.  But I would expect a horrible flaming death for
"\x{100}"+|"\x02".

> You could put this on the shoulders of the client language (by saying
> that the operands must be pre-converted), but that seems to be contrary
> to Parrot's usual MO.
>
> Let me know.  I'm happy to do it either way, and I'll look at modifying
> the other bit-string operators if they don't conform to the decision.

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.
 It is 'dead'." -- Jack Cohen
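To see the discontinuity being pointed at: OR'ing code points
numerically and OR'ing one particular serialization bytewise give
different answers for the same operands (a Perl 5 sketch, illustrative
only):

    use Encode qw(encode decode);

    my $wide   = "\x{100}";
    my $narrow = "\x02";

    # (a) OR the code points numerically, character by character:
    #     0x100 | 0x02 == 0x102, so the result is "\x{102}".
    my $by_codepoint = chr( ord($wide) | ord($narrow) );

    # (b) OR one particular serialization (UTF-8) bytewise:
    #     "\x{100}" encodes as the bytes C4 80; OR'ing with 02 gives
    #     C6 80, which decodes back to "\x{180}" -- a different answer.
    my $by_bytes = decode( 'UTF-8',
        encode('UTF-8', $wide) | encode('UTF-8', $narrow) );

Neither answer is more defensible than the other, and the answer in (b)
changes with whichever encoding you happened to pick, which is exactly
why croaking beats guessing.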