> > So it seems to me that the "obvious" way to go is to have all
> > bit-string operations first convert to raw bytes (possibly throwing
> > an exception) and then proceed to do their work.

If these conversions croak when there are code points beyond \x{ff}, I'm
fine with it.  But trying to mix in \x{100} or higher just leads to silly
discontinuities (basically we would need to decide on a word width, and
I think that would be a silly move).
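Concretely, the semantics described above might look something like this
(a rough Perl 5 sketch, untested; bit_or is a made-up helper, not
Parrot's actual code):

    use strict;
    use warnings;
    use Carp;

    # Croak unless both operands are pure byte strings (no code points
    # beyond \x{ff}); then Perl's bitwise string OR does the bytewise work.
    sub bit_or {
        my ($x, $y) = @_;
        for ($x, $y) {
            croak "wide character in bit-string operation"
                if /[^\x00-\xff]/;
        }
        return $x | $y;    # bytewise OR; the shorter operand is zero-padded
    }

With this, bit_or("\x01", "\x02") returns "\x03" as in the quoted
example, while bit_or("\x{100}", "\x02") croaks.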
> This means that UTF-8 strings will be handled just fine, and (as I

Please don't mix encodings and code points.  That strings might be
serialized or stored as UTF-8 should have no consequence for bitops.

> understand it) some subset of Unicode-at-large will be handled as well.
> In other words, the burden goes on the conversion functions, not on the
> bit ops.
>
> It's not that it's going to be meaningful in the general case, but if

I'd rather have meaningful results.

> you have code like:
>
> sub foo() { return "\x01"+|"\x02" }

Please consider what happens when the operands have code points beyond
0xff.

> I would expect to get the bit-string "\x03" back even though strings
> may default to Unicode in Perl 6.

Of course.  But I would expect a horrible flaming death for
"\x{100}"+|"\x02".

> You could put this on the shoulders of the client language (by saying
> that the operands must be pre-converted), but that seems to be contrary
> to Parrot's usual MO.
>
> Let me know.  I'm happy to do it either way, and I'll look at modifying
> the other bit-string operators if they don't conform to the decision.

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.
 It is 'dead'." -- Jack Cohen
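To see the discontinuity being pointed at: OR'ing code points
numerically and OR'ing one particular serialization bytewise give
different answers for the same operands (a Perl 5 sketch, illustrative
only):

    use Encode qw(encode decode);

    my $wide   = "\x{100}";
    my $narrow = "\x02";

    # (a) OR the code points numerically, character by character:
    #     0x100 | 0x02 == 0x102, so the result is "\x{102}".
    my $by_codepoint = chr( ord($wide) | ord($narrow) );

    # (b) OR one particular serialization (UTF-8) bytewise:
    #     "\x{100}" encodes as the bytes C4 80; OR'ing with 02 gives
    #     C6 80, which decodes back to "\x{180}" -- a different answer.
    my $by_bytes = decode( 'UTF-8',
        encode('UTF-8', $wide) | encode('UTF-8', $narrow) );

Neither answer is more defensible than the other, and the answer in (b)
changes with whichever encoding you happened to pick, which is exactly
why croaking beats guessing.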