Re: Bit ops on strings

Aaron Sherman Sat, 01 May 2004 07:03:15 -0700

On Sat, 2004-05-01 at 04:57, Jarkko Hietaniemi wrote:
> > If Jarkko 
> > tells me you can do bitwise operations with unicode text now in Perl 
> > 5, well... we'll support it there, too, though we shan't like it at 
> > all.
> 
> We can and I don't like it at all [...]
> None of it anything I want to propagate anywhere.


Please correct me if I'm wrong here, but I'm going to lay out my
understanding as a set of assertions:

      * Parrot will be able to convert any encoding to any other
        encoding
      * though some conversions will result in an exception, that is
        still defined behavior
      * We've agreed that only raw binary 8-bit strings make sense for
        bit vector operations

So it seems to me that the "obvious" way to go is to have all
bit-string operations first convert their operands to raw bytes
(possibly throwing an exception) and then proceed to do their work.
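A minimal sketch of that convert-then-operate approach, written in Python rather than Parrot's C. The names `to_raw_bytes` and `string_bitor` are hypothetical, and narrowing through Latin-1 only stands in for whatever conversion Parrot actually performs. Code points above 0xFF raise an exception, which is the "defined behavior" from the assertions above. The zero-padding of the shorter operand is an assumption borrowed from Perl 5's string `|`.

```python
def to_raw_bytes(s: str) -> bytes:
    """Hypothetical conversion step: narrow a Unicode string to raw 8-bit bytes.

    Latin-1 maps code points 0x00-0xFF one-to-one onto byte values; any wider
    code point raises UnicodeEncodeError (a defined exception, not garbage).
    """
    return s.encode("latin-1")


def string_bitor(a: str, b: str) -> bytes:
    """Bitwise OR of two strings, converting both operands to raw bytes first.

    Assumption (Perl 5 semantics for string |): the shorter operand is treated
    as if padded with zero bytes, so the result is as long as the longer one.
    """
    ra, rb = to_raw_bytes(a), to_raw_bytes(b)  # may raise before any bit work
    length = max(len(ra), len(rb))
    ra = ra.ljust(length, b"\x00")
    rb = rb.ljust(length, b"\x00")
    return bytes(x | y for x, y in zip(ra, rb))
```

With this, `string_bitor("\x01", "\x02")` yields `b"\x03"`, while an operand such as `"\u263a"` raises `UnicodeEncodeError` in the conversion step. The bit op itself never has to know about encodings.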

This means that UTF-8 strings will be handled just fine, and (as I
understand it) some subset of Unicode-at-large will be handled as well.
In other words, the burden goes on the conversion functions, not on the
bit ops.

It's not that it's going to be meaningful in the general case, but if
you have code like:

        sub foo() { return "\x01" ~| "\x02" }

I would expect to get the bit-string "\x03" back, even though strings
may default to Unicode in Perl 6.
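At the byte level, once both operands have been converted to raw bytes, the expected result is just the OR of the two single bytes:

```latex
\mathtt{0x01} \mathbin{|} \mathtt{0x02}
  = 00000001_2 \mathbin{|} 00000010_2
  = 00000011_2
  = \mathtt{0x03}
```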

You could put this on the shoulders of the client language (by saying
that the operands must be pre-converted), but that seems contrary to
Parrot's usual MO.

Let me know. I'm happy to do it either way, and I'll look at modifying
the other bit-string operators if they don't conform to the decision.

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback
