Re: Bit ops on strings

Aaron Sherman Sat, 01 May 2004 12:00:58 -0700

On Sat, 2004-05-01 at 14:18, Jeff Clites wrote:
> On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote:


> Just FYI, the way I implemented bitwise-not so far, was to bitwise-not 
> code points 0x{00}-0x{FF} as uint8-sized things, 0x{100}-0x{FFFF} as 
> uint16-sized things, and > 0x{FFFF} as uint32-sized things (but then 
> bit-masking them with 0xFFFFF to make sure that they fell into a valid 
> code point range). That's pretty arbitrary, but if you bitwise-not as 
> though everything were 32 bits wide, you'll end up with a "string" 
> containing no assigned code points at all (they'll all be > 0x10FFFF). 
> But from a text point of view, bitwise-not on a string isn't a sensible 
> operation no matter how you slice it (that is, even for 0x{00}-0x{FF}), 
> so one flavor of arbitrary is just about as good as any other. We could 
> also make anything > 0x{FF} map to either 0x{00} or 0x{FF}, or mask it 
> with 0xFF to push it into that range. It's all pretty meaningless, as 
> text transformations go, and I can't imagine anyone using it for 
> anything, except maybe weak encryption.

I think Dan and I were both thinking in terms of bit-vector operations
on byte-streams for any purpose that would require such a beast. In
Perl, you have the vec function to make this slightly easier.
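
To make "bit-vector operations on byte-streams" concrete, here is a minimal
Python sketch. The helper names are hypothetical, and the bit numbering
(bit 0 is the low bit of byte 0) is my assumption, chosen to resemble
one-bit vec usage. Nothing here is meant as the proposed op set.

```python
def get_bit(buf: bytes, index: int) -> int:
    """Return bit `index` of buf, counting from the low bit of byte 0."""
    byte, offset = divmod(index, 8)
    return (buf[byte] >> offset) & 1


def set_bit(buf: bytearray, index: int, value: int) -> None:
    """Set or clear bit `index`, growing the buffer with zero bytes if needed."""
    byte, offset = divmod(index, 8)
    if byte >= len(buf):
        buf.extend(bytes(byte - len(buf) + 1))
    if value:
        buf[byte] |= 1 << offset
    else:
        buf[byte] &= ~(1 << offset) & 0xFF


def bytes_not(buf: bytes) -> bytes:
    """Bitwise-not every byte; no text semantics are involved."""
    return bytes(b ^ 0xFF for b in buf)


def bytes_and(a: bytes, b: bytes) -> bytes:
    """Bitwise-and two byte strings, truncating to the shorter one."""
    return bytes(x & y for x, y in zip(a, b))
```

Something like `v = bytearray(); set_bit(v, 9, 1)` leaves `v` two bytes
long with only bit 1 of the second byte set.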

This is one of those places where thinking about strings as text is
highly misleading. They're used for an awful lot more.

> Exactly. And also realize that if you bitwise-not (or shift or 
> something similar) the bytes of a UTF-8 serialization of something, the 
> result isn't going to be valid UTF-8, so you'd be hard-pressed to lay 
> text semantics down on top of it.
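
A quick way to check that claim is to invert the bytes of a UTF-8 encoding
and see whether the result still decodes. This is only a sketch with
made-up helper names; it leans on Python's built-in UTF-8 codec to decide
what counts as valid.

```python
def invert_utf8(text: str) -> bytes:
    """Bitwise-not each byte of the UTF-8 encoding of text."""
    return bytes(b ^ 0xFF for b in text.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """Report whether data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, `is_valid_utf8(invert_utf8("A"))` is False, because the single
byte 0x41 becomes 0xBE, a lone continuation byte. It is not universal,
though: `invert_utf8("é")` comes out as `b"<V"`, which is plain ASCII, so
some inputs happen to survive.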

How are you defining "valid UTF-8"? Is there a codepoint in UTF-8
between \x00 and \xff that isn't valid? Is there a reason to ever do
bitwise operations on anything other than 8-bit codepoints?

> I'm beginning to wonder if we're going to be square-rooting strings, 
> and taking the array-th root of a hash.... :)

Strings are not numbers, but there's a heck of a lot of code out there
that treats existing strings as bit-vectors (note: bit vectors are not
numbers either), and that code needs to be supported, no?

Now, shift operations aren't usually part of the package, but I figured
that as long as we were going to have the rest of the bit-manipulators,
finishing off the set would be of value.
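
As an illustration of what a shift on a byte string could mean, here is one
possible Python sketch. It treats the whole string as a single fixed-width
bit vector with the most significant bit first. That ordering, the
fixed-length result, and the function names are all assumptions on my part,
and the shift count is assumed to be non-negative.

```python
def shift_left(buf: bytes, n: int) -> bytes:
    """Shift buf left by n bits as one big-endian bit vector.

    Bits shifted past the top are dropped; the length stays the same.
    """
    width = len(buf) * 8
    value = int.from_bytes(buf, "big")
    mask = (1 << width) - 1
    return ((value << n) & mask).to_bytes(len(buf), "big")


def shift_right(buf: bytes, n: int) -> bytes:
    """Shift buf right by n bits, filling with zeros from the top."""
    value = int.from_bytes(buf, "big")
    return (value >> n).to_bytes(len(buf), "big")
```

Other conventions (growing the string on a left shift, or numbering bits
from the low end of each byte) would be just as defensible.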

More to the point, I said all of this at the beginning of this thread.
You should not, at this point, be confused about the scope of what I
want to do, as it was very narrowly and clearly defined up-front.

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback
