Re: Bit ops on strings

2004-05-26 Thread Jeff Clites
On May 26, 2004, at 2:02 AM, Nicholas Clark wrote: On Tue, May 25, 2004 at 07:48:32PM -0700, Jeff Clites wrote: On May 25, 2004, at 12:26 PM, Dan Sugalski wrote: Yup. UTF8 is Just another variable-width encoding. Do anything with it and we convert it to a fixed-width encoding, in this case UTF32.

Re: Bit ops on strings

2004-05-26 Thread Nicholas Clark
On Tue, May 25, 2004 at 07:48:32PM -0700, Jeff Clites wrote: > On May 25, 2004, at 12:26 PM, Dan Sugalski wrote: > >Yup. UTF8 is Just another variable-width encoding. Do anything with it > >and we convert it to a fixed-width encoding, in this case UTF32. > > This has the unfortunate side-effect

Re: Bit ops on strings

2004-05-25 Thread Jeff Clites
On May 25, 2004, at 12:26 PM, Dan Sugalski wrote: At 12:30 PM +0100 5/25/04, Nicholas Clark wrote: I may be misremembering what I've read here but I thought that Dan said that for variable length encodings (such as shift-JIS) parrot would store the byte(s) in memory in constant size 16 or 32 bit

Re: Bit ops on strings

2004-05-25 Thread Dan Sugalski
At 8:30 PM +0100 5/25/04, Nicholas Clark wrote: On Tue, May 25, 2004 at 03:26:45PM -0400, Dan Sugalski wrote: Yup. UTF8 is Just another variable-width encoding. Do anything with it and we convert it to a fixed-width encoding, in this case UTF32. Does this mean that we won't be verifying the valid

Re: Bit ops on strings

2004-05-25 Thread Nicholas Clark
On Tue, May 25, 2004 at 03:26:45PM -0400, Dan Sugalski wrote: > Yup. UTF8 is Just another variable-width encoding. Do anything with > it and we convert it to a fixed-width encoding, in this case UTF32. Does this mean that we won't be verifying the validity of UTF8 on input? And instead pitching

Re: Bit ops on strings

2004-05-25 Thread Dan Sugalski
At 12:30 PM +0100 5/25/04, Nicholas Clark wrote: I may be misremembering what I've read here but I thought that Dan said that for variable length encodings (such as shift-JIS) parrot would store the byte(s) in memory in constant size 16 or 32 bit integers, rather than the (external) variable length

Re: Bit ops on strings

2004-05-25 Thread Nicholas Clark
On Sun, May 02, 2004 at 11:37:31AM -0700, Jeff Clites wrote: > Two more things to keep in mind: > > On May 1, 2004, at 4:54 PM, Aaron Sherman wrote: > > >If Perl defaults to UTF-8 > > People need to realize also that although UTF-8 is a pretty good > interchange format, it's a really bad in-mem

Re: Bit ops on strings

2004-05-02 Thread Jeff Clites
Two more things to keep in mind: On May 1, 2004, at 4:54 PM, Aaron Sherman wrote: If Perl defaults to UTF-8 People need to realize also that although UTF-8 is a pretty good interchange format, it's a really bad in-memory representation. This is for at least 2 related reasons: (1) To get to the N-
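A rough illustration of why indexing is awkward in a variable-width buffer, offered as a sketch only (the snippet above is cut off, so this assumes point (1) is about reaching the N-th character). The helper names `utf8_offset_of` and `utf32_offset_of` are hypothetical and are not Parrot functions.

```python
def utf8_offset_of(buf: bytes, n: int) -> int:
    """Byte offset of the n-th code point (0-based) in well-formed UTF-8.

    This needs a linear scan, because each code point occupies 1 to 4 bytes.
    """
    seen = 0
    for i, b in enumerate(buf):
        if (b & 0xC0) != 0x80:  # not a continuation byte: a code point starts here
            if seen == n:
                return i
            seen += 1
    raise IndexError(n)


def utf32_offset_of(n: int) -> int:
    """Byte offset of the n-th code point in a fixed-width 32-bit buffer."""
    return 4 * n  # constant time, no scan


text = "a\u00e9\u20acz"                # 'a', e-acute, euro sign, 'z'
utf8 = text.encode("utf-8")            # 1 + 2 + 3 + 1 bytes
utf32 = text.encode("utf-32-le")       # 4 bytes per code point, no BOM
```

With the fixed-width form the offset is a multiplication; with UTF-8 every lookup walks the buffer from the start unless some index is kept on the side.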

Re: Bit ops on strings

2004-05-02 Thread Jarkko Hietaniemi
>> >>I am very confused. THIS IS WHAT WE ALL SEEM TO BE SAYING. BITOPS ONLY >>ON EIGHT-BIT DATA. AM I WRONG? > > > No, it's not, and could you please not get emotional about this? It's I apologize for using UPPERCASE. My only excuse is that it was not personally aimed at you: I have been gri

Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 15:09, Jarkko Hietaniemi wrote: > > How are you defining "valid UTF-8"? Is there a codepoint in UTF-8 > > between \x00 and \xff that isn't valid? Is there a reason to ever do > > Like, half of them? \x80 .. \xff are all invalid as UTF-8. Heh, damn Ken Thompson and his place

Re: Bit ops on strings

2004-05-01 Thread Andrew E Switala
It's been said that what the "masses" think of as binary data is outside the concept of a string, and this lurker just don't see that. A binary string is string over a character set of size two, just like an ASCII string is a string over a character set of size 128. [Like character strings, so-ca

Re: Bit ops on strings

2004-05-01 Thread Jeff Clites
On May 1, 2004, at 12:00 PM, Aaron Sherman wrote: On Sat, 2004-05-01 at 14:18, Jeff Clites wrote: Exactly. And also realize that if you bitwise-not (or shift or something similar) the bytes of a UTF-8 serialization of something, the result isn't going to be valid UTF-8, so you'd be hard-pressed to
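A small check of that claim, as a sketch in Python rather than Parrot ops. The helpers `invert_bytes` and `is_valid_utf8` are made up for this illustration. For the ASCII input used here, every inverted byte lands in the continuation-byte range, so the result cannot be decoded as UTF-8.

```python
def invert_bytes(data: bytes) -> bytes:
    """Bitwise-not every byte of a buffer."""
    return bytes(b ^ 0xFF for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """True if the buffer decodes as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


original = "abc".encode("utf-8")   # valid UTF-8 (plain ASCII)
flipped = invert_bytes(original)   # every byte now has its high bit set
```

Inverting again restores the original bytes, so the operation is harmless on a raw buffer; the trouble only starts when the intermediate result is treated as text.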

Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi
> How are you defining "valid UTF-8"? Is there a codepoint in UTF-8 > between \x00 and \xff that isn't valid? Is there a reason to ever do Like, half of them? \x80 .. \xff are all invalid as UTF-8. > bitwise operations on anything other than 8-bit codepoints? I am very confused. THIS IS WHAT W

Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 14:18, Jeff Clites wrote: > On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote: > Just FYI, the way I implemented bitwise-not so far, was to bitwise-not > code points 0x{00}-0x{FF} as uint8-sized things, 0x{100}-0x{FFFF} as > uint16-sized things, and > 0x{FFFF} as uint32-s
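The width-by-range scheme quoted above might look roughly like this. It is a sketch, not Jeff's actual code, and it assumes the upper bound of the middle range is the 16-bit limit 0xFFFF. The function name `not_code_point` is hypothetical.

```python
def not_code_point(cp: int) -> int:
    """Bitwise-not a code point, using the narrowest of 8, 16 or 32 bits
    that can hold it (assumed boundaries: 0xFF and 0xFFFF)."""
    if cp <= 0xFF:
        mask = 0xFF          # treated as a uint8
    elif cp <= 0xFFFF:
        mask = 0xFFFF        # treated as a uint16
    else:
        mask = 0xFFFFFFFF    # treated as a uint32
    return ~cp & mask
```

Two things fall out of this sketch. The result for a large input such as 0x10000 is far above 0x10FFFF, so it is not a Unicode code point at all. And the operation does not always undo itself: 0xFF00 becomes 0x00FF under the 16-bit mask, which then becomes 0x00 under the 8-bit mask.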

Re: Bit ops on strings

2004-05-01 Thread Jeff Clites
On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote: So it seems to me that the "obvious" way to go is to have all bit-s operations first convert to raw bytes (possibly throwing an exception) and then proceed to do their work. If these conversions croak if there are code points beyond \x{ff}, I'm f

Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 11:26, Jarkko Hietaniemi wrote: As for codepoints outside of \x00-\xff, I vote exception. I don't think there's any other logical choice, but I think it's just an encoding conversion exception, not a special bit-op exception (that's arm-waving, I have not looked at Parrot's e

Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi
> > So it seems to me that the "obvious" way to go is to have all bit-s > operations first convert to raw bytes (possibly throwing an exception) > and then proceed to do their work. If these conversions croak if there are code points beyond \x{ff}, I'm fine with it. But trying to mix \x{100} or
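The convert-then-operate approach can be sketched as follows. This is Python, not Parrot, and `to_raw_bytes` and `bitnot_string` are illustrative names. It leans on the fact that Python's `latin-1` codec maps code points 0x00 to 0xFF one-to-one onto bytes and raises `UnicodeEncodeError` for anything larger, which stands in for the conversion exception discussed here.

```python
def to_raw_bytes(s: str) -> bytes:
    """Convert a string to raw bytes, one byte per code point.

    Raises UnicodeEncodeError if any code point is above 0xFF.
    """
    return s.encode("latin-1")


def bitnot_string(s: str) -> str:
    """Bitwise-not a string by way of its raw-byte form."""
    raw = to_raw_bytes(s)                      # may raise: conversion error
    inverted = bytes(b ^ 0xFF for b in raw)    # the bit op sees only bytes
    return inverted.decode("latin-1")


def conversion_fails(s: str) -> bool:
    """True if the string cannot be converted to raw bytes."""
    try:
        to_raw_bytes(s)
        return False
    except UnicodeEncodeError:
        return True
```

In this shape the bit operation itself never has to reason about wide code points; the failure comes from the conversion step, which matches the suggestion that it be an ordinary encoding conversion exception rather than a special bit-op one.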

Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 04:57, Jarkko Hietaniemi wrote: > > If Jarkko > > tells me you can do bitwise operations with unicode text now in Perl > > 5, well... we'll support it there, too, though we shan't like it at > > all. > > We can and I don't like it at all [...] > None of it anything I want

RE: Bit ops on strings

2004-05-01 Thread Bryan C. Warnock
On Fri, 2004-04-30 at 15:34, Dan Sugalski wrote: > If you want, you could think of the S-register strings as mini-PMCs. > The encoding and charset stuff (we'll ignore language semantics for > the moment) are essentially small vtables that hang off the string, > and whatever we do with it mostly

RE: Bit ops on strings

2004-05-01 Thread Bryan C. Warnock
On Fri, 2004-04-30 at 13:53, Dan Sugalski wrote: > Parrot, at the very low levels, makes no distinction between strings > and buffers--as far as it's concerned they're the same thing, and > either can hang off an S register. (Ultimately, when *I* talk of > strings I mean "A thing I can hang off

Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi
> > The bitshift operations on S-register contents are valid, so long as > the thing hanging off the register support it. Binary data ought > allow this. Most 8-bit string encodings will have to support it > whether it's a good idea or not, since you can do it now. If Jarkko > tells me you can

Re: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 7:07 PM -0700 4/30/04, Jeff Clites wrote: On Apr 30, 2004, at 10:22 AM, Dan Sugalski wrote: At 2:57 AM +1000 5/1/04, Andre Pang wrote: Of course Parrot should have a function to reinterpret something of a string type as raw binary data and vice versa, but don't mix binary data with strings: th

Re: Bit ops on strings

2004-04-30 Thread Jeff Clites
On Apr 30, 2004, at 10:22 AM, Dan Sugalski wrote: At 2:57 AM +1000 5/1/04, Andre Pang wrote: Of course Parrot should have a function to reinterpret something of a string type as raw binary data and vice versa, but don't mix binary data with strings: they are completely different types, and raw b

Re: Bit ops on strings

2004-04-30 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote: > If you want, you could think of the S-register strings as mini-PMCs. > The encoding and charset stuff (we'll ignore language semantics for > the moment) are essentially small vtables that hang off the string, I think its the cleanest way of implementing a

RE: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 4:15 PM -0400 4/30/04, Bryan C. Warnock wrote: On Fri, 2004-04-30 at 15:34, Dan Sugalski wrote: If you want, you could think of the S-register strings as mini-PMCs. The encoding and charset stuff (we'll ignore language semantics for the moment) are essentially small vtables that hang off the
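One way to picture "small vtables that hang off the string" is sketched below. None of these names (`EncodingVtable`, `PString`, `supports_bitops`) come from Parrot; they are invented for the illustration, and the choice of which encodings permit bit ops is an assumption drawn from the discussion, not from the source tree.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EncodingVtable:
    """A tiny table of per-encoding behaviour (hypothetical layout)."""
    name: str
    supports_bitops: bool
    decode: Callable[[bytes], str]


@dataclass
class PString:
    """A buffer plus the vtable that says how to interpret it."""
    buf: bytes
    encoding: EncodingVtable

    def bitnot(self) -> "PString":
        # Dispatch on the vtable: the op is valid only if the thing
        # hanging off the string says it supports it.
        if not self.encoding.supports_bitops:
            raise TypeError(f"bit ops not supported for {self.encoding.name}")
        return PString(bytes(b ^ 0xFF for b in self.buf), self.encoding)


BINARY = EncodingVtable("binary", True, lambda b: b.decode("latin-1"))
UTF8 = EncodingVtable("utf8", False, lambda b: b.decode("utf-8"))
```

The string object itself stays dumb; whether an operation is allowed is a property of the table attached to it, so adding a new encoding means adding a new table rather than touching the ops.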

RE: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 2:58 PM -0400 4/30/04, Bryan C. Warnock wrote: On Fri, 2004-04-30 at 13:53, Dan Sugalski wrote: Parrot, at the very low levels, makes no distinction between strings and buffers--as far as it's concerned they're the same thing, and either can hang off an S register. (Ultimately, when *I* talk

RE: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 12:18 PM -0400 4/30/04, Butler, Gerald wrote: A string is what Dan described in his various postings on strings. Nuff said. Gerald Butler responds: Yes, I know a "String" is what Dan described. He described a thingy made up of 32-bit Values where each value represented a "Code-Point". Now,

RE: Bit ops on strings

2004-04-30 Thread Butler, Gerald
-----Original Message----- From: Aaron Sherman [mailto:[EMAIL PROTECTED]] Sent: Friday, April 30, 2004 11:58 AM To: Butler, Gerald Cc: Perl6 Internals List Subject: RE: Bit ops on strings On Fri, 2004-04-30 at 09:47, Butler, Gerald wrote: > If I may interject for a moment: Let me start

Re: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 2:57 AM +1000 5/1/04, Andre Pang wrote: Of course Parrot should have a function to reinterpret something of a string type as raw binary data and vice versa, but don't mix binary data with strings: they are completely different types, and raw binary data should never be able to be put into a s

Re: Bit ops on strings

2004-04-30 Thread Andre Pang
On 30/04/2004, at 11:47 PM, Butler, Gerald wrote: 1. String - low-level, abstract, base class (or in Perl6 terms role -- I think) which represents a "logically" contiguous series of Parrot Int 2. BinaryString - inherits from String, represents a "logically" contiguous series of "byt

RE: Bit ops on strings

2004-04-30 Thread Aaron Sherman
On Fri, 2004-04-30 at 12:18, Butler, Gerald wrote: > Now, we > have people talking about doing "LSL/LSR" on "Strings". That is 100% > inconsistent with that definition of a "String". Not at all, and keep in mind that I didn't propose this out of the blue. "bands", "bxors" and "bors" are existing

RE: Bit ops on strings

2004-04-30 Thread Aaron Sherman
On Fri, 2004-04-30 at 09:47, Butler, Gerald wrote: > If I may interject for a moment: Let me start by saying that I have not drunk the Unicode cool-aid. I'm not at all certain that the overhead required to do all of what Parrot wants to do is warranted, BUT that's beside the point. Parrot is doin

Re: Bit ops on strings

2004-04-30 Thread Aaron Sherman
On Fri, 2004-04-30 at 10:42, Dan Sugalski wrote: > Bitstring operations ought only be valid on binary data, though, > unless someone can give me a good reason why we ought to allow > bitshifting on Unicode. (And then give me a reasoned argument *how*, > too) 100% agree. If you want to play gam

RE: Bit ops on strings

2004-04-30 Thread Butler, Gerald
If I may interject for a moment: -----Original Message----- From: Bryan C. Warnock [mailto:[EMAIL PROTECTED]] Sent: Friday, April 30, 2004 9:08 AM To: Dan Sugalski Cc: Perl6 Internals List Subject: Re: Bit ops on strings On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote: > I think left and ri

Re: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 9:07 AM -0400 4/30/04, Bryan C. Warnock wrote: On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote: I think left and right shift of strings should work the same way that shifts on ints works--that is, it doesn't grow, bits just fall off the end. You can decide whether to sign-extend or 0-extend,
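The shift semantics described here (fixed length, bits fall off the end) can be sketched on a byte buffer like this. The function names echo the proposed `lsls`/`lsrs` ops but are otherwise made up. Treating the buffer as one big-endian bit sequence and zero-extending on the right shift are both assumptions of this sketch, not settled behaviour.

```python
def lsl_bytes(data: bytes, n: int) -> bytes:
    """Logical shift left by n bits; the result has the same length
    and bits shifted past the front are discarded."""
    if n < 0:
        raise ValueError("shift count must be non-negative")
    width = len(data) * 8
    value = int.from_bytes(data, "big")
    shifted = (value << n) & ((1 << width) - 1)   # drop the overflow bits
    return shifted.to_bytes(len(data), "big")


def lsr_bytes(data: bytes, n: int) -> bytes:
    """Logical shift right by n bits, zero-extending from the left."""
    if n < 0:
        raise ValueError("shift count must be non-negative")
    value = int.from_bytes(data, "big") >> n
    return value.to_bytes(len(data), "big")
```

Because the length never changes, shifting by the full width or more simply yields a buffer of zero bytes, the same way it would for a fixed-size integer.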

Re: Bit ops on strings

2004-04-30 Thread Bryan C. Warnock
On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote: > I think left and right shift of strings should work the same way that > shifts on ints works--that is, it doesn't grow, bits just fall off > the end. You can decide whether to sign-extend or 0-extend, either > one's OK. Have we[1] finished work

Re: Bit ops on strings

2004-04-29 Thread Dan Sugalski
At 11:49 AM -0400 4/29/04, Aaron Sherman wrote: bit.ops defines some ops on strings, and not others. I was wondering if anyone thinks the following would be useful (I'm offering to write them, as it won't be much work): lsls(inout STR, in INT) lsrs(inout STR, in INT) and, of course,