On May 26, 2004, at 2:02 AM, Nicholas Clark wrote:
On Tue, May 25, 2004 at 07:48:32PM -0700, Jeff Clites wrote:
On May 25, 2004, at 12:26 PM, Dan Sugalski wrote:
Yup. UTF8 is Just another variable-width encoding. Do anything with it
and we convert it to a fixed-width encoding, in this case UTF32.
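A minimal C sketch of the conversion described above: decode UTF-8 into fixed-width UTF-32 code points and reject malformed input instead of passing it through. The function name utf8_to_utf32 is hypothetical and is not Parrot's string API. C has no exceptions, so a -1 return stands in for an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-8 into fixed-width UTF-32 code points.
 * `out` needs room for `len` code points (the worst case, all ASCII).
 * Returns the number of code points written, or -1 if the input is not
 * well-formed UTF-8: a byte that cannot start a sequence, a truncated or
 * broken sequence, an overlong form, a surrogate, or a value > U+10FFFF. */
long utf8_to_utf32(const uint8_t *in, size_t len, uint32_t *out)
{
    size_t i = 0;
    long n = 0;
    while (i < len) {
        uint8_t b = in[i];
        uint32_t cp, min;   /* min: smallest code point legal at this length */
        size_t extra;       /* continuation bytes that must follow */
        if (b < 0x80)                    { cp = b;        extra = 0; min = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else return -1;     /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
        if (extra >= len - i) return -1;            /* truncated at end of input */
        for (size_t k = 1; k <= extra; k++) {
            uint8_t c = in[i + k];
            if ((c & 0xC0) != 0x80) return -1;      /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;      /* overlong, out of range, or surrogate */
        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

For example, the bytes 41 C3 A9 E2 82 AC decode to the three code points U+0041, U+00E9 and U+20AC, while a lone 0xE9 byte is rejected.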
On Tue, May 25, 2004 at 07:48:32PM -0700, Jeff Clites wrote:
> On May 25, 2004, at 12:26 PM, Dan Sugalski wrote:
> >Yup. UTF8 is Just another variable-width encoding. Do anything with it
> >and we convert it to a fixed-width encoding, in this case UTF32.
>
> This has the unfortunate side-effect
On May 25, 2004, at 12:26 PM, Dan Sugalski wrote:
At 12:30 PM +0100 5/25/04, Nicholas Clark wrote:
I may be misremembering what I've read here but I thought that Dan said
that for variable length encodings (such as shift-JIS) parrot would store
the byte(s) in memory in constant size 16 or 32 bit
At 8:30 PM +0100 5/25/04, Nicholas Clark wrote:
On Tue, May 25, 2004 at 03:26:45PM -0400, Dan Sugalski wrote:
Yup. UTF8 is Just another variable-width encoding. Do anything with
it and we convert it to a fixed-width encoding, in this case UTF32.
Does this mean that we won't be verifying the valid
On Tue, May 25, 2004 at 03:26:45PM -0400, Dan Sugalski wrote:
> Yup. UTF8 is Just another variable-width encoding. Do anything with
> it and we convert it to a fixed-width encoding, in this case UTF32.
Does this mean that we won't be verifying the validity of UTF8 on input?
And instead pitching
At 12:30 PM +0100 5/25/04, Nicholas Clark wrote:
I may be misremembering what I've read here but I thought that Dan said
that for variable length encodings (such as shift-JIS) parrot would store
the byte(s) in memory in constant size 16 or 32 bit integers, rather than
the (external) variable length
On Sun, May 02, 2004 at 11:37:31AM -0700, Jeff Clites wrote:
> Two more things to keep in mind:
>
> On May 1, 2004, at 4:54 PM, Aaron Sherman wrote:
>
> >If Perl defaults to UTF-8
>
> People need to realize also that although UTF-8 is a pretty good
> interchange format, it's a really bad in-mem
Two more things to keep in mind:
On May 1, 2004, at 4:54 PM, Aaron Sherman wrote:
If Perl defaults to UTF-8
People need to realize also that although UTF-8 is a pretty good
interchange format, it's a really bad in-memory representation. This is
for at least 2 related reasons: (1) To get to the N-
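The indexing cost behind the in-memory argument can be sketched in C (hypothetical helper names, not Parrot code): finding the N-th code point in UTF-8 means walking every byte before it, while UTF-32 storage turns the same lookup into an array index.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the n-th code point (0-based) in well-formed UTF-8,
 * or `len` if there are fewer than n+1 code points. Every byte before
 * the target is visited, so the cost grows linearly with n. */
size_t utf8_offset_of(const uint8_t *s, size_t len, size_t n)
{
    for (size_t i = 0; i < len; i++) {
        /* Continuation bytes look like 10xxxxxx; anything else starts a
         * new code point. */
        if ((s[i] & 0xC0) != 0x80) {
            if (n == 0)
                return i;
            n--;
        }
    }
    return len;
}

/* With fixed-width UTF-32 storage the same lookup is constant time. */
uint32_t utf32_at(const uint32_t *s, size_t n)
{
    return s[n];
}
```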
>>
>>I am very confused. THIS IS WHAT WE ALL SEEM TO BE SAYING. BITOPS ONLY
>>ON EIGHT-BIT DATA. AM I WRONG?
>
>
> No, it's not, and could you please not get emotional about this? It's
I apologize for using UPPERCASE. My only excuse is that it was not
personally aimed at you: I have been gri
On Sat, 2004-05-01 at 15:09, Jarkko Hietaniemi wrote:
> > How are you defining "valid UTF-8"? Is there a codepoint in UTF-8
> > between \x00 and \xff that isn't valid? Is there a reason to ever do
>
> Like, half of them? \x80 .. \xff are all invalid as UTF-8.
Heh, damn Ken Thompson and his place
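To make the \x80 .. \xff point concrete: in UTF-8 only code points below 0x80 occupy a single byte. U+0080 through U+00FF take two bytes (lead byte 0xC2 or 0xC3), so a lone byte in 0x80..0xFF is never a complete character. A small C encoder sketch (utf8_encode is a hypothetical name) shows this.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into buf (room for 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and
 * values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, uint8_t *buf)
{
    if (cp < 0x80) {                      /* the only single-byte range */
        buf[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* includes all of U+0080..U+00FF */
        buf[0] = (uint8_t)(0xC0 | (cp >> 6));
        buf[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                         /* surrogates are not encodable */
    if (cp < 0x10000) {
        buf[0] = (uint8_t)(0xE0 | (cp >> 12));
        buf[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (uint8_t)(0xF0 | (cp >> 18));
        buf[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

So \xe9 as a code point is serialized as C3 A9, and the single byte E9 on its own is not valid UTF-8.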
It's been said that what the "masses" think of as binary data is outside
the concept of a string, and this lurker just doesn't see that. A binary
string is string over a character set of size two, just like an ASCII
string is a string over a character set of size 128. [Like character
strings, so-ca
On May 1, 2004, at 12:00 PM, Aaron Sherman wrote:
On Sat, 2004-05-01 at 14:18, Jeff Clites wrote:
Exactly. And also realize that if you bitwise-not (or shift or
something similar) the bytes of a UTF-8 serialization of something, the
result isn't going to be valid UTF-8, so you'd be hard-pressed to
> How are you defining "valid UTF-8"? Is there a codepoint in UTF-8
> between \x00 and \xff that isn't valid? Is there a reason to ever do
Like, half of them? \x80 .. \xff are all invalid as UTF-8.
> bitwise operations on anything other than 8-bit codepoints?
I am very confused. THIS IS WHAT W
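A C sketch of the bitwise-not problem (helper names are hypothetical): every ASCII byte lies in 0x00..0x7F, so flipping its bits lands in 0x80..0xFF. The letter A (0x41) becomes 0xBE, a continuation byte that cannot start a UTF-8 sequence, so the result is no longer valid UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flip every bit of every byte in place, as a naive byte-level
 * bitwise-not would do to a UTF-8 serialization. */
void bytes_not(uint8_t *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] = (uint8_t)~s[i];
}

/* True if the byte may begin a UTF-8 sequence. Continuation bytes
 * (10xxxxxx), 0xC0, 0xC1 and 0xF5..0xFF never can. */
bool utf8_is_lead_byte(uint8_t b)
{
    return b < 0x80 || (b >= 0xC2 && b <= 0xF4);
}
```

Applying bytes_not twice restores the original bytes, but the intermediate value is not text in any meaningful sense.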
On Sat, 2004-05-01 at 14:18, Jeff Clites wrote:
> On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote:
> Just FYI, the way I implemented bitwise-not so far, was to bitwise-not
> code points 0x{00}-0x{FF} as uint8-sized things, 0x{100}-0x{} as
> uint16-sized things, and > 0x{} as uint32-s
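One reading of the width-dependent bitwise-not described here, as a C sketch. The archive has lost the upper bounds of the ranges; this assumes the 16-bit range ends at 0xFFFF, and codepoint_not is a hypothetical name. The results need not be valid code points: 0x27FF flips to the surrogate 0xD800, and anything above 0xFFFF flips past 0x10FFFF.

```c
#include <stdint.h>

/* Bitwise-not one code point using the narrowest unsigned width that
 * holds it: 8 bits up to 0xFF, 16 bits up to 0xFFFF (assumed boundary),
 * otherwise 32 bits. */
uint32_t codepoint_not(uint32_t cp)
{
    if (cp <= 0xFF)
        return (uint8_t)~cp;    /* keep only the low 8 bits of the flip */
    if (cp <= 0xFFFF)
        return (uint16_t)~cp;   /* keep only the low 16 bits */
    return ~cp;                 /* full 32-bit flip, always > 0x10FFFF */
}
```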
On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote:
So it seems to me that the "obvious" way to go is to have all bit-s
operations first convert to raw bytes (possibly throwing an exception)
and then proceed to do their work.
If these conversions croak if there are code points beyond \x{ff}, I'm
f
On Sat, 2004-05-01 at 11:26, Jarkko Hietaniemi wrote:
As for codepoints outside of \x00-\xff, I vote exception. I don't think
there's any other logical choice, but I think it's just an encoding
conversion exception, not a special bit-op exception (that's arm-waving,
I have not looked at Parrot's e
>
> So it seems to me that the "obvious" way to go is to have all bit-s
> operations first convert to raw bytes (possibly throwing an exception)
> and then proceed to do their work.
If these conversions croak if there are code points beyond \x{ff}, I'm
fine with it. But trying to mix \x{100} or
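A C sketch of the convert-then-operate approach being voted on here: narrow the code points to raw bytes first and fail if any is above 0xFF. The function name is hypothetical, and because C has no exceptions a -1 return stands in for the encoding-conversion exception.

```c
#include <stddef.h>
#include <stdint.h>

/* Narrow code points to one byte each before a bit operation.
 * Returns 0 on success, or -1 (the "exception") as soon as a code
 * point above 0xFF is seen; `out` is then only partially filled. */
int codepoints_to_bytes(const uint32_t *in, size_t len, uint8_t *out)
{
    for (size_t i = 0; i < len; i++) {
        if (in[i] > 0xFF)
            return -1;
        out[i] = (uint8_t)in[i];
    }
    return 0;
}
```

The bit operation itself then only ever sees bytes, which keeps the error in the conversion step rather than in each bit op.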
On Sat, 2004-05-01 at 04:57, Jarkko Hietaniemi wrote:
> > If Jarkko
> > tells me you can do bitwise operations with unicode text now in Perl
> > 5, well... we'll support it there, too, though we shan't like it at
> > all.
>
> We can and I don't like it at all [...]
> None of it anything I want
On Fri, 2004-04-30 at 15:34, Dan Sugalski wrote:
> If you want, you could think of the S-register strings as mini-PMCs.
> The encoding and charset stuff (we'll ignore language semantics for
> the moment) are essentially small vtables that hang off the string,
> and whatever we do with it mostly
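The mini-PMC picture, sketched in C under hypothetical names (EncodingVtable, CharsetVtable and SString are not Parrot's actual structures): the string carries a buffer plus pointers to small vtables, and generic code dispatches through them. A NULL slot marks an operation the encoding does not support, which is one way to express the rule that bit ops on S-register contents are valid only if the thing hanging off the register supports them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding vtable: how code points are laid out in the buffer.
 * A NULL slot means the encoding does not support that operation. */
typedef struct {
    const char *name;
    size_t   (*codepoints)(const void *buf, size_t bytelen);
    uint32_t (*get_codepoint)(const void *buf, size_t index);
    int      (*bit_not)(void *buf, size_t bytelen);
} EncodingVtable;

/* Hypothetical charset vtable: what the code points mean. */
typedef struct {
    const char *name;
    int (*is_digit)(uint32_t cp);
} CharsetVtable;

/* The string: a buffer with two small vtables hanging off it. */
typedef struct {
    void *buf;
    size_t bytelen;
    const EncodingVtable *encoding;
    const CharsetVtable *charset;
} SString;

/* "binary": one byte per element, bit-not supported. */
static size_t bin_codepoints(const void *buf, size_t n) { (void)buf; return n; }
static uint32_t bin_get(const void *buf, size_t i) { return ((const uint8_t *)buf)[i]; }
static int bin_not(void *buf, size_t n)
{
    uint8_t *p = buf;
    for (size_t i = 0; i < n; i++)
        p[i] = (uint8_t)~p[i];
    return 0;
}
const EncodingVtable binary_encoding = { "binary", bin_codepoints, bin_get, bin_not };

/* "utf32": four bytes per code point, bit-not deliberately left out. */
static size_t u32_codepoints(const void *buf, size_t n) { (void)buf; return n / 4; }
static uint32_t u32_get(const void *buf, size_t i) { return ((const uint32_t *)buf)[i]; }
const EncodingVtable utf32_encoding = { "utf32", u32_codepoints, u32_get, NULL };

/* Generic code never inspects the encoding directly; it asks the vtable
 * and reports -1 when the operation is not supported. */
int sstring_bit_not(SString *s)
{
    if (s->encoding->bit_not == NULL)
        return -1;
    return s->encoding->bit_not(s->buf, s->bytelen);
}
```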
On Fri, 2004-04-30 at 13:53, Dan Sugalski wrote:
> Parrot, at the very low levels, makes no distinction between strings
> and buffers--as far as it's concerned they're the same thing, and
> either can hang off an S register. (Ultimately, when *I* talk of
> strings I mean "A thing I can hang off
>
> The bitshift operations on S-register contents are valid, so long as
> the thing hanging off the register supports it. Binary data ought to
> allow this. Most 8-bit string encodings will have to support it
> whether it's a good idea or not, since you can do it now. If Jarkko
> tells me you can
At 7:07 PM -0700 4/30/04, Jeff Clites wrote:
On Apr 30, 2004, at 10:22 AM, Dan Sugalski wrote:
At 2:57 AM +1000 5/1/04, Andre Pang wrote:
Of course Parrot should have a function to reinterpret something
of a string type as raw binary data and vice versa, but don't mix
binary data with strings: th
On Apr 30, 2004, at 10:22 AM, Dan Sugalski wrote:
At 2:57 AM +1000 5/1/04, Andre Pang wrote:
Of course Parrot should have a function to reinterpret something of a
string type as raw binary data and vice versa, but don't mix binary
data with strings: they are completely different types, and raw
b
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> If you want, you could think of the S-register strings as mini-PMCs.
> The encoding and charset stuff (we'll ignore language semantics for
> the moment) are essentially small vtables that hang off the string,
I think it's the cleanest way of implementing a
At 4:15 PM -0400 4/30/04, Bryan C. Warnock wrote:
On Fri, 2004-04-30 at 15:34, Dan Sugalski wrote:
If you want, you could think of the S-register strings as mini-PMCs.
The encoding and charset stuff (we'll ignore language semantics for
the moment) are essentially small vtables that hang off the
At 2:58 PM -0400 4/30/04, Bryan C. Warnock wrote:
On Fri, 2004-04-30 at 13:53, Dan Sugalski wrote:
Parrot, at the very low levels, makes no distinction between strings
and buffers--as far as it's concerned they're the same thing, and
either can hang off an S register. (Ultimately, when *I* talk
At 12:18 PM -0400 4/30/04, Butler, Gerald wrote:
A string is what Dan described in his various postings on strings. Nuff
said.
Gerald Butler responds:
Yes, I know a "String" is what Dan described. He described a thingy
made up of 32-bit Values where each value represented a "Code-Point". Now,
-----Original Message-----
From: Aaron Sherman [mailto:[EMAIL PROTECTED]
Sent: Friday, April 30, 2004 11:58 AM
To: Butler, Gerald
Cc: Perl6 Internals List
Subject: RE: Bit ops on strings
On Fri, 2004-04-30 at 09:47, Butler, Gerald wrote:
> If I may interject for a moment:
Let me start
At 2:57 AM +1000 5/1/04, Andre Pang wrote:
Of course Parrot should have a function to reinterpret something of
a string type as raw binary data and vice versa, but don't mix
binary data with strings: they are completely different types, and
raw binary data should never be able to be put into a s
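This position can be sketched as two distinct C types with explicit reinterpretation in each direction (Bytes, Text and the functions are hypothetical names). Because the bit operation only accepts Bytes, mixing text into it is caught by the compiler rather than at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw binary data: just bytes, no encoding, bit operations allowed. */
typedef struct {
    const uint8_t *data;
    size_t len;
} Bytes;

/* Text: the same kind of buffer, but tagged with an encoding. */
typedef struct {
    const uint8_t *data;
    size_t len;
    const char *encoding;
} Text;

/* The only ways across the boundary are these explicit reinterpretations.
 * Nothing here validates that the bytes really are in `enc`. */
Bytes text_as_bytes(Text t)
{
    Bytes b = { t.data, t.len };
    return b;
}

Text bytes_as_text(Bytes b, const char *enc)
{
    Text t = { b.data, b.len, enc };
    return t;
}

/* A bit operation defined on Bytes only: count the set bits. Passing a
 * Text here is a compile-time type error. */
size_t bytes_popcount(Bytes b)
{
    size_t count = 0;
    for (size_t i = 0; i < b.len; i++)
        for (uint8_t v = b.data[i]; v != 0; v >>= 1)
            count += v & 1u;
    return count;
}
```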
On 30/04/2004, at 11:47 PM, Butler, Gerald wrote:
1. String - low-level, abstract, base class (or in Perl6 terms role --
I think) which represents a "logically" contiguous series of Parrot Int
2. BinaryString - inherits from String, represents a "logically"
contiguous series of "byt
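The proposed hierarchy can be approximated in C by struct embedding (all names hypothetical; int64_t standing in for a Parrot Int is an assumption): a BinaryString is a String whose elements happen to be bytes, and generic code written against String works on it unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Base "String": a logically contiguous series of integer elements,
 * read through a function pointer so subtypes choose their storage. */
typedef struct String String;
struct String {
    size_t length;                                   /* number of elements */
    int64_t (*get)(const String *self, size_t i);    /* element i */
};

/* "BinaryString": the base struct is the first member, so a pointer to
 * a BinaryString can be converted to a pointer to its String and back. */
typedef struct {
    String base;
    const uint8_t *bytes;
} BinaryString;

static int64_t binary_get(const String *self, size_t i)
{
    const BinaryString *bs = (const BinaryString *)self;
    return bs->bytes[i];
}

BinaryString binary_string(const uint8_t *bytes, size_t len)
{
    BinaryString bs = { { len, binary_get }, bytes };
    return bs;
}

/* Generic code written against String works for any subtype. */
int64_t string_sum(const String *s)
{
    int64_t total = 0;
    for (size_t i = 0; i < s->length; i++)
        total += s->get(s, i);
    return total;
}
```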
On Fri, 2004-04-30 at 12:18, Butler, Gerald wrote:
> Now, we
> have people talking about doing "LSL/LSR" on "Strings". That is 100%
> inconsistent with that definition of a "String".
Not at all, and keep in mind that I didn't propose this out of the blue.
"bands", "bxors" and "bors" are existing
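A C sketch of bytewise bands, bors and bxors on strings of unequal length. The exact Parrot rules are not stated here; this follows the behavior Perl 5 documents for string bitwise ops (& acts as if the longer operand were truncated, | and ^ act as if the shorter one were padded with zero bits), which is an assumption about what the Parrot ops should do. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { BIT_AND, BIT_OR, BIT_XOR } BitOp;

/* Combine two byte strings bytewise. AND stops at the shorter length;
 * OR and XOR run to the longer length, treating missing bytes of the
 * shorter operand as 0. `out` needs room for max(alen, blen) bytes.
 * Returns the length of the result. */
size_t bytes_bitop(BitOp op, const uint8_t *a, size_t alen,
                   const uint8_t *b, size_t blen, uint8_t *out)
{
    size_t n = (op == BIT_AND) ? (alen < blen ? alen : blen)
                               : (alen > blen ? alen : blen);
    for (size_t i = 0; i < n; i++) {
        uint8_t x = i < alen ? a[i] : 0;
        uint8_t y = i < blen ? b[i] : 0;
        out[i] = op == BIT_AND ? (uint8_t)(x & y)
               : op == BIT_OR  ? (uint8_t)(x | y)
                               : (uint8_t)(x ^ y);
    }
    return n;
}
```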
On Fri, 2004-04-30 at 09:47, Butler, Gerald wrote:
> If I may interject for a moment:
Let me start by saying that I have not drunk the Unicode Kool-Aid. I'm
not at all certain that the overhead required to do all of what Parrot
wants to do is warranted, BUT that's beside the point.
Parrot is doin
On Fri, 2004-04-30 at 10:42, Dan Sugalski wrote:
> Bitstring operations ought only be valid on binary data, though,
> unless someone can give me a good reason why we ought to allow
> bitshifting on Unicode. (And then give me a reasoned argument *how*,
> too)
100% agree. If you want to play gam
If I may interject for a moment:
-----Original Message-----
From: Bryan C. Warnock [mailto:[EMAIL PROTECTED]
Sent: Friday, April 30, 2004 9:08 AM
To: Dan Sugalski
Cc: Perl6 Internals List
Subject: Re: Bit ops on strings
On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote:
> I think left and ri
At 9:07 AM -0400 4/30/04, Bryan C. Warnock wrote:
On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote:
I think left and right shift of strings should work the same way that
shifts on ints work--that is, it doesn't grow, bits just fall off
the end. You can decide whether to sign-extend or 0-extend,
On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote:
> I think left and right shift of strings should work the same way that
> shifts on ints work--that is, it doesn't grow, bits just fall off
> the end. You can decide whether to sign-extend or 0-extend, either
> one's OK.
Have we[1] finished work
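The proposed shift semantics, sketched in C: the string keeps its length, bits shifted past either end are lost, and a right shift fills with zeros or with copies of the top bit. Treating byte 0 as the most significant byte is an assumption, and bytes_shl/bytes_shr are hypothetical names (Aaron's proposal calls the ops lsls and lsrs).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: s[0..len) is one big-endian bit string, s[0] most significant.
 * Shift left by n bits in place. The length never changes: bits shifted
 * past the front are lost and zeros come in at the back. */
void bytes_shl(uint8_t *s, size_t len, size_t n)
{
    size_t byte_shift = n / 8, bit_shift = n % 8;
    if (byte_shift > len) byte_shift = len;   /* everything falls off */
    for (size_t i = 0; i < len; i++) {
        size_t src = i + byte_shift;          /* >= i, so not yet overwritten */
        uint8_t hi = src < len ? s[src] : 0;
        uint8_t lo = src + 1 < len ? s[src + 1] : 0;
        s[i] = bit_shift ? (uint8_t)((hi << bit_shift) | (lo >> (8 - bit_shift))) : hi;
    }
}

/* Shift right by n bits in place. Bits shifted past the back are lost;
 * the vacated front is filled with zeros, or with copies of the top bit
 * when sign_extend is true. */
void bytes_shr(uint8_t *s, size_t len, size_t n, bool sign_extend)
{
    if (len == 0) return;
    uint8_t fill = (sign_extend && (s[0] & 0x80)) ? 0xFF : 0x00;
    size_t byte_shift = n / 8, bit_shift = n % 8;
    if (byte_shift > len) byte_shift = len;
    for (size_t i = len; i-- > 0; ) {         /* back to front: sources are <= i */
        uint8_t lo = i >= byte_shift ? s[i - byte_shift] : fill;
        uint8_t hi = i >= byte_shift + 1 ? s[i - byte_shift - 1] : fill;
        s[i] = bit_shift ? (uint8_t)((lo >> bit_shift) | (hi << (8 - bit_shift))) : lo;
    }
}
```

For example, shifting the two bytes 81 01 left by one bit gives 02 02 (the top bit falls off), and shifting 80 01 right by one bit gives C0 00 with sign extension or 40 00 without.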
At 11:49 AM -0400 4/29/04, Aaron Sherman wrote:
bit.ops defines some ops on strings, and not others. I was wondering if
anyone thinks the following would be useful (I'm offering to write them,
as it won't be much work):
lsls(inout STR, in INT)
lsrs(inout STR, in INT)
and, of course,