On Thu, Jul 17, 2008 at 5:20 AM, Allison Randal <[EMAIL PROTECTED]> wrote:
> The thing is, there's a tendency for data for a particular program or
> application to all be from the same character set (if, for example, you're
> parsing a series of files, munging the data in some way, and writing out
Moritz Lenz wrote:
NotFound wrote:
> To open another can of worms, I think that we can live without
> character set specification. We can establish that the character set is
> always Unicode, and deal only with encodings.
We had that discussion already, and the answer was "no" for several reasons:
On Wed, Jul 16, 2008 at 1:13 AM, Moritz Lenz
<[EMAIL PROTECTED]> wrote:
> NotFound wrote:
>>> * Unicode isn't necessarily universal, or might stop to be so in future.
>>> If a character is not representable in Unicode, and you chose to use
>>> Unicode for everything, you're screwed
>> There are provisions for private-use codepoints.
NotFound wrote:
>> * Unicode isn't necessarily universal, or might stop to be so in future.
>> If a character is not representable in Unicode, and you chose to use
>> Unicode for everything, you're screwed
>
> There are provisions for private-use codepoints.
If we use them in Parrot, we can't us
> * related to the previous point, some other character enco
NotFound wrote:
To open another can of worms, I think that we can live without
character set specification. We can establish that the character set is
always Unicode, and deal only with encodings. ASCII is an encoding
that maps directly to codepoints and only allows 0-127 values.
iso-8859-1 is the same with 0-255.
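[For reference, NotFound's point can be sketched in Python, used here only as a neutral illustration, not as Parrot code: ASCII and Latin-1 are encodings whose byte values map directly onto the first Unicode codepoints.]

```python
# Latin-1 (iso-8859-1) bytes map 1:1 onto Unicode codepoints U+0000..U+00FF.
assert bytes([0xAB]).decode("latin-1") == "\u00ab"  # U+00AB, the left guillemet

# ASCII is the same direct mapping restricted to 0-127; byte 0xAB is rejected.
try:
    bytes([0xAB]).decode("ascii")
except UnicodeDecodeError:
    pass  # 0xAB lies outside the ASCII range
```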
Uhm, by the fact that they didn't type "\ab65"?
On 7/15/08, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> Am Dienstag, 15. Juli 2008 23:35 schrieb Patrick R. Michaud:
>> On Tue, Jul 15, 2008 at 11:17:23PM +0200, Leopold Toetsch wrote:
>> > 21:51 < pmichaud> so unicode:"«" and unicode:"\xab"
On Tue, Jul 15, 2008 at 11:45 PM, Mark J. Reed <[EMAIL PROTECTED]> wrote:
> IMESHO, the encoding of the source code should have no bearing on the
> interpretation of string literal escape sequences within that source
> code. "\ab" should mean U+00AB no matter whether the surrounding
> source code
Am Dienstag, 15. Juli 2008 23:35 schrieb Patrick R. Michaud:
> On Tue, Jul 15, 2008 at 11:17:23PM +0200, Leopold Toetsch wrote:
> > 21:51 < pmichaud> so unicode:"«" and unicode:"\xab" would produce
> > exactly the same result.
> > 21:51 < pmichaud> even down to being the same .pbc output.
> >
> unicode:"\ab" is illegal
No way. unicode:"\ab" should represent U+00AB. I don't care what
the byte-level representation is. In UTF-8, that's 0xc2 0xab; in
UTF-16BE it's 0x00 0xab; in UTF-32LE it's 0xab 0x00 0x00 0x00.
> I think that there is still some confusion between the encoding of sou
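[The byte sequences listed above can be verified with a quick Python sketch; Python's codecs rather than Parrot's, but the mappings are defined by the UTF encoding forms themselves.]

```python
ch = "\u00ab"  # the same codepoint whether spelled "«" or "\xab"
assert ch == "\xab" == "«"

# Encoding forms of the single codepoint U+00AB:
assert ch.encode("utf-8") == b"\xc2\xab"
assert ch.encode("utf-16-be") == b"\x00\xab"
assert ch.encode("utf-32-le") == b"\xab\x00\x00\x00"
```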
On Tue, Jul 15, 2008 at 11:17:23PM +0200, Leopold Toetsch wrote:
> 21:51 < pmichaud> so unicode:"«" and unicode:"\xab" would produce
> exactly the same result.
> 21:51 < pmichaud> even down to being the same .pbc output.
> 21:51 < allison> pmichaud: exactly
>
> The former is a valid char
Hi,
I just saw this and such (too late) at #parrotsketch:
21:52 < NotFound> So unicode:"\xab" and utf8:unicode:"\xab" is also the same result?
In my opinion (and AFAIK still in the implementation), the encoding part of a
PIR string literal specifies how the possibly escaped bytes map to codepoints.
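[The distinction being debated here can be phrased in Python terms as a sketch: if escapes name codepoints, "\xab" is one character; if they name raw bytes fed to a UTF-8 decoder, the byte 0xAB alone is not even valid UTF-8 and U+00AB must be spelled as two bytes.]

```python
# Escapes as codepoints: "\xab" denotes the single character U+00AB.
assert len("\xab") == 1

# Escapes as raw bytes run through a UTF-8 decode: 0xAB alone is invalid;
# U+00AB must appear as the two-byte sequence C2 AB.
try:
    b"\xab".decode("utf-8")
except UnicodeDecodeError:
    pass  # a lone 0xAB is a malformed UTF-8 sequence
assert b"\xc2\xab".decode("utf-8") == "\xab"
```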