Re: Unicode/ascii encoding nightmare

2006-11-07 Thread Cliff Wells
On Mon, 2006-11-06 at 15:47 -0800, John Machin wrote:
> Gabriel Genellina wrote:
> > At Monday 6/11/2006 20:34, Robert Kern wrote:
> >
> > >John Machin wrote:
> > > > Indeed yourself. Have you ever considered reading posts in
> > > > chronological order, or reading all posts in a thread?
> > >
> >

Re: Unicode/ascii encoding nightmare

2006-11-07 Thread Cliff Wells
On Tue, 2006-11-07 at 08:10 +0200, Hendrik van Rooyen wrote: > "John Machin" <[EMAIL PROTECTED]> wrote: > > 8<--- > > > I strongly suggest that you read the docs *FIRST*, and don't "tinker" > > at all. > > > This is *good* advice - its unlikely to be followed

Re: Unicode/ascii encoding nightmare

2006-11-07 Thread Paul Boddie
Thomas W wrote:
> Ok, I've cleaned up my code a bit and it seems as if I've
> encoded/decoded myself into a corner ;-).
Yes, you may encounter situations where you have some string, you "decode" it (i.e. convert it to Unicode) using one character encoding, but then you later "encode" it (i.e. convert
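
The mismatch described in this post can be sketched in a few lines. This is only an illustration, written for Python 3 (where `str` is Unicode text and `bytes` is encoded data) rather than the Python 2 used in the thread. The variable names are made up for the example, and the pairing of UTF-8 with ISO-8859-1 is an assumption taken from other posts in the thread.

```python
# Hypothetical illustration: bytes are written with one encoding
# but read back with a different one.
text = "f\u00f8dselsdag"                    # the word "fødselsdag"

utf8_bytes = text.encode("utf-8")           # encode: Unicode text -> bytes
misread = utf8_bytes.decode("iso-8859-1")   # decode with the wrong encoding
reread = utf8_bytes.decode("utf-8")         # decode with the matching encoding

print(repr(misread))   # each byte of the UTF-8 pair has become its own character
print(repr(reread))    # the original text again

# Encoding the misread text once more gives the byte string from the original post.
double_encoded = misread.encode("utf-8")
print(double_encoded)
```

The point of the sketch is that nothing raises an error along the way: ISO-8859-1 maps every byte to some character, so the wrong decode succeeds silently and the damage only shows up later.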

Re: Unicode/ascii encoding nightmare

2006-11-07 Thread Andrea Griffini
John Machin wrote:
> Indeed yourself.
What does the above mean?
> Have you ever considered reading posts in
> chronological order, or reading all posts in a thread?
I do not think people read posts in chronological order; it simply doesn't make sense. I also don't think many do read threads com

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Hendrik van Rooyen
"John Machin" <[EMAIL PROTECTED]> wrote: 8<--- > I strongly suggest that you read the docs *FIRST*, and don't "tinker" > at all. > > HTH, > John This is *good* advice - its unlikely to be followed though, as the OP is prolly just like most of us - you unpack

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Cameron Laird wrote:
> In article <[EMAIL PROTECTED]>,
> John Machin <[EMAIL PROTECTED]> wrote:
> >
> >Thomas W wrote:
> >> Ok, I've cleaned up my code a bit and it seems as if I've
> >> encoded/decoded myself into a corner ;-). My understanding of unicode
> >> has room for improvement, that's for

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Cameron Laird
In article <[EMAIL PROTECTED]>,
John Machin <[EMAIL PROTECTED]> wrote:
>
>Thomas W wrote:
>> Ok, I've cleaned up my code a bit and it seems as if I've
>> encoded/decoded myself into a corner ;-). My understanding of unicode
>> has room for improvement, that's for sure. I got some pointers and
>> ini

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Gabriel Genellina wrote:
> At Monday 6/11/2006 20:34, Robert Kern wrote:
>
> >John Machin wrote:
> > > Indeed yourself. Have you ever considered reading posts in
> > > chronological order, or reading all posts in a thread?
> >
> >That presumes that messages arrive in chronological order and
> >tra

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Gabriel Genellina
At Monday 6/11/2006 20:34, Robert Kern wrote:
John Machin wrote:
> Indeed yourself. Have you ever considered reading posts in
> chronological order, or reading all posts in a thread?
That presumes that messages arrive in chronological order and transmissions are instantaneous. Neither are tru

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Robert Kern
John Machin wrote:
> Indeed yourself. Have you ever considered reading posts in
> chronological order, or reading all posts in a thread?
That presumes that messages arrive in chronological order and transmissions are instantaneous. Neither is true.
--
Robert Kern
"I have come to believe that t

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Andrea Griffini wrote:
> John Machin wrote:
>
> > The fact that C3 and C2 are both present, plus the fact that one
> > non-ASCII byte has morphoploded into 4 bytes indicate a double whammy.
>
> Indeed...
>
> >>> x = u"fødselsdag"
> >>> x.encode('utf-8').decode('iso-8859-1').encode('utf-8')
> 'f\

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Thomas W wrote:
> Ok, I've cleaned up my code a bit and it seems as if I've
> encoded/decoded myself into a corner ;-). My understanding of unicode
> has room for improvement, that's for sure. I got some pointers and
> initial code-cleanup seems to have removed some of the strange results I
> got, w

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Thomas W
Ok, I've cleaned up my code a bit and it seems as if I've encoded/decoded myself into a corner ;-). My understanding of unicode has room for improvement, that's for sure. I got some pointers and initial code-cleanup seems to have removed some of the strange results I got, which several of you also po

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Georg Brandl
Thomas W wrote:
> I'm getting really annoyed with python in regards to
> unicode/ascii-encoding problems.
>
> The string below is the encoding of the norwegian word "fødselsdag".
> s = 'f\xc3\x83\xc2\xb8dselsdag'
Which encoding is this?
> I stored the string as "fødselsdag" but somewhere i

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Andrea Griffini
John Machin wrote:
> The fact that C3 and C2 are both present, plus the fact that one
> non-ASCII byte has morphoploded into 4 bytes indicate a double whammy.
Indeed...
>>> x = u"fødselsdag"
>>> x.encode('utf-8').decode('iso-8859-1').encode('utf-8')
'f\xc3\x83\xc2\xb8dselsdag'
Andrea
-- http

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Robert Kern wrote:
> However, I don't know of an encoding that takes u"fødselsdag" to
> 'f\xc3\x83\xc2\xb8dselsdag'.
There isn't one. C3 and C2 hint at UTF-8. The fact that C3 and C2 are both present, plus the fact that one non-ASCII byte has morphoploded into 4 bytes indicate a double whammy.
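
If the diagnosis above is right (UTF-8 bytes misread as ISO-8859-1 and then encoded as UTF-8 a second time), the damage can be undone by running the steps backwards. The following is a rough Python 3 sketch under that assumption, not code from the thread. `undo_double_utf8` is a hypothetical helper name, and if the intermediate step actually used a different codec (for example a Windows code page) the middle line would need to change.

```python
from typing import Optional


def undo_double_utf8(data: bytes) -> Optional[str]:
    """Try to reverse text.encode('utf-8').decode('iso-8859-1').encode('utf-8').

    Returns the repaired text, or None when one of the steps fails.
    Pure ASCII input passes through unchanged, since all three codecs agree on it.
    """
    try:
        once = data.decode("utf-8")        # the mojibake text, e.g. 'fÃ¸dselsdag'
        raw = once.encode("iso-8859-1")    # back to the first-round UTF-8 bytes
        return raw.decode("utf-8")         # the text that was originally intended
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


mangled = b"f\xc3\x83\xc2\xb8dselsdag"     # byte string from the original post
repaired = undo_double_utf8(mangled)
print(repr(repaired))
```

A string that was encoded as UTF-8 only once makes the last step fail, so the helper returns None for it instead of silently producing different text. The real fix is still to find the stray decode/encode pair in the program rather than to repair the data afterwards.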

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Thomas W wrote:
> I'm getting really annoyed with python in regards to
> unicode/ascii-encoding problems.
>
> The string below is the encoding of the norwegian word "fødselsdag".
>
> >>> s = 'f\xc3\x83\xc2\xb8dselsdag'
There is no such thing as "*the* encoding" of any given string.
>
> I stored t
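
The remark that there is no single "the encoding" of a string can be made concrete: the same text turns into different bytes depending on the codec chosen. A small Python 3 sketch, with the three codecs picked arbitrarily for illustration and all variable names invented for the example:

```python
word = "f\u00f8dselsdag"   # the word "fødselsdag" as Unicode text

# Encode the same text with several codecs and compare the resulting bytes.
encoded = {codec: word.encode(codec) for codec in ("utf-8", "iso-8859-1", "utf-16-le")}
for codec, data in encoded.items():
    print(codec, len(data), data)

# Some codecs cannot represent the text at all.
ascii_error = None
try:
    word.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print("ascii cannot represent it:", exc.reason)
```

None of the byte strings printed here matches the one in the original post, which is consistent with the view elsewhere in the thread that it came from two encoding steps rather than one.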

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Robert Kern
Thomas W wrote:
> I'm getting really annoyed with python in regards to
> unicode/ascii-encoding problems.
>
> The string below is the encoding of the norwegian word "fødselsdag".
> s = 'f\xc3\x83\xc2\xb8dselsdag'
>
> I stored the string as "fødselsdag" but somewhere in my code it got
> tran

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Mark Peters
> The string below is the encoding of the norwegian word "fødselsdag".
>
> >>> s = 'f\xc3\x83\xc2\xb8dselsdag'
I'm not sure which encoding method you used to get the string above. Here's the result of my playing with the string in IDLE:
>>> u1 = u'fødselsdag'
>>> u1
u'f\xf8dselsdag'
>>> s1 = u1.e

Unicode/ascii encoding nightmare

2006-11-06 Thread Thomas W
I'm getting really annoyed with python in regards to unicode/ascii-encoding problems.
The string below is the encoding of the norwegian word "fødselsdag".
>>> s = 'f\xc3\x83\xc2\xb8dselsdag'
I stored the string as "fødselsdag" but somewhere in my code it got translated into the mess above and I