Re: encoding problems (é and è)

2006-03-25 Thread Martin v. Löwis
Serge Orlov wrote: > The problem is that U+0587 is a ligature in Western Armenian dialect > (hy locale) and a character in Eastern Armenian dialect (hy_AM locale). > It is strange the code point is marked as compatibility char. It is either > a mistake or a political decision. It used to be a ligature bef

Re: encoding problems (é and è)

2006-03-24 Thread Serge Orlov
Jean-Paul Calderone wrote: > On Fri, 24 Mar 2006 09:33:19 +1100, John Machin <[EMAIL PROTECTED]> wrote: > >On 24/03/2006 8:36 AM, Peter Otten wrote: > >> John Machin wrote: > >> > >>>You can replace ALL of this upshifting and accent removal in one blow by > >>>using the string translate() method wi

Re: encoding problems (é and è)

2006-03-24 Thread Serge Orlov
Martin v. Löwis wrote: > John Machin wrote: > >> and, for things like u'\u0565\u0582' (ARMENIAN SMALL LIGATURE ECH > >> YIWN), it does not even work. > > > > Sorry, I don't understand. > > 0565 is stand-alone ECH > > 0582 is stand-alone YIWN > > 0587 is the ligature. > > What doesn't work? At first

Re: encoding problems (é and è)

2006-03-24 Thread Martin v. Löwis
John Machin wrote: >> and, for things like u'\u0565\u0582' (ARMENIAN SMALL LIGATURE ECH >> YIWN), it does not even work. > > Sorry, I don't understand. > 0565 is stand-alone ECH > 0582 is stand-alone YIWN > 0587 is the ligature. > What doesn't work? At first guess, in the absence of an Armenian
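
The ligature question above can be checked with Python's unicodedata module. This is a minimal sketch in Python 3 (the thread itself predates it), and the variable names are illustrative. It relies on U+0587 carrying a compatibility decomposition to U+0565 U+0582, so NFKD splits the ligature, while no normalization form joins the two stand-alone letters back into it.

```python
import unicodedata

ligature = "\u0587"        # ARMENIAN SMALL LIGATURE ECH YIWN
ech_yiwn = "\u0565\u0582"  # stand-alone ECH followed by stand-alone YIWN

# Compatibility decomposition splits the ligature into its two letters.
decomposed = unicodedata.normalize("NFKD", ligature)

# Compatibility composition does not rebuild the ligature from the letters.
recomposed = unicodedata.normalize("NFKC", ech_yiwn)

print(unicodedata.name(ligature))
print(decomposed == ech_yiwn)
print(recomposed == ligature)
```

So a normalization-based approach can only go from the ligature to the letter pair, not the other way round.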

Re: encoding problems (é and è)

2006-03-24 Thread John Machin
On 24/03/2006 11:44 PM, Peter Otten wrote: > John Machin wrote: > > >>0x00d0: ord('D'), # Ð >>0x00f0: ord('o'), # ð >>Icelandic capital eth becomes D, OK; but the small letter becomes o!!! > > > I see information flow from Iceland is a bit better than from Armenia :-) No information flow neede

Re: encoding problems (é and è)

2006-03-24 Thread Peter Otten
John Machin wrote: > 0x00d0: ord('D'), # Ð > 0x00f0: ord('o'), # ð > Icelandic capital eth becomes D, OK; but the small letter becomes o!!! I see information flow from Iceland is a bit better than from Armenia :-) > Some of the transformations are a little unfortunate :-( The OP, as you pointed
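
The eth entries quoted above show how easily a hand-written table drifts out of step between cases. As a rough sketch in Python 3, a small check can flag such entries. The helper name inconsistent_case_pairs is hypothetical, and the "fixed" table assumes the intent was for small eth to map to d, which the thread does not state outright.

```python
def inconsistent_case_pairs(table):
    """Return code points whose uppercase partner maps to a different base letter."""
    bad = []
    for code_point, target in table.items():
        upper = chr(code_point).upper()
        # Skip characters that are already uppercase or expand when uppercased.
        if len(upper) != 1 or ord(upper) == code_point:
            continue
        partner = ord(upper)
        if partner in table and chr(table[partner]).lower() != chr(target).lower():
            bad.append(code_point)
    return bad

# The two entries as quoted in the thread.
quoted_table = {0x00D0: ord("D"), 0x00F0: ord("o")}

# Assumed correction: small eth maps to "d".
fixed_table = {0x00D0: ord("D"), 0x00F0: ord("d")}

print([hex(cp) for cp in inconsistent_case_pairs(quoted_table)])
print(inconsistent_case_pairs(fixed_table))
```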

Re: encoding problems (é and è)

2006-03-24 Thread John Machin
On 24/03/2006 8:11 PM, Duncan Booth wrote: > Peter Otten wrote: > > >>>You can replace ALL of this upshifting and accent removal in one blow >>>by using the string translate() method with a suitable table. >> >>Only if you convert to unicode first or if your data maintains 1 byte >>== 1 character

Re: encoding problems (é and è)

2006-03-24 Thread Peter Otten
Duncan Booth wrote: > There's a nice little codec from Skip Montanaro for removing accents from > latin-1 encoded strings. It also has an error handler so you can convert > from unicode to ascii and strip all the accents as you do so: > > http://orca.mojam.com/~skip/python/latscii.py > import
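
For readers without access to that module, the error-handler idea can be sketched with the standard library alone. This is not the latscii code: the handler name strip_accents_sketch and its behaviour are assumptions made for illustration, written for Python 3. It decomposes each unencodable run with NFKD and keeps only the ASCII characters that remain.

```python
import codecs
import unicodedata

def strip_accents_handler(error):
    """Replace unencodable characters with their unaccented ASCII base letters."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    bad = error.object[error.start:error.end]
    decomposed = unicodedata.normalize("NFKD", bad)
    # Keep only ASCII; combining marks and undecomposable characters are dropped.
    replacement = "".join(ch for ch in decomposed if ord(ch) < 128)
    return replacement, error.end

# Hypothetical handler name, registered for use with str.encode().
codecs.register_error("strip_accents_sketch", strip_accents_handler)

# "café Çà" written with escapes to stay independent of the file encoding.
result = "caf\u00e9 \u00c7\u00e0".encode("ascii", errors="strip_accents_sketch")
print(result)
```

Characters with no decomposition, such as the Icelandic eth, are silently removed by this sketch, so a real table would still be needed for them.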

Re: encoding problems (é and è)

2006-03-23 Thread John Machin
On 24/03/2006 2:19 PM, Jean-Paul Calderone wrote: > On Fri, 24 Mar 2006 09:33:19 +1100, John Machin <[EMAIL PROTECTED]> > wrote: > >> On 24/03/2006 8:36 AM, Peter Otten wrote: >> >>> John Machin wrote: >>> You can replace ALL of this upshifting and accent removal in one blow by us

Re: encoding problems (é and è)

2006-03-23 Thread Jean-Paul Calderone
On Fri, 24 Mar 2006 09:33:19 +1100, John Machin <[EMAIL PROTECTED]> wrote: >On 24/03/2006 8:36 AM, Peter Otten wrote: >> John Machin wrote: >> >>>You can replace ALL of this upshifting and accent removal in one blow by >>>using the string translate() method with a suitable table. >> >> Only if you

Re: encoding problems (é and è)

2006-03-23 Thread John Machin
On 24/03/2006 8:36 AM, Peter Otten wrote: > John Machin wrote: > >>You can replace ALL of this upshifting and accent removal in one blow by >>using the string translate() method with a suitable table. > > Only if you convert to unicode first or if your data maintains 1 byte == 1 > character, in p

Re: encoding problems (é and è)

2006-03-23 Thread Peter Otten
John Machin wrote: > You can replace ALL of this upshifting and accent removal in one blow by > using the string translate() method with a suitable table. Only if you convert to unicode first or if your data maintains 1 byte == 1 character, in particular it is not UTF-8. Peter -- http://mail.
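
A short sketch of the translate() approach and of the caveat about UTF-8 may help here. It uses Python 3, where str is already Unicode text, so the "convert to unicode first" step is implicit. The table contents and the fold helper are illustrative only and cover just a handful of characters.

```python
# Illustrative mapping of a few accented letters to unaccented uppercase ones.
ACCENT_TABLE = str.maketrans({
    "\u00e9": "E",  # é
    "\u00e8": "E",  # è
    "\u00c9": "E",  # É
    "\u00c8": "E",  # È
    "\u00e7": "C",  # ç
    "\u00c7": "C",  # Ç
})

def fold(text):
    """Replace the accented letters in ACCENT_TABLE, then uppercase the rest."""
    return text.translate(ACCENT_TABLE).upper()

# The same character takes two bytes in UTF-8 but one byte in Latin-1,
# which is why a byte-for-byte table cannot work on UTF-8 encoded data.
utf8_len = len("\u00e9".encode("utf-8"))
latin1_len = len("\u00e9".encode("latin-1"))

print(fold("\u00e9\u00c7 abc"))
print(utf8_len, latin1_len)
```

Working on decoded text keeps one table entry per character regardless of how the data was stored on disk.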

Re: encoding problems (é and è)

2006-03-23 Thread John Machin
On 23/03/2006 10:07 PM, bussiere bussiere wrote: > hi I'm making a program for formatting string, > or > i've added : > #!/usr/bin/python > # -*- coding: utf-8 -*- > > in the beginning of my script but > > str = str.replace('Ç', 'C') > str = str.replace('é', 'E') > str = str.repl

Re: encoding problems (é and è)

2006-03-23 Thread Larry Bates
Seems to work fine for me.
>>> x="éÇ"
>>> x=x.replace('é','E')
'E\xc7'
>>> x=x.replace('Ç','C')
>>> x
'E\xc7'
>>> x=x.replace('Ç','C')
>>> x
'EC'
You should also be able to use .upper() method to uppercase everything in the string in a single statement: tstr=ligneA.upper() Note: you should neve

Re: encoding problems (é and è)

2006-03-23 Thread Christoph Zwerschke
bussiere bussiere wrote: > hi I'm making a program for formatting string, > i've added : > #!/usr/bin/python > # -*- coding: utf-8 -*- > > in the beginning of my script but > > str = str.replace('Ç', 'C') > ... > doesn't work it put me " and , instead of replacing é by E Are you sure your scr