Re: encoding problems

2007-08-29 Thread Damjan
> > is there a way to sort this string properly (sorted()?) > I mean first 'a' then 'à' then 'e' etc. (sorted puts accented letters at > the end). Or should I have to provide a comparison function to sorted? After setting the locale... locale.strcoll() -- damjan -- http://mail.python.org/mai
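
The `locale.strcoll()` suggestion above can be sketched roughly as follows. This is a minimal illustration in present-day Python, not code from the thread: the helper name `sort_for_locale` and the choice to fall back silently when a locale is missing are assumptions. The resulting order depends on which locales are installed on the machine.

```python
import functools
import locale


def sort_for_locale(words, locale_name=""):
    """Sort words with the collation rules of a locale.

    locale_name="" asks for the user's default locale. The helper name and
    the silent fallback are assumptions made for this sketch.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        pass  # requested locale unavailable: keep the active collation
    try:
        # strcoll is a comparison function, so wrap it with cmp_to_key
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore


words = ["zoo", "été", "eau", "à", "a"]
result = sort_for_locale(words)
print(result)
```

With a French or similar UTF-8 locale active, accented letters should sort next to their base letters; under the plain "C" locale the order falls back to code points. `locale.strxfrm` can be passed directly as `key=` instead of wrapping `strcoll`.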

Re: encoding problems

2007-08-29 Thread Diez B. Roggisch
Ricardo Aráoz wrote: > Lawrence D'Oliveiro wrote: >> In message <[EMAIL PROTECTED]>, tool69 wrote: >> >>> p2.content = """Ce poste possède des accents : é à ê è""" >> >> My guess is this is being encoded as a Latin-1 string, but when you try >> to output it it goes through the ASCII encoder, whi

Re: encoding problems

2007-08-29 Thread Ricardo Aráoz
Lawrence D'Oliveiro wrote: > In message <[EMAIL PROTECTED]>, tool69 wrote: > >> p2.content = """Ce poste possède des accents : é à ê è""" > > My guess is this is being encoded as a Latin-1 string, but when you try to > output it it goes through the ASCII encoder, which doesn't understand the > ac

Re: encoding problems

2007-08-29 Thread tool69
Diez B. Roggisch a écrit : > tool69 wrote: > >> Hi, >> >> I would like to transform reST contents to HTML, but got problems >> with accented chars. >> >> Here's a rather simplified version using SVN Docutils 0.5: >> >> %- >> >> #!/usr/bin

Re: encoding problems

2007-08-29 Thread tool69
Lawrence D'Oliveiro a écrit : > In message <[EMAIL PROTECTED]>, tool69 wrote: > >> p2.content = """Ce poste possède des accents : é à ê è""" > > My guess is this is being encoded as a Latin-1 string, but when you try to > output it it goes through the ASCII encoder, which doesn't understand the >

Re: encoding problems

2007-08-29 Thread Diez B. Roggisch
tool69 wrote: > Hi, > > I would like to transform reST contents to HTML, but got problems > with accented chars. > > Here's a rather simplified version using SVN Docutils 0.5: > > %- > > #!/usr/bin/env python > # -*- coding: utf-8 -*-

Re: encoding problems

2007-08-29 Thread Lawrence D'Oliveiro
In message <[EMAIL PROTECTED]>, tool69 wrote: > p2.content = """Ce poste possède des accents : é à ê è""" My guess is this is being encoded as a Latin-1 string, but when you try to output it it goes through the ASCII encoder, which doesn't understand the accents. Try this: p2.content = u"""Ce po
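
The failure mode guessed at above (Latin-1 bytes passing through the ASCII codec) can be reproduced in a few lines. This is a sketch in present-day Python, where every string literal is already Unicode; in the Python 2 of the thread the `u"""..."""` prefix was needed to get the same effect. The variable names are illustrative only.

```python
# The sample text from the thread, as a Unicode string.
text = "Ce poste possède des accents : é à ê è"

# Encode to Latin-1: each accented letter becomes one byte above 0x7F.
latin1_bytes = text.encode("latin-1")

# The ASCII codec only understands bytes 0x00-0x7F, so decoding fails.
try:
    latin1_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the codec that was actually used recovers the text,
# which can then be re-encoded explicitly for output.
roundtrip = latin1_bytes.decode("latin-1")
utf8_bytes = roundtrip.encode("utf-8")
```

Keeping text as Unicode internally and encoding only at the output boundary avoids relying on an implicit ASCII default.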

Re: encoding problems (é and è)

2006-03-25 Thread Martin v. Löwis
Serge Orlov wrote: > The problem is that U+0587 is a ligature in Western Armenian dialect > (hy locale) and a character in Eastern Armenian dialect (hy_AM locale). > It is strange the code point is marked as compatibility char. It either > mistake or political decision. It used to be a ligature bef

Re: encoding problems (é and è)

2006-03-24 Thread Serge Orlov
Jean-Paul Calderone wrote: > On Fri, 24 Mar 2006 09:33:19 +1100, John Machin <[EMAIL PROTECTED]> wrote: > >On 24/03/2006 8:36 AM, Peter Otten wrote: > >> John Machin wrote: > >> > >>>You can replace ALL of this upshifting and accent removal in one blow by > >>>using the string translate() method wi

Re: encoding problems (é and è)

2006-03-24 Thread Serge Orlov
Martin v. Löwis wrote: > John Machin wrote: > >> and, for things like u'\u0565\u0582' (ARMENIAN SMALL LIGATURE ECH > >> YIWN), it does not even work. > > > > Sorry, I don't understand. > > 0565 is stand-alone ECH > > 0582 is stand-alone YIWN > > 0587 is the ligature. > > What doesn't work? At first

Re: encoding problems (é and è)

2006-03-24 Thread Martin v. Löwis
John Machin wrote: >> and, for things like u'\u0565\u0582' (ARMENIAN SMALL LIGATURE ECH >> YIWN), it does not even work. > > Sorry, I don't understand. > 0565 is stand-alone ECH > 0582 is stand-alone YIWN > 0587 is the ligature. > What doesn't work? At first guess, in the absence of an Armenian
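
One way to inspect the three code points listed above is the standard `unicodedata` module. The sketch below is not from the thread; it only shows what the Unicode database records for U+0587 and how normalization treats it.

```python
import unicodedata

ligature = "\u0587"      # the single ligature code point
pair = "\u0565\u0582"    # stand-alone ECH followed by stand-alone YIWN

name = unicodedata.name(ligature)
decomposition = unicodedata.decomposition(ligature)

# Compatibility decomposition (NFKD) expands the ligature into the pair.
expanded = unicodedata.normalize("NFKD", ligature)

# Canonical decomposition (NFD) leaves it alone, because the mapping is
# tagged <compat> rather than canonical.
canonical = unicodedata.normalize("NFD", ligature)

# Compatibility mappings are never recomposed, so the pair stays a pair.
recomposed = unicodedata.normalize("NFKC", pair)

print(name, decomposition)
```

So going from the ligature to the two letters is possible with NFKD, but no normalization form turns the two letters back into U+0587.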

Re: encoding problems (é and è)

2006-03-24 Thread Fredrik Lundh
John Machin wrote: > Some of the transformations are a little unfortunate :-( here's a slightly silly way to map a unicode string to its "unaccented" version: ### import unicodedata, sys CHAR_REPLACEMENT = { 0xc6: u"AE", # LATIN CAPITAL LETTER AE 0xd0: u"D", # LATIN CAPITAL LETTER ETH
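
Since the preview above is cut off, here is a sketch in the same spirit rather than the original code: a lazily filled translation table that uses a small hand-written replacement dictionary first and falls back to the canonical decomposition from `unicodedata`. The class name `UnaccentedMap`, the function `unaccent`, and the table entries beyond the two visible above are assumptions, and the code targets present-day Python.

```python
import unicodedata

# Characters with no useful canonical decomposition get explicit entries.
# Only 0xC6 and 0xD0 appear in the preview; the rest are illustrative.
CHAR_REPLACEMENT = {
    0xC6: "AE",  # LATIN CAPITAL LETTER AE
    0xD0: "D",   # LATIN CAPITAL LETTER ETH
    0xDF: "ss",  # LATIN SMALL LETTER SHARP S
    0xE6: "ae",  # LATIN SMALL LETTER AE
    0xF0: "d",   # LATIN SMALL LETTER ETH
}


class UnaccentedMap(dict):
    """Translation table that computes entries on first use."""

    def __missing__(self, codepoint):
        replacement = CHAR_REPLACEMENT.get(codepoint)
        if replacement is None:
            decomposition = unicodedata.decomposition(chr(codepoint))
            if decomposition and not decomposition.startswith("<"):
                # Canonical decomposition: keep only the base character.
                replacement = int(decomposition.split()[0], 16)
            else:
                # No decomposition, or a <compat>-style one: leave as is.
                replacement = codepoint
        self[codepoint] = replacement  # cache for next time
        return replacement


_UNACCENT = UnaccentedMap()


def unaccent(text):
    """Return text with accents removed where a base letter is known."""
    return text.translate(_UNACCENT)


print(unaccent("Ce poste possède des accents : é à ê è"))
```

Because entries are cached on first lookup, the table only grows to cover characters that actually occur in the input.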

Re: encoding problems (é and è)

2006-03-24 Thread John Machin
On 24/03/2006 11:44 PM, Peter Otten wrote: > John Machin wrote: > > >>0x00d0: ord('D'), # Ð >>0x00f0: ord('o'), # ð >>Icelandic capital eth becomes D, OK; but the small letter becomes o!!! > > > I see information flow from Iceland is a bit better than from Armenia :-) No information flow neede

Re: encoding problems (é and è)

2006-03-24 Thread Walter Dörwald
Duncan Booth wrote: > [...] > Unfortunately, just as I finished writing this I discovered that the > latscii module isn't as robust as I thought, it blows up on consecutive > accented characters. > > :( Replace the error handler with this (untested) and it should work with consecutive accent
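
The replacement handler itself is cut off in this preview, so the following is only a sketch of the general technique in present-day Python, not the code from the message. It registers an encoding error handler that processes the whole failing range at once, which is what makes runs of consecutive accented characters work. The handler name `strip_accents` is hypothetical.

```python
import codecs
import unicodedata


def strip_accents_handler(exc):
    """Replace an unencodable run with its ASCII base letters."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("strip_accents only handles encoding errors")
    # exc.start:exc.end may span several consecutive characters.
    bad = exc.object[exc.start:exc.end]
    decomposed = unicodedata.normalize("NFKD", bad)
    # Keep the ASCII part of each decomposed character, drop the rest.
    replacement = "".join(ch for ch in decomposed if ord(ch) < 128)
    return replacement, exc.end  # resume encoding after the bad run


# "strip_accents" is a made-up name for this sketch.
codecs.register_error("strip_accents", strip_accents_handler)

single = "possède".encode("ascii", "strip_accents")
consecutive = "éèê".encode("ascii", "strip_accents")
print(single, consecutive)
```

Characters with no ASCII base letter (for example `ø`) are simply dropped by this sketch; a hand-written table would be needed to map them.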

Re: encoding problems (é and è)

2006-03-24 Thread Peter Otten
John Machin wrote: > 0x00d0: ord('D'), # Ð > 0x00f0: ord('o'), # ð > Icelandic capital eth becomes D, OK; but the small letter becomes o!!! I see information flow from Iceland is a bit better than from Armenia :-) > Some of the transformations are a little unfortunate :-( The OP, as you pointed

Re: encoding problems (é and è)

2006-03-24 Thread John Machin
On 24/03/2006 8:11 PM, Duncan Booth wrote: > Peter Otten wrote: > > >>>You can replace ALL of this upshifting and accent removal in one blow >>>by using the string translate() method with a suitable table. >> >>Only if you convert to unicode first or if your data maintains 1 byte >>== 1 character

Re: encoding problems (é and è)

2006-03-24 Thread Peter Otten
Duncan Booth wrote: > There's a nice little codec from Skip Montanaro for removing accents from > latin-1 encoded strings. It also has an error handler so you can convert > from unicode to ascii and strip all the accents as you do so: > > http://orca.mojam.com/~skip/python/latscii.py > import

Re: encoding problems (é and è)

2006-03-24 Thread Duncan Booth
Peter Otten wrote: >> You can replace ALL of this upshifting and accent removal in one blow >> by using the string translate() method with a suitable table. > > Only if you convert to unicode first or if your data maintains 1 byte > == 1 character, in particular it is not UTF-8. > There's a ni

Re: encoding problems (é and è)

2006-03-23 Thread John Machin
On 24/03/2006 2:19 PM, Jean-Paul Calderone wrote: > On Fri, 24 Mar 2006 09:33:19 +1100, John Machin <[EMAIL PROTECTED]> > wrote: > >> On 24/03/2006 8:36 AM, Peter Otten wrote: >> >>> John Machin wrote: >>> You can replace ALL of this upshifting and accent removal in one blow by us

Re: encoding problems (é and è)

2006-03-23 Thread Jean-Paul Calderone
On Fri, 24 Mar 2006 09:33:19 +1100, John Machin <[EMAIL PROTECTED]> wrote: >On 24/03/2006 8:36 AM, Peter Otten wrote: >> John Machin wrote: >> >>>You can replace ALL of this upshifting and accent removal in one blow by >>>using the string translate() method with a suitable table. >> >> Only if you

Re: encoding problems (é and è)

2006-03-23 Thread John Machin
On 24/03/2006 8:36 AM, Peter Otten wrote: > John Machin wrote: > >>You can replace ALL of this upshifting and accent removal in one blow by >>using the string translate() method with a suitable table. > > Only if you convert to unicode first or if your data maintains 1 byte == 1 > character, in p

Re: encoding problems (é and è)

2006-03-23 Thread Peter Otten
John Machin wrote: > You can replace ALL of this upshifting and accent removal in one blow by > using the string translate() method with a suitable table. Only if you convert to unicode first or if your data maintains 1 byte == 1 character, in particular it is not UTF-8. Peter -- http://mail.
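
The caveat above can be illustrated with a short sketch in present-day Python (the table contents and the helper name `upshift` are assumptions). In UTF-8 an accented letter occupies more than one byte, so a byte-for-byte table cannot match it; translating the decoded text works.

```python
# One table entry per character, applied to text rather than bytes.
ACCENT_TABLE = str.maketrans({"é": "E", "è": "E", "ç": "C", "Ç": "C"})


def upshift(text):
    """Strip the listed accents and uppercase in one pass plus upper()."""
    return text.translate(ACCENT_TABLE).upper()


# Why the encoding matters: the same letter is 1 byte or 2 bytes.
latin1_bytes = "é".encode("latin-1")
utf8_bytes = "é".encode("utf-8")

# Decode first, then translate the resulting text.
result = upshift(b"gar\xc3\xa7on pr\xc3\xa9f\xc3\xa9r\xc3\xa9".decode("utf-8"))
print(len(latin1_bytes), len(utf8_bytes), result)
```

A 256-entry byte table is only safe for single-byte encodings such as Latin-1.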

Re: encoding problems (é and è)

2006-03-23 Thread John Machin
On 23/03/2006 10:07 PM, bussiere bussiere wrote: > hi i'am making a program for formatting string, > or > i've added : > #!/usr/bin/python > # -*- coding: utf-8 -*- > > in the begining of my script but > > str = str.replace('Ç', 'C') > str = str.replace('é', 'E') > str = str.repl

Re: encoding problems (é and è)

2006-03-23 Thread Larry Bates
Seems to work fine for me. >>> x="éÇ" >>> x=x.replace('é','E') 'E\xc7' >>> x=x.replace('Ç','C') >>> x 'E\xc7' >>> x=x.replace('Ç','C') >>> x 'EC' You should also be able to use .upper() method to uppercase everything in the string in a single statement: tstr=ligneA.upper() Note: you should neve

Re: encoding problems (é and è)

2006-03-23 Thread Christoph Zwerschke
bussiere bussiere wrote: > hi i'am making a program for formatting string, > i've added : > #!/usr/bin/python > # -*- coding: utf-8 -*- > > in the begining of my script but > > str = str.replace('Ç', 'C') > ... > doesn't work it put me " and , instead of remplacing é by E Are you sure your scr

Re: encoding problems with pymssql / win

2006-02-11 Thread morris carre
> (to email use "boris at batiment71 dot ch") oops, that's "boris at batiment71 dot net" -- http://mail.python.org/mailman/listinfo/python-list