Re: Unicode error handler

2007-01-31 Thread Walter Dörwald
[EMAIL PROTECTED] wrote:
> On Jan 30, 11:28 pm, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>
>> codecs.register_error("transliterate", transliterate)
>>
>> Walter
>
> Really, really slick solution.
> Though, why was it [:1], not [0]? ;-)
No particular reason, unicodedata.normalize("NFD", ...)
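
For readers wondering about the [:1] versus [0] question, here is a small sketch of how the two behave in Python 3. The variable names are hypothetical and only serve the illustration. On a non-empty string the two give the same one-character result; they differ only on an empty string, where the slice returns an empty string and the index raises.

```python
import unicodedata

# NFD splits U+00E9 into a base letter "e" plus a combining acute accent.
decomposed = unicodedata.normalize("NFD", "\u00e9")

first_by_slice = decomposed[:1]   # slicing
first_by_index = decomposed[0]    # indexing
print(first_by_slice == first_by_index)  # True

# The only difference shows up on an empty string.
empty = ""
print(repr(empty[:1]))  # ''
try:
    empty[0]
except IndexError as err:
    print("IndexError:", err)
```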

Re: Unicode error handler

2007-01-31 Thread Walter Dörwald
Martin v. Löwis wrote:
> Walter Dörwald wrote:
>> You might try the following:
>>
>> # -*- coding: iso-8859-1 -*-
>>
>> import unicodedata, codecs
>>
>> def transliterate(exc):
>>     if not isinstance(exc, UnicodeEncodeError):
>>         raise TypeError("don'ty know how to handle %r" % r)

Re: Unicode error handler

2007-01-31 Thread Gabriel Genellina
On Wed, 31 Jan 2007 01:21:49 -0300, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I don't understand what %r and r are and where they are from. The man
> 3 printf page doesn't have %r formatting.
Perhaps you should look into the Python docs instead?
-- Gabriel Genellina
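
As a brief illustration of the point about the Python docs: %r is a conversion of Python's own % string formatting, not of C's printf. It inserts repr() of the argument, where %s inserts str(). The variable names below are hypothetical. The bare r in the quoted code is a separate matter: it is a variable name, and it is not defined anywhere in the snippet visible here.

```python
value = "abc"

as_str = "%s" % value    # uses str(value)
as_repr = "%r" % value   # uses repr(value), so the quotes are included

print(as_str)   # abc
print(as_repr)  # 'abc'
```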

Re: Unicode error handler

2007-01-30 Thread Martin v. Löwis
Walter Dörwald wrote:
> You might try the following:
>
> # -*- coding: iso-8859-1 -*-
>
> import unicodedata, codecs
>
> def transliterate(exc):
>     if not isinstance(exc, UnicodeEncodeError):
>         raise TypeError("don'ty know how to handle %r" % r)
>     return (unicodedata.n
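
The quoted code is cut off by the archive, so the following is only a sketch of the same idea in Python 3, not a reconstruction of the original message. It assumes that the undefined r was meant to be exc, and that the truncated return statement takes the first character of the NFD decomposition and resumes one position later; exc.object, exc.start and the names bad and replacement are assumptions made for this illustration.

```python
import codecs
import unicodedata


def transliterate(exc):
    """Error handler: substitute the base letter of an unencodable character."""
    if not isinstance(exc, UnicodeEncodeError):
        # Assumption: the original "% r" was meant to be "% exc".
        raise TypeError("don't know how to handle %r" % exc)
    bad = exc.object[exc.start]  # first character that failed to encode
    # NFD puts the base letter first, followed by any combining marks.
    replacement = unicodedata.normalize("NFD", bad)[:1]
    # Return the replacement and the position at which encoding resumes.
    return (replacement, exc.start + 1)


codecs.register_error("transliterate", transliterate)

print("caf\u00e9".encode("ascii", "transliterate"))  # b'cafe'
```

A character without an ASCII base letter (for example U+00DF) decomposes to itself, so the replacement is still not encodable and the encode call fails with UnicodeEncodeError; this sketch does not try to cover that case.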

Re: Unicode error handler

2007-01-30 Thread [EMAIL PROTECTED]
On Jan 30, 11:28 pm, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>
> codecs.register_error("transliterate", transliterate)
>
> Walter
Really, really slick solution. Though, why was it [:1], not [0]? ;-)
And one more thing:
> def transliterate(exc):
>     if not isinstance(exc, UnicodeEncode

Re: Unicode error handler

2007-01-30 Thread Walter Dörwald
Rares Vernica wrote:
> Hi,
>
> Does anyone know of any Unicode encode/decode error handler that does a
> better replace job than the default replace error handler?
>
> For example I have an iso-8859-1 string that has an 'e' with an accent
> (you know, the French 'e's). When I use s.encode('asci

Re: Unicode error handler

2007-01-26 Thread Rares Vernica
It does the job.
Thanks a lot,
Ray
Peter Otten wrote:
> Rares Vernica wrote:
>
>> Is there an encode/decode error handler that can replace all the
>> not-ascii letters from iso-8859-1 with their closest ascii letter?
>
> A mapping, not an error handler, but it might do the job:
>
> http://effb

Re: Unicode error handler

2007-01-26 Thread Robert Kern
Rares Vernica wrote:
> Is there an encode/decode error handler that can replace all the
> not-ascii letters from iso-8859-1 with their closest ascii letter?
No, but IBM's ICU library can transform one script to another in very flexible and capable ways. One such configuration can do what you ask.

Re: Unicode error handler

2007-01-26 Thread Peter Otten
Rares Vernica wrote:
> Is there an encode/decode error handler that can replace all the
> not-ascii letters from iso-8859-1 with their closest ascii letter?
A mapping, not an error handler, but it might do the job:
http://effbot.org/zone/unicode-convert.htm
Peter
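
To make the "mapping, not an error handler" idea concrete, here is a minimal sketch using str.translate in Python 3. It is not the code from the linked page; the table ASCII_FALLBACK and the function to_ascii are hypothetical names, and the table is deliberately tiny.

```python
# Hypothetical, deliberately small table: code point -> ASCII replacement.
ASCII_FALLBACK = {
    ord("\u00e9"): "e",   # e with acute accent
    ord("\u00e8"): "e",   # e with grave accent
    ord("\u00e0"): "a",   # a with grave accent
    ord("\u00df"): "ss",  # sharp s; a replacement may be longer than one character
}


def to_ascii(text):
    """Apply the table; characters that are not in it pass through unchanged."""
    return text.translate(ASCII_FALLBACK)


print(to_ascii("caf\u00e9"))     # cafe
print(to_ascii("Stra\u00dfe"))   # Strasse
```

Unlike an error handler, the mapping is applied before encoding, so it can also cover characters that have no canonical decomposition.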

Unicode error handler

2007-01-26 Thread Rares Vernica
Hi, Does anyone know of any Unicode encode/decode error handler that does a better replace job than the default replace error handler? For example I have an iso-8859-1 string that has an 'e' with an accent (you know, the French 'e's). When I use s.encode('ascii', 'replace') the 'e' will be rep
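
For reference, a short sketch of what the built-in handlers do with such a string in Python 3. The names raw and s are hypothetical; the iso-8859-1 bytes are decoded first, because encode works on text.

```python
raw = b"caf\xe9"                # iso-8859-1 bytes for "cafe" with an accented e
s = raw.decode("iso-8859-1")    # decode to text before re-encoding

print(s.encode("ascii", "replace"))  # b'caf?'
print(s.encode("ascii", "ignore"))   # b'caf'
```

Neither built-in handler keeps the base letter, which is the gap the question is about.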