Fredrik Lundh wrote:
> John Machin wrote:
>
> > Another point: there are many non-latin1 characters that could be
> > mapped to ASCII. For example:
> > u"\u0141ukasziewicz".translate(unaccented_map())
> > doesn't work unless an entry is added to the no-decomposition table:
> > 0x0141: u"L"
John Machin wrote:
> Another point: there are many non-latin1 characters that could be
> mapped to ASCII. For example:
> u"\u0141ukasziewicz".translate(unaccented_map())
> doesn't work unless an entry is added to the no-decomposition table:
> 0x0141: u"L", # LATIN CAPITAL LETTER L WITH STR
Fredrik Lundh wrote:
> John Machin wrote:
>
> > 3. ... and to check for missing maps. The OP may be working only with
> > French text, and may not care about Icelandic and German letters, but
> > other readers who stumble on this (and miss past thread(s) on this
> > topic) may like something done
John Machin wrote:
> 3. ... and to check for missing maps. The OP may be working only with
> French text, and may not care about Icelandic and German letters, but
> other readers who stumble on this (and miss past thread(s) on this
> topic) may like something done with \xde (capital thorn), \xfe (small thorn), etc.
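
In the same spirit, and purely as an illustration (these ASCII
transliterations are conventional choices, not something the thread
prescribes), the hand-maintained table from the sketch above could be
extended for latin-1 letters that have no decomposition at all:

EXTRA_REPLACEMENTS = {
    0x00c6: u"AE",  # LATIN CAPITAL LETTER AE
    0x00e6: u"ae",  # LATIN SMALL LETTER AE
    0x00d0: u"D",   # LATIN CAPITAL LETTER ETH
    0x00f0: u"d",   # LATIN SMALL LETTER ETH
    0x00d8: u"O",   # LATIN CAPITAL LETTER O WITH STROKE
    0x00f8: u"o",   # LATIN SMALL LETTER O WITH STROKE
    0x00de: u"Th",  # LATIN CAPITAL LETTER THORN
    0x00fe: u"th",  # LATIN SMALL LETTER THORN
    0x00df: u"ss",  # LATIN SMALL LETTER SHARP S
}
CHAR_REPLACEMENT.update(EXTRA_REPLACEMENTS)   # merge into the table sketched earlier
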
Frederic Rentsch wrote:
> Try this:
>
> from_characters = '\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xff\xe7\xe8\xe9\xea\x
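
A much-shortened sketch of the same byte-oriented idea, with made-up
character lists rather than Frederic Rentsch's actual table: in Python 2,
string.maketrans pairs two byte strings of equal length, and the resulting
table only applies to latin-1 byte strings, not to Unicode objects.

import string

# abbreviated, illustrative lists (not the full table from the post above)
from_chars = ('\xc0\xc1\xc2\xc3\xc4\xc5\xc7\xc8\xc9\xca\xcb'
              '\xe0\xe1\xe2\xe3\xe4\xe5\xe7\xe8\xe9\xea\xeb')
to_chars   = ('AAAAAAC' 'EEEE'
              'aaaaaac' 'eeee')
table = string.maketrans(from_chars, to_chars)   # 256-byte translation table

print 'd\xe9j\xe0 vu'.translate(table)           # -> deja vu
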
Dan wrote:
> On 22 nov, 22:59, "John Machin" <[EMAIL PROTECTED]> wrote:
>
>
>>> processes (Vigenère)
>>>
>> So why do you want to strip off accents? The history of communication
>> has several examples of significant difference in meaning caused by
>> minute differences in punctuation or
On 22 nov, 22:59, "John Machin" <[EMAIL PROTECTED]> wrote:
> > processes (Vigenère)
> So why do you want to strip off accents? The history of communication
> has several examples of significant difference in meaning caused by
> minute differences in punctuation or accents including one of which you
On Wed, 22 Nov 2006 22:59:01 +0100, John Machin <[EMAIL PROTECTED]>
wrote:
[snip]
> So why do you want to strip off accents? The history of communication
> has several examples of significant difference in meaning caused by
> minute differences in punctuation or accents including one of which you
Klaas wrote:
> It's not too hard to imagine an accentual difference, eg:
especially in languages where certain combinations really are distinct
letters, not just letters with accents or silly marks.
I have a Swedish children's book somewhere, in which some characters are
harassed by a big ugly
David H Wild wrote:
> In article <[EMAIL PROTECTED]>,
>John Machin <[EMAIL PROTECTED]> wrote:
> > So why do you want to strip off accents? The history of communication
> > has several examples of significant difference in meaning caused by
> > minute differences in punctuation or accents includ
In article <[EMAIL PROTECTED]>,
John Machin <[EMAIL PROTECTED]> wrote:
> So why do you want to strip off accents? The history of communication
> has several examples of significant difference in meaning caused by
> minute differences in punctuation or accents including one of which you
> may hav
Dan wrote:
> Thank you for your answers.
>
> In fact, I'm just getting started with Python.
That was a good decision. Welcome!
>
> I was looking to transform a text through elementary cryptographic
> processes (Vigenère).
So why do you want to strip off accents? The history of communication
has severa
Thank you for your answers.
In fact, I'm just getting started with Python.
I was looking to transform a text through elementary cryptographic
processes (Vigenère).
The initial text is in a file, and my system uses UTF-8 by default
(Ubuntu).
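
Putting the pieces together, a minimal sketch of what is described here:
read the UTF-8 file, fold away the accents, then apply a classic A-Z
Vigenère cipher. The file name ("message.txt") and the key are made up for
illustration, and the accent stripping uses the standard
normalize-then-drop-combining-marks idiom (Python 2):

import codecs
import unicodedata

def strip_accents(text):
    # decompose accented characters (NFKD) and drop the combining marks
    nfkd = unicodedata.normalize("NFKD", text)
    return u"".join(c for c in nfkd if not unicodedata.combining(c))

def vigenere_encrypt(plaintext, key):
    key = key.upper()
    out = []
    i = 0
    for ch in plaintext.upper():
        if u"A" <= ch <= u"Z":
            shift = ord(key[i % len(key)]) - ord(u"A")
            out.append(unichr((ord(ch) - ord(u"A") + shift) % 26 + ord(u"A")))
            i += 1
        else:
            out.append(ch)    # spaces, digits and punctuation pass through
    return u"".join(out)

text = codecs.open("message.txt", "r", "utf-8").read()
print vigenere_encrypt(strip_accents(text), u"CLEF")
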
hg wrote:
> Duncan Booth wrote:
> > hg <[EMAIL PROTECTED]> wrote:
> >
> >>> or in other words, put this at the top of your file (where "utf-8" is
> >>> whatever your editor/system is using):
> >>>
> >>># -*- coding: utf-8 -*-
> >>>
> >>> and use
> >>>
> >>>u''
> >>>
> >>> for all non-ASCII
Fredrik Lundh wrote:
> hg wrote:
>
>> How would you handle the string.maketrans then ?
>
> maketrans works on bytes, not characters. what makes you think that you
> can use maketrans if you haven't gotten the slightest idea what's in the
> string?
>
> if you want to get rid of accents in a Unicode string
hg wrote:
> How would you handle the string.maketrans then ?
maketrans works on bytes, not characters. what makes you think that you
can use maketrans if you haven't gotten the slightest idea what's in the
string?
if you want to get rid of accents in a Unicode string, you can do the
approach
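
One widely used Unicode-aware approach (offered here as a sketch, not
necessarily the code the message refers to) is to decompose the string with
NFKD and drop the combining marks. A Python 2 interactive session:

>>> import unicodedata
>>> s = u"Vigen\xe8re d\xe9j\xe0 vu"
>>> u"".join(c for c in unicodedata.normalize("NFKD", s)
...          if not unicodedata.combining(c))
u'Vigenere deja vu'
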
Duncan Booth wrote:
> hg <[EMAIL PROTECTED]> wrote:
>
>>> or in other words, put this at the top of your file (where "utf-8" is
>>> whatever your editor/system is using):
>>>
>>># -*- coding: utf-8 -*-
>>>
>>> and use
>>>
>>>u''
>>>
>>> for all non-ASCII literals.
>>>
>>>
>>>
>> Hi,
>>
>>
hg <[EMAIL PROTECTED]> wrote:
>> or in other words, put this at the top of your file (where "utf-8" is
>> whatever your editor/system is using):
>>
>># -*- coding: utf-8 -*-
>>
>> and use
>>
>>u''
>>
>> for all non-ASCII literals.
>>
>>
>>
>
> Hi,
>
> The problem is that:
>
> # -
hg wrote:
> Fredrik Lundh wrote:
>> hg wrote:
>>
>>> We noticed that len('à') != len('a')
>> sounds odd.
>>
>> >>> len('à') == len('a')
>> True
>>
>> are you perhaps using an UTF-8 editor?
>>
>> to keep your sanity, no matter what editor you're using, I recommend
>> adding a coding directive to the
Fredrik Lundh wrote:
> hg wrote:
>
>> We noticed that len('à') != len('a')
>
> sounds odd.
>
> >>> len('à') == len('a')
> True
>
> are you perhaps using an UTF-8 editor?
>
> to keep your sanity, no matter what editor you're using, I recommend
> adding a coding directive to the source file, and
hg wrote:
> We noticed that len('à') != len('a')
sounds odd.
>>> len('à') == len('a')
True
are you perhaps using an UTF-8 editor?
to keep your sanity, no matter what editor you're using, I recommend
adding a coding directive to the source file, and using *only* Unicode
string literals for non-ASCII text.
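
A minimal example of what that looks like in practice (the file name and its
contents are only illustrative), assuming the file itself is saved as UTF-8:

# -*- coding: utf-8 -*-
# demo.py: the coding line tells Python 2 how to decode the bytes of this
# source file; the u prefix makes the literal a Unicode string rather than
# a raw byte string.
print len(u'à')    # 1: one character
print len('à')     # 2: without the u prefix this is two UTF-8 bytes
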
Hi,
I'm bringing over a thread that's going on on f.c.l.python.
The point was to get rid of French accents from words.
We noticed that len('à') != len('a') and I found the hack below to fix
the "problem" ... yet I do not understand - especially since 'à' is
included in the extended ASCII table,
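
For what it is worth, the observation almost certainly comes from the
editor/terminal being UTF-8 rather than latin-1 ("extended ASCII"): in UTF-8
the single character 'à' is encoded as two bytes, and a Python 2 byte string
simply counts bytes. A quick interactive illustration:

>>> 'à'                     # typed at a UTF-8 terminal: two bytes arrive
'\xc3\xa0'
>>> len('\xc3\xa0'), len(u'\xe0')
(2, 1)
>>> '\xc3\xa0'.decode('utf-8') == u'\xe0'
True
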