On Sun, 25 Sep 2016 09:08 am, Thomas 'PointedEars' Lahn wrote:

> Christian Gollwitzer wrote:
>
>> On 17.09.16 at 23:19, Thomas 'PointedEars' Lahn wrote:
>>> Peng Yu wrote:
>>>> Hi, I want to convert strings in which the characters with accents
>>>> should be converted to the ones without accents.
>>> […]
>>>> […]
>>>> ./main.py Förstemann
>>>
>>> AFAIK, “ä”, “ö”, and “ü” are not accented characters in any natural
>>> language, but characters of their own (umlauts).
>>>
>>> In particular, I know for certain that they are not accented in Germanic
>>> languages. Swedish has been mentioned; I can add my native language,
>>> German, to that list.
>>
>> In German, they are letters,
>
> If you read more carefully, my point was: In German, umlauts are not
> "accented characters".

The umlauts themselves are not. But the combination of vowel-plus-umlaut is
surely an "accented character", is it not? If not, what do you call it in
German?

My understanding is that both officially and popularly, native German
speakers consider that the alphabet has 26 letters (same as English), and
that "accented characters" including the vowels which take umlauts are not
distinct letters of the alphabet but mere variations of the standard vowels.

That's to be contrasted to (say) Swedish, where ä and ö are *not* "a and o
with an accent/diacritic/umlaut/diaeresis/trema" but distinct letters of the
alphabet in their own right. That's different from ü (the "German Y") in
Swedish, which is only used for loan words and names of German origin, and
*is* considered to be a variant of u.

I use the term "accented character" here in the ignorant, non-linguist,
English-speaker sense of any letter of the alphabet with "funny dots and
squiggles" on it. To people who know what they are talking about, there is a
difference between an accent, umlaut, trema, diaeresis and other diacritics,
but for the purposes of my question, I'm not too worried about the technical
difference between these modifiers, only whether or not they are considered
a modifier on a standard letter or not.

[...]

> And as you have mentioned phone books, in all German-speaking phone books
> I have come across so far, “ä” does sort like “ae”, “ö” like “oe”, and “ü”
> like “ue” (this is specified in DIN 5007 as “variant 1”).
>
> (That does not mean, however, that it is a good idea to *convert* those
> letters this way. And there is no good reason to; all modern operating
> systems, filesystems and name schemes support Unicode.)

Alas, if we only needed to deal with modern operating systems, file systems
and naming schemes, life would be much easier.
But sadly we also have to deal with *old* operating systems, file systems
and naming schemes; as well as ASCII-only or other non-Unicode applications,
plus keyboards that give the user no obvious or easy way to add "accents"
(diacritics etc.) to base letters.

See, for example:

http://code.activestate.com/recipes/251871-latin1-to-ascii-the-unicode-hammer/

As the author says:

"One of my clients gets address data from Europe, but most of their systems
cannot handle Latin-1 characters. With all due respect to the umlaut,
scharfes s, cedilla, and all the other fine accented characters of Europe,
all I needed to do was to prepare addresses for a shipping system."

Post offices and freight companies are used to dealing with misspelled
addresses. They can usually cope with a few missing accents.

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
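
P.S. For anyone who wants to experiment with the kind of lossy conversion
discussed above, here is a minimal sketch using only the standard library.
It is *not* the code from the ActiveState recipe. The names strip_marks,
to_ascii and GERMAN_EXPANSIONS are made up for illustration, and the "ß" to
"ss" entry is my own assumption rather than anything stated in this thread.
The optional expansion step follows the "ä sorts like ae" convention quoted
above, applied here as a conversion, which (as already noted) is not always
a good idea.

```python
import unicodedata

# Hypothetical table following the "ä like ae" convention quoted above.
# The "ß" entry is an added assumption, not something from the thread.
GERMAN_EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def strip_marks(text):
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def to_ascii(text, expand_german=False):
    """Return a lossy ASCII-only approximation of text."""
    if expand_german:
        # Compose first so that "o" + U+0308 is seen as the single char "ö".
        text = unicodedata.normalize("NFC", text)
        text = "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in text)
    stripped = strip_marks(text)
    # Anything still outside ASCII (e.g. "ø" or "ß") becomes "?".
    return stripped.encode("ascii", "replace").decode("ascii")

if __name__ == "__main__":
    print(to_ascii("Förstemann"))
    print(to_ascii("Förstemann", expand_german=True))
```

Characters that have no decomposition into a base letter plus combining
mark are not handled by strip_marks at all, which is why the last step falls
back to "?" instead of silently dropping them.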