On Tue, 17 Jul 2018 06:15:25 +1000, Chris Angelico wrote:

> On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
>> There is nothing special about diacritics such that we ought to treat
>> some combinations like "Ch" (two code points = one character) as "fixed
>> width" while others like "â" (two code points = one character) as
>> "variable width".
>
> When you reverse a word, do you treat "ch" and "sh" as one character or
> two?
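To make the code point versus character distinction concrete, here is a rough Python sketch. It treats a "character" as a base code point plus any combining marks that follow it, which is only an approximation of Unicode's extended grapheme clusters (UAX #29), not a full implementation. The helper names `clusters` and `reverse_units`, and the `digraphs` argument for letter pairs such as "sh" that a given language might count as one unit, are hypothetical and for illustration only; they are not part of any library.

```python
import unicodedata


def clusters(text, digraphs=()):
    """Split text into units (hypothetical helper, approximating UAX #29).

    A unit is one base code point, or one listed digraph such as "sh",
    followed by any combining marks (canonical combining class != 0).
    Digraphs are matched greedily, left to right, case-insensitively;
    `digraphs` should contain lowercase two-letter strings.
    """
    units = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair.lower() in digraphs:
            unit = pair
            i += 2
        else:
            unit = text[i]
            i += 1
        # Keep combining marks attached to the unit they modify.
        while i < len(text) and unicodedata.combining(text[i]):
            unit += text[i]
            i += 1
        units.append(unit)
    return units


def reverse_units(text, digraphs=()):
    """Reverse text unit by unit instead of code point by code point."""
    return "".join(reversed(clusters(text, digraphs)))


if __name__ == "__main__":
    s = "a\u0302b"  # "a" + COMBINING CIRCUMFLEX ACCENT + "b"
    print(ascii(s[::-1]))             # 'b\u0302a' (circumflex now on "b")
    print(ascii(reverse_units(s)))    # 'ba\u0302' (circumflex stays on "a")
    print(reverse_units("shokolad"))                   # dalokohs
    print(reverse_units("shokolad", digraphs=("sh",))) # dalokosh
```

Naive slicing moves the circumflex onto the wrong letter, while grouping first keeps it attached. Whether "sh" counts as one unit is a language-specific choice, which is why the sketch makes it an explicit parameter rather than building it in.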
In English, "ch" is always two letters of the alphabet. In Welsh and Czech, "ch" can be one letter or two. (I think it will be two letters only in loan words, but I'm not certain about that.) Whether that makes it one or two characters depends on how you define "character". Good luck finding a universal, objective, unambiguous definition.

> I'm of the opinion that they're single characters, and thus this
> should be "dalokosh":
>
> https://wiki.teamfortress.com/wiki/Dalokohs_Bar
>
> (It's the Russian for "chocolate" - "шоколад" - transliterated to
> English/Latin - "šokolad" or "shokolad" - and then reversed.)

In English, I think most people would prefer a term other than "character" for whatever "sh" and "ch" represent. But you make a good point: even in English, we sometimes want to treat two-letter combinations as a single unit.

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it
everywhere."
-- Jon Ronson

--
https://mail.python.org/mailman/listinfo/python-list