On Tue, Jul 17, 2018 at 7:02 AM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 07/16/2018 01:15 PM, Chris Angelico wrote:
>>
>> On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano wrote:
>
>>> There is nothing special about diacritics such that we ought to treat
>>> some combinations like "Ch" (two code points = one character) as "fixed
>>> width" while others like "â" (two code points = one character) as
>>> "variable width".
>>
>> When you reverse a word, do you treat "ch" and "sh" as one character
>> or two? I'm of the opinion that they're single characters, and thus
>> this should be "dalokosh":
>
> Depends on the language: in Spanish, "ch" is its own letter (at least it
> was when I grew up), so any word containing it should still contain it when
> reversed: "chica" would be "acich".
>

Yeah. In Russian, "sh" is the single character "ш". I'm of the opinion
that, even after being transliterated into English phonetics, that
should be treated as a unit. ISO 9 uses "š" rather than "sh", which is
an improvement in character correspondence, but your average English
speaker is more likely to be able to pronounce "dalokosh" correctly
than to figure out "dalokoš".

In the same way, I created a magic item in a D&D campaign called
"Yasham Burda", even though the more correct spelling would be "Yaşam
Burda" or even "Yasam Burda", for the benefit of my monolingual
players. But I'd still treat the "sh" as one character.

Ain't transliteration fun?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
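
For anyone who wants to play with the idea, here is a minimal sketch of reversing a word while keeping digraphs such as "ch" and "sh" together as single units. The function name `reverse_units`, the default digraph set, and the input "shokolad" (a guess at the word behind "dalokosh") are assumptions made for illustration; a real solution would need a per-language list of units, which is exactly the point being argued above.

```python
def reverse_units(word, digraphs=("ch", "sh")):
    """Reverse word, treating each listed digraph as one unit.

    The digraph list is a hypothetical default; which pairs count as a
    single "character" depends on the language being transliterated.
    """
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        # Take two letters at once when they form a known digraph.
        if pair.lower() in digraphs:
            units.append(pair)
            i += 2
        else:
            units.append(word[i])
            i += 1
    return "".join(reversed(units))


print(reverse_units("chica"))     # acich
print(reverse_units("shokolad"))  # dalokosh
```

Passing `digraphs=()` falls back to a plain code-point reversal, which makes it easy to compare the two behaviours on the same input.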