On Fri, Feb 07, 2025 at 02:10:57PM +0000, Werner LEMBERG wrote:
>
> >> My initial recalling was that the transliteration in file name was
> >> Karl demand. [...]
> >
> > It makes sense for Ukrainian or other non-Latin alphabet languages
> > but not so much for French or German, in my opinion.
> >
> > German speakers tend to protest, in my experience, if the umlauts
> > are dropped from ü, ä and ö, preferring "ue", "ae" and "oe" to "u",
> > "a" and "o".
> >
> > Likewise, Finnish speakers strongly protest if "ä" is rendered as
> > "ae".
>
> And Japanese people are probably even offended because, as it
> currently happens, almost all Japanese characters are represented with
> Chinese syllables...

I didn't appreciate the point you were making initially and thought you
were making some joke about the origin of the Japanese writing system.
However, I now realise that was not the case.  It's the issue discussed
in the Text::Unidecode documentation:

    Another example: for hanzi/kanji/hanja, I have designed Unidecode to
    transliterate according to the value that that character has in
    Mandarin (otherwise Cantonese,...).  Some users have complained that
    applying Unidecode to Japanese produces gibberish.

https://metacpan.org/pod/Text::Unidecode

So it seems to me that this transliteration produces bad results in many
cases and should be turned off by default.  Users could then at least
understand why sequences like __00f6 were occurring in output file
names, and should have the option to use either transliteration or UTF-8
file names, depending on their preferences.