On Fri, Feb 07, 2025 at 02:10:57PM +0000, Werner LEMBERG wrote:
> > Likewise, Finnish speakers strongly protest if "ä" is rendered as
> > "ae".
>
> And Japanese people are probably even offended because, as it
> currently happens, almost all Japanese characters are represented with
> Chinese syllables...

In addition to --no-transliterate-file-names, it is also possible to
set the customization variable USE_UNIDECODE to 0, which prevents
Text::Unidecode from being used.  I think that for "ä" we do not use
Text::Unidecode to replace the character (I didn't check precisely), so
the output may not change.  For Japanese, however, with
USE_UNIDECODE = 0, the Text::Unidecode transliterations would not be
used.

If someone wants to propose other ways to transliterate, that could be
possible.  Instead of USE_UNIDECODE, there could be a more general
variable, like TRANSLITERATION_METHOD.  We could even add the
possibility to read user-defined transliteration mappings, though I
doubt that it would be used much.

However, any customization of transliteration carries the risk of
cross-references across manuals being non-functional.  That's why it is
necessary to settle on a choice that is good enough, even though it
cannot be perfect for all users.  In particular, we cannot really have
different possibilities based on the user locale or @documentlanguage,
because there is no reason for those to be the same where a
cross-reference is generated as where the target manual was built.  It
could be possible to add the information to htmlxref.cnf.  I doubt that
this would be used much, but we could accept patches for that kind of
functionality.

Note that the transliteration may also differ between tests and regular
output, so that test output is reproducible.  If the C implementation
is used, for instance, iconv //TRANSLIT is used in output (which is
actually a risk for reproducible cross-manual references), while
Text::Unidecode or Text::Unidecode-compatible transliterations are used
in tests.

-- 
Pat