Re: normalization problem with `@anchor` targets

pertusus Mon, 10 Feb 2025 06:11:09 -0800

On Fri, Feb 07, 2025 at 02:10:57PM +0000, Werner LEMBERG wrote:
> > Likewise, Finnish speakers strongly protest if "ä" is rendered as
> > "ae".
> 
> And Japanese people are probably even offended because, as it
> currently happens, almost all Japanese characters are represented with
> Chinese syllables...


In addition to --no-transliterate-file-names, it is also possible to set
the customization variable USE_UNIDECODE to 0, which prevents
Text::Unidecode from being used.  I think that for ä we do not use
Text::Unidecode to replace the character (I didn't check precisely), so
this may not change the output there, but for Japanese, with
USE_UNIDECODE = 0, the Text::Unidecode transliterations would not be
used.

If someone wants to propose other ways to transliterate, this could be
possible.  Instead of USE_UNIDECODE, there could be a more general
variable, like TRANSLITERATION_METHOD.  We could even add the
possibility to read user-defined transliteration mappings, though I
doubt that it would be used much.

However, any customization of transliteration carries the risk of
breaking cross-references across manuals.  That's why it is necessary
to settle on a choice that is good enough, even though it cannot be
perfect for all users.  In particular, we cannot really have different
possibilities based on the user locale or @documentlanguage, because
there is no reason for those to be the same in the manual where a
cross-reference is generated as in the manual it points to.  It could be
possible to add the information to htmlxref.cnf.  I doubt that this
would be used much, but we could accept patches for that kind of
functionality.
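
As a rough illustration of that risk, here is a small Python sketch.  It
is not the texi2any algorithm: the two transliteration functions, the
mapping table and the file name rule are all made up for the example.
It only shows that if the manual defining a node and the manual
referring to it transliterate differently, the two computed file names
no longer match.

```python
import unicodedata

# Hypothetical table standing in for one transliteration scheme.
AE_STYLE = {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue"}


def strip_accents(text):
    """Scheme A: decompose characters and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def table_based(text):
    """Scheme B: replace characters through a fixed table."""
    return "".join(AE_STYLE.get(c, c) for c in text)


def target_file(node_name, transliterate):
    """Made-up rule turning a node name into an HTML file name."""
    return transliterate(node_name).replace(" ", "-") + ".html"


node = "Lis\u00e4ys"
# File name written by the manual that defines the node.
written = target_file(node, strip_accents)
# File name computed by another manual that refers to the node.
linked = target_file(node, table_based)

print(written, linked, written == linked)
```

With the same scheme on both sides the names agree; with different
schemes the link points to a file that does not exist.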

Note that the transliteration may also differ between tests and regular
output, in order to get reproducible test results.  If C is used, for
instance, iconv //TRANSLIT is used in output (which is actually a risk
for reproducible cross-manual references), while Text::Unidecode or
Text::Unidecode-compatible transliterations are used in tests.

-- 
Pat
