Re: normalization problem with `@anchor` targets

pertusus Sun, 02 Mar 2025 03:59:52 -0800

On Sun, Mar 02, 2025 at 11:43:37AM +0000, Gavin Smith wrote:
> On Sun, Mar 02, 2025 at 12:27:49PM +0100, pertu...@free.fr wrote:
> Could we look at extending the htmlxref.cnf format?
> 
> As well as mono/chapter/section/node, like:
> 
>      GS = ${G}/software
>      hello mono    ${GS}/hello/manual/hello.html
>      hello chapter ${GS}/hello/manual/html_chapter/
>      hello section ${GS}/hello/manual/html_section/
>      hello node    ${GS}/hello/manual/html_node/
> 
> - there could be suffixed versions giving the transliteration status.
> 
> It could be something like "node.translit" to give the location of
> an online manual split by node, which nodes are named using transliteration:
> 
>      hello node.translit    ${GS}/hello/manual/html_node/


Another option would be to assume that all the split variants of a
manual share the same transliteration/link type, and to give that type
on a separate line, like

     hello type translit
     ...

     emacs type utf8


It would also be possible to specify plain/default/expand, to override
a previous entry and reset a manual to the default:

     mymanual type default
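
To make the idea more concrete, here is a rough sketch of how such lines
could be read.  It is in Python for brevity, not the Perl used in
texi2any, and nothing here is implemented: the "type" keyword and its
values (translit, utf8, default) are only the proposal above, and
parse_htmlxref and expand are made-up names.  Keeping the first URL seen
for a manual is also just an assumption of the sketch.

```python
import re

SPLIT_KEYWORDS = {"mono", "chapter", "section", "node"}
# Proposed values only; the final names are not settled.
LINK_TYPES = {"translit", "utf8", "default"}


def expand(value, variables):
    """Replace ${name} references with previously defined variables."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: variables.get(m.group(1), ""), value)


def parse_htmlxref(text):
    """Return (urls, link_types) from htmlxref.cnf-like text.

    urls maps manual -> {split keyword: URL}.
    link_types maps manual -> link type, for manuals not using the default.
    """
    variables = {}
    urls = {}
    link_types = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Variable definition: NAME = VALUE
        if len(fields) == 3 and fields[1] == "=":
            variables[fields[0]] = expand(fields[2], variables)
            continue
        if len(fields) < 3:
            continue
        manual, keyword, value = fields[:3]
        if keyword == "type":
            if value not in LINK_TYPES:
                continue
            if value == "default":
                # A later "default" resets an earlier "type" line.
                link_types.pop(manual, None)
            else:
                link_types[manual] = value
        elif keyword in SPLIT_KEYWORDS:
            # Assumption of this sketch: keep the first URL seen.
            urls.setdefault(manual, {}).setdefault(
                keyword, expand(value, variables))
    return urls, link_types
```

With this, a "hello type translit" line would apply to every split
variant of hello, and a later "mymanual type default" would cancel an
earlier "mymanual type translit".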

> If this is the line that is used for links to "hello", then any links
> to that manual would have transliteration applied.
> 
> This would allow only using transliteration for links to external
> manuals that need it.

This would remove the need for something like
TRANSLITERATE_EXTERNAL_FILE_NAMES while still catering for the main
types of use.  TRANSLITERATE_EXTERNAL_FILE_NAMES could still be relevant
if a user wants to override the default for manuals that have no
htmlxref entry.  We could wait for users to ask for it, though.
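
The precedence I have in mind could be sketched as follows (Python
again, purely illustrative).  The function link_type_for and its
arguments are hypothetical, and TRANSLITERATE_EXTERNAL_FILE_NAMES is
only a possible customization variable, represented here by an optional
boolean:

```python
def link_type_for(manual, htmlxref_types, transliterate_external=None):
    """Pick the link type for a cross-reference to an external manual.

    htmlxref_types: manual -> type taken from htmlxref information
        (hypothetical "type" lines).
    transliterate_external: stands in for a possible
        TRANSLITERATE_EXTERNAL_FILE_NAMES setting; None means unset.
    """
    # 1. Information from htmlxref wins when the manual is listed.
    if manual in htmlxref_types:
        return htmlxref_types[manual]
    # 2. Otherwise fall back to the user-level setting, if any.
    if transliterate_external is not None:
        return "translit" if transliterate_external else "default"
    # 3. Otherwise use the current default.
    return "default"
```

So the variable would only matter for manuals without any htmlxref
entry.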

> As below, we should always use Text::Unidecode for transliteration
> if possible.
> 
> > Date: Mon, 10 Feb 2025 15:11:03 +0100
> > From: pertu...@free.fr
> > To: Werner LEMBERG <w...@gnu.org>
> > Cc: gavinsmith0...@gmail.com, bug-texinfo@gnu.org
> > Subject: Re: normalization problem with `@anchor` targets
> > 
> > Note that the transliteration may also be different in tests and in
> > regular output, to get reproducible output.  If C is used, for instance,
> > iconv //TRANSLIT is used in output (which is actually a risk for
> > reproducible cross manuals references), while Text::Unidecode or
> > Text::Unidecode compatible transliterations are used in tests.
> 
> If in future we allow non-ASCII characters in output HTML file names, we
> could also have "node.utf8".
>
> For completeness, there should also be a name for the current default
> - maybe something like "node.plain" or "node.expand" (referencing
> the "HTML Xref Node Name Expansion" spec).
