On Thu, Feb 06, 2025 at 05:02:30PM +0000, Werner LEMBERG wrote: > > I don't understand. How gets 'ö' mapped to 'o'? It is not NFC, but > it can't be transliteration either, can it? I thought that the idea > of transliteration is to map non-ASCII characters to ASCII strings > unambiguously,
No, the idea is to map to ASCII strings that are not too long and such that looking at the file name, it is possible to imagine what node/anchor it corresponds to at least to some extent. > so I could imagine that 'Bögen' gets mapped to > 'B_oe_gen' or something similar. Stripping off the umlaut dots from > the 'ö' character to convert 'Bögen' to 'Bogen' can never be the right > solution. That is what we do. We do the removal of diacritics ourselves in some cases, but we mainly use Text::Unidecode, or, in C iconv //TRANSLIT https://metacpan.org/pod/Text::Unidecode The objective is not to have a unique filename nor to be perfect, but to have ASCII that is recognizable. > I have no opinion here. IMHO, it should work with or without > `--no-transliterate0file-names`. What I currently see is a strange > Texinfo warning message that implies that I can't use `@anchor{Bogen}` > and `@anchor{Bögen}` at the same time. There are many languages where > you have different words – probably also in French – that become > identical if you strip off the diacritics. We didn't try to have a non ambiguous transliteration, nor a reversible transliteration. My guess is that it is impossible when all the Unicode characters may appear in the same document. And, indeed, there are many such words in French. I used Prés et Près in the test of clashes (and an invented Prês). -- Pat