On Sat, Apr 11, 2026 at 08:03:06PM +0100, Gavin Smith wrote:
> > > @documentlanguage sr
> > > @documentscript Latn
> >
> > That's so rare I doubt a general-purpose facility is justified.  There
> > are maybe half a dozen languages in the whole world that can be
> > written with more than one script.  So using modifiers here is much
> > more reasonable, IMO.

I still think that having a separate @documentscript would be more in
line with the @documentlanguage and @documentencoding separation, and it
appeals to me too, because it makes sense to separate the script and the
language.  Also, I think that it would be better to specify the language
variant and the script separately.

> I tend to agree.  (I know I already changed my mind on this.)  It seems
> simple enough to say that the argument to @documentlanguage follows the
> same format as the language specification for a po file: LL_CC@VARIANT.
> We already have code to strip off a _CC suffix so it would probably be
> easy enough to deal with an @VARIANT suffix similarly.

I prefer the BCP47 approach, as the language variant and the script are
clearly separated.  Also, the possible values for both are listed in
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
with the aim of being exhaustive.  I do not much like how the BCP47
string is formatted, with the length of each element used to determine
which is which; it could be problematic for TeX, I don't know.  But the
way the different aspects of a language are conceptualized seems better
to me.

So, if scripts are in the @documentlanguage, I would prefer

@documentlanguage sr_Latn

and to specify both a script and a variant, which cannot be done, as far
as I can tell, with @VARIANT:

@documentlanguage sr_Latn_ekavsk

> As discussed, texi2any could transform the @VARIANT to an appropriate
> form for HTML, LaTeX, DocBook or other, if we know how to do this.

It is possible, but, as far as I can tell, it is not possible to specify
both a variant and a script.  For example, I do not know how to specify
sr_Latn_ekavsk with the @VARIANT.  It can be specified if there is a
separate @documentscript, though.

> One question remains, and that is what to do with the contradictory
> alphabets used for the current Serbian translations.
> As po/sr.po and
> po_document/sr.po are currently Cyrillic, and those files are more updated
> (due to being translated through the Translation Project), I would guess
> that if we had to choose, that "@documentlanguage sr" should use the
> Cyrillic alphabet, and "@documentlanguage sr@latin" should use the Latin
> alphabet.  So perhaps txi-sr.tex should be renamed txi-sr@latin.tex.

Or txi-sr-Latn.tex if BCP47/ISO 15924 is used, which would have my
preference.

> This directory in the glibc sources appears to show the names of many
> of the languages used in glibc locales:
>
> https://sourceware.org/cgit/glibc/tree/localedata/locales
>
> Ignoring the @euro variants, it gives an idea how many languages are likely
> to get translations in multiple alphabets.  The number of languages with
> user communities who are likely to produce translations is likely smaller
> still.

The aforementioned
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
is a better source for that.  For instance, it lists the Occitan
variants, which are not of much importance for manuals, but make more
sense as language definitions than oc_FR (which is probably ok for most
aspects of locales).  They can be mapped to variants, though, like
oc@lengadoc (the Occitan variant where I live).

I think that we should design the @documentlanguage in a way that makes
it easy to specify any language, even languages that are not important
by the number of people speaking them or that have no user communities
likely to produce translations.

-- 
Pat
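
P.S. To make the comparison of the two notations concrete, here is a
rough Python sketch of how an argument such as sr_Latn_ekavsk or
oc@lengadoc could be split into language, script, region and variant.
This is not texi2any code: the names LangSpec, parse_document_language
and to_bcp47 are made up for illustration, and the subtag lengths
(4 letters for a script, 2 letters or 3 digits for a region, 5 to 8
characters or a digit plus 3 characters for a variant) are only my
reading of the BCP47 syntax.

```python
import re
from typing import NamedTuple, Optional


class LangSpec(NamedTuple):
    """Hypothetical container for the parts of a language argument."""
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variant: Optional[str] = None


def parse_document_language(arg: str) -> LangSpec:
    """Split an argument like 'sr_Latn_ekavsk' or 'sr_RS@latin'.

    Subtags are told apart by their length, which is the BCP47
    convention discussed above; this is an assumption, not what
    texi2any currently does.
    """
    # A po-style @MODIFIER suffix is kept as the variant, unchanged.
    base, _, modifier = arg.partition("@")
    language, *subtags = re.split(r"[_-]", base)
    script = region = None
    variant = modifier or None
    for tag in subtags:
        if re.fullmatch(r"[A-Za-z]{4}", tag):
            script = tag.title()           # e.g. Latn, Cyrl
        elif re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", tag):
            region = tag.upper()           # e.g. RS, FR
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", tag):
            variant = tag.lower()          # e.g. ekavsk, lengadoc
        else:
            raise ValueError(f"unrecognized subtag: {tag!r}")
    return LangSpec(language.lower(), script, region, variant)


def to_bcp47(spec: LangSpec) -> str:
    """Join the parts that are set with hyphens, in BCP47 order."""
    return "-".join(part for part in spec if part)


if __name__ == "__main__":
    for arg in ("sr", "sr_Latn", "sr_Latn_ekavsk", "sr_RS@latin", "oc@lengadoc"):
        print(arg, "->", parse_document_language(arg))
```

With this sketch, sr_Latn_ekavsk gives a script and a variant at the
same time, while sr_RS@latin can only carry "latin" in the single
modifier slot.  Note that to_bcp47 just joins the parts; turning a
modifier such as @latin into the script subtag Latn would need an
explicit mapping table, which is not shown here.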
