On Sat, Apr 11, 2026 at 08:03:06PM +0100, Gavin Smith wrote:
> > > @documentlanguage sr
> > > @documentscript Latn
> > 
> > That's so rare I doubt a general-purpose facility is justified.  There
> > are maybe half a dozen languages in the whole world that can be
> > written with more than one script.  So using modifiers here is much
> > more reasonable, IMO.

I still think that having a separate @documentscript would be more
in line with the @documentlanguage and @documentencoding separation, and
it appeals to me too, because it makes sense to separate the script and
the language.

Also, I think that it would be better to specify the language variant
and the script separately.

> I tend to agree.  (I know I already changed my mind on this.)  It seems
> simple enough to say that the argument to @documentlanguage follows the
> same format as the language specification for a po file: LL_CC@VARIANT.
> We already have code to strip off a _CC suffix so it would probably be
> easy enough to deal with an @VARIANT suffix similarly.

I prefer the BCP47 approach, as the language variant and the script
are clearly separated.  Also, the possible values for both are listed in

https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
which aims to be exhaustive.

I do not much like how the BCP47 string is formatted: the length of each
element is used to determine which is which.  That could be problematic
for TeX, I don't know.  But the way the different aspects of a language
are conceptualized seems better to me.
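
To make the length-based rule concrete, here is a minimal sketch in
Python of how the subtags of a simplified BCP47 tag can be told apart.
It is only an illustration, not how texi2any or TeX would do it: the
function name classify_subtags is made up, "_" is accepted as a
separator in addition to "-" as an assumption, and extended language
subtags, extensions, private use and grandfathered tags are ignored.

```python
import re

def classify_subtags(tag):
    """Split a simplified BCP47-style tag into its parts (illustrative only)."""
    # Assumption: accept both "-" (BCP47) and "_" as separators.
    parts = re.split(r"[-_]", tag)
    if not re.fullmatch(r"[A-Za-z]{2,3}", parts[0]):
        raise ValueError("unsupported language subtag: %r" % parts[0])
    result = {"language": parts[0].lower(), "script": None,
              "region": None, "variants": []}
    rest = parts[1:]
    # A script is exactly four letters (ISO 15924), written in title case.
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        result["script"] = rest.pop(0).title()
    # A region is two letters or three digits.
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        result["region"] = rest.pop(0).upper()
    # A variant is 5 to 8 alphanumerics, or 4 characters starting with a digit.
    for sub in rest:
        if re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", sub):
            result["variants"].append(sub.lower())
        else:
            raise ValueError("unexpected subtag: %r" % sub)
    return result

print(classify_subtags("sr-Latn-ekavsk"))
```

Only the length and character class of each element say what it is,
which is the part that might be awkward to reproduce in TeX macros.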

So, if scripts are in the @documentlanguage, I would prefer

@documentlanguage sr_Latn

and to specify both a script and a variant, which cannot be done, as far
as I can tell, with @VARIANT:

@documentlanguage sr_Latn_ekavsk

> As discussed, texi2any could transform the @VARIANT to an appropriate
> form for HTML, LaTeX, DocBook or other, if we know how to do this.

It is possible, but, as far as I can tell, it is not possible to specify
both a variant and a script.  For example, I do not know how to specify

sr_Latn_ekavsk

with @VARIANT.  It can be specified if there is a separate
@documentscript, though.
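
As a rough sketch of the transformation discussed above, here is how an
LL_CC@VARIANT specification could be turned into a BCP47-style tag in
Python.  The function name and the MODIFIER_MAP table are hypothetical,
and the table only holds a few entries chosen for illustration; it is
not a proposal for what texi2any should actually contain.

```python
import re

# Hypothetical table: each @modifier is either a script or a variant.
MODIFIER_MAP = {
    "latin": ("script", "Latn"),
    "cyrillic": ("script", "Cyrl"),
    "valencia": ("variant", "valencia"),
}

def po_locale_to_bcp47(spec):
    """Convert LL, LL_CC, LL@MOD or LL_CC@MOD to a BCP47-style tag."""
    m = re.fullmatch(r"([a-z]{2,3})(?:_([A-Z]{2}))?(?:@([A-Za-z0-9]+))?", spec)
    if m is None:
        raise ValueError("not an LL_CC@VARIANT specification: %r" % spec)
    lang, region, modifier = m.groups()
    script = variant = None
    if modifier is not None:
        # Unknown modifiers raise KeyError in this sketch.
        kind, value = MODIFIER_MAP[modifier.lower()]
        if kind == "script":
            script = value
        else:
            variant = value
    # BCP47 order is language, script, region, variant.
    return "-".join(p for p in (lang, script, region, variant) if p)

print(po_locale_to_bcp47("sr@latin"))
```

Since the format has a single @ slot, the modifier ends up as either the
script or the variant, so a combination of both has no obvious spelling
in this scheme.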

> One question remains, and that is what to do with the contradictory
> alphabets used for the current Serbian translations.  As po/sr.po and
> po_document/sr.po are currently Cyrillic, and those files are more updated
> (due to being translated through the Translation Project), I would guess
> that if we had to choose, that "@documentlanguage sr" should use the
> Cyrillic alphabet, and "@documentlanguage sr@latin" should use the Latin
> alphabet.  So perhaps txi-sr.tex should be renamed txi-sr@latin.tex.

Or txi-sr-Latn.tex if BCP47/ISO 15924 is used, which I would prefer.

> This directory in the glibc sources appears to show the names of many
> of the languages used in glibc locales:
> 
> https://sourceware.org/cgit/glibc/tree/localedata/locales
> 
> Ignoring the @euro variants, it gives an idea how many languages are likely
> to get translations in multiple alphabets.  The number of languages with
> user communities who are likely to produce translations is likely smaller
> still.

The aforementioned

https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
is a better source for that.  For instance, it lists the Occitan
variants, which are not that important for manuals, but make more sense
as language definitions than oc_FR (which is probably ok for most
aspects of locales).  They can be mapped to variants, though, like
oc@lengadoc (the Occitan variant where I live).

I think that we should design @documentlanguage in a way that makes it
easy to specify any language, even languages that have few speakers or
that have no user community likely to produce translations.

-- 
Pat
