Package: www.debian.org
X-Debbugs-CC: debian-l10n-de...@lists.debian.org,debian-i18n@lists.debian.org
Usertag: scripts
Severity: wishlist

Hello,

To get the list of languages and countries, many l10n pages (such as the po-debconf¹ templates and the PO files² pages) rely, among other tools, on the «dgettext» utility (see the related scripts here³⁴).

This is fine and a very convenient way to centralize resources, but the clause these commands currently use to display languages is a bit restrictive: it only queries the ISO 639-3 domain, and that standard does not include the so-called «collective languages» (e.g., «ber» for Berber languages).

The full explanation can be read here⁵. In brief, Wikipedia says that ISO 639-3 is not a superset of ISO 639-2, so by querying only one of them we discard whatever exists only in the other:

****
While ISO 639-2 includes three-letter identifiers for collective languages, these codes are excluded from ISO 639-3. Hence ISO 639-3 is not a superset of ISO 639-2.
****

So, since this commit⁶ made 11 years ago, which replaced ISO 639 with ISO 639-3, some languages no longer show their localized names in the PO stats pages, because the ISO 639-3 code table does not include them (this is by design). For instance, see the stats page for PO files²:

aym - Unknown language
bh - Unknown language
ber - Unknown language
bos_DE - Unknown language
bos_ES - Unknown language
bos_FI - Unknown language
bos_FR - Unknown language
bos_HU - Unknown language
bos_IT - Unknown language
bos_LT - Unknown language
bos_NL - Unknown language
bos_SV - Unknown language
bos_TR - Unknown language
(...)

There are 65 more languages displayed as «Unknown language» even though they are fully translated in their respective iso-codes packages. This happens because of this query clause inside «dtc.def»:

****
if ($lang_fullname ne '') {
    $lang_fullname = dgettext("iso_639_3", "$lang_fullname");
} else {
    return qq(<Unknown_Language>);
}
****

So I wonder whether this clause can be improved into a more inclusive query that matches more ISO code domains, rather than limiting the scope to a single standard that leaves many languages out :-)

Some questions/thoughts regarding this:

Can «dgettext» query more than one ISO standard at once? That would be great and would make things easier here.

As far as I can tell, «dgettext» just returns a single string, but maybe some magic can be done before falling back to «Unknown language» (i.e., performing an additional query against ISO 639-2 first).

I hope the problem is clear and that we can find a nice solution for it.

¹https://www.debian.org/international/l10n/po-debconf/index.en.html
²https://www.debian.org/international/l10n/po/index.en.html
³https://salsa.debian.org/webmaster-team/webwml/-/blob/master/english/international/l10n/dtc.def
⁴https://salsa.debian.org/webmaster-team/webwml/-/blob/master/english/international/l10n/scripts/fix-files.sh
⁵https://en.wikipedia.org/wiki/ISO_639-3#Collective_languages
⁶https://salsa.debian.org/webmaster-team/webwml/-/commit/2bb96f31eabe559e1b68313efcbaaa7af3be19d4

Kind regards,
--
Camaleón
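
P.S. To make the «additional query» idea above a bit more concrete, here is a rough shell sketch of a fallback chain. It is not taken from webwml and has not been tried against it. «translate_lang_name» and «ISO_DOMAINS» are made-up names, and it calls the «gettext» command-line program with «-d» rather than the Perl «dgettext» call used in «dtc.def». The second domain name («iso_639») is only an assumption about what the older catalog replaced by the commit is called, so it should be checked against the installed iso-codes package.

```shell
#!/bin/sh
# Hypothetical helper, not part of webwml: translate an English language
# name by trying several gettext domains in order.
translate_lang_name() {
    name=$1
    # "iso_639_3" is the domain used in dtc.def; "iso_639" is an ASSUMED
    # name for the older catalog. Override the list through ISO_DOMAINS.
    for domain in ${ISO_DOMAINS:-iso_639_3 iso_639}; do
        translated=$(gettext -d "$domain" "$name")
        # gettext prints the msgid unchanged when it has no translation,
        # so an identical string is treated as "not found in this domain".
        if [ -n "$translated" ] && [ "$translated" != "$name" ]; then
            printf '%s\n' "$translated"
            return 0
        fi
    done
    # No domain had a translation: keep the English name.
    printf '%s\n' "$name"
    return 1
}
```

It would be called as «translate_lang_name "$name"» under the desired locale. One known limitation: a translation that happens to be identical to the English name cannot be told apart from a missing one, so the function reports it as not found. Note also that this only covers the translation step; the «Unknown language» branch is taken when the name is already empty, so the lookup from code to name would need a similar fallback.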