[gentoo-dev] UTF-8 in ebuild descriptions

nlhowell Sat, 04 Jul 2020 07:48:47 -0700

Hello all,

I'm packaging a handful of dictionaries for various languages; some
of them have non-ASCII names in English. Example:
```
DESCRIPTION="Giellatekno morphological dictionaries for Apurinã."
```


Repoman complains:
```
  variable.invalidchar [fatal]  5
   app-dicts/giella-apu/giella-apu-9999.ebuild: DESCRIPTION variable
   contains non-ASCII character at position 50
```
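
For reference, the reported position looks like a 1-based character offset into the variable's value. Here is a minimal Python sketch of that kind of check. The function name `first_non_ascii` is hypothetical and this is not repoman's actual code, but it gives the same number for the DESCRIPTION above:

```python
import re

# Matches any character outside the 7-bit ASCII range 0..127.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def first_non_ascii(value):
    """Return the 1-based position of the first non-ASCII character, or None.

    Hypothetical helper for illustration; not taken from repoman.
    """
    match = NON_ASCII.search(value)
    if match is None:
        return None
    return match.start() + 1


# \u00e3 is the precomposed "a with tilde" from the example above.
description = "Giellatekno morphological dictionaries for Apurin\u00e3."
print(first_non_ascii(description))
```

If the name were written with a combining tilde (U+0061 followed by U+0303) instead, the offset would move by one, since the base letter is plain ASCII.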

A brief discussion in #gentoo-dev-help found GLEP 31:

> <tastytea> Only ASCII is permitted in code which is parsed by bash
> and output:
> <https://www.gentoo.org/glep/glep-0031.html#ebuild-and-eclass-character-sets>.

From GLEP 31:
> Ebuild and Eclass Character Sets
> 
> For the same reasons as previously, it is proposed that UTF-8 is
> used as the official encoding for ebuild and eclass files.
> 
> However, developers should be warned that any code which is parsed
> by bash (in other words, non-comments), and any output which is
> echoed to the screen (for example, einfo messages) or given to
> portage (for example any of the standard global variables) must not
> use anything outside the regular ASCII 0..127 range for
> compatibility purposes.

What does "compatibility purposes" mean here? Non-Unicode locales and
terminal state corruption? Other tools?
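
To make the non-Unicode locale case concrete, here is a small Python sketch of what happens to the same name under different encodings. This is only my guess at the kind of problem the GLEP has in mind, not something it states:

```python
# The language name from the DESCRIPTION, with a precomposed a-with-tilde.
name = "Apurin\u00e3"

utf8_bytes = name.encode("utf-8")      # the tilde letter takes two bytes
latin1_bytes = name.encode("latin-1")  # the tilde letter takes one byte

# A strict ASCII consumer cannot represent the character at all.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 bytes shown by a tool that assumes Latin-1 come out as mojibake.
mojibake = utf8_bytes.decode("latin-1")

print(len(utf8_bytes), len(latin1_bytes), ascii_ok)
print(mojibake)
```

The last line shows the name with two unrelated Latin-1 characters in place of the final letter, which is what a user in such a locale would see in search output.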

I would like the ebuild short description to refer to these languages
by their names, instead of by e.g. ISO-639-3 codes.

Thoughts?

Cheers,
Nick
