Hi!

On Thu, 2024-06-06 at 15:31:55 +0100, Simon McVittie wrote:
> On Thu, 06 Jun 2024 at 13:32:27 +0300, Hakan Bayındır wrote:
> > C, or C.UTF-8 is not a universal locale which works
> > for all.
> 
> Sure, and I don't think anyone is arguing that you or anyone else should
> set the locale for your interactive terminal session, your GUI desktop
> environment, or even your servers to C.UTF-8.
> 
> But, this thread is about build environments for our packages, not about
> runtime environments. We have two-and-a-half possible policies:

> 1. Status quo, in theory:
> 
>    Packages cannot make any assumptions about build-time locales.
> 
>    The benefits are:
> 
>    - Diagnostic messages are in the maintainer's local language, and
>      potentially easier to understand.

I think this is way more important than the relatively little space
given to it suggests, though. :) I'm a non-native speaker who has been
involved in l10n for a long time, while at the same time I've pretty
much always run my systems with either LANG=C.UTF-8 or, before that,
LANG=C, LC_CTYPE=ca_ES.UTF-8 and LC_COLLATE=ca_ES.UTF-8.

And I think forcing a locale on buildds makes perfect sense, because
we want easy access to build logs. But forcing LC_ALL from the build
tools means that no invoked tool will emit translated messages at
all. Users (not just maintainers) might then have a harder time
understanding what's going on, a lot of l10n work becomes rather
pointless, and if no one runs with different locales, l10n bugs can
easily creep in.
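For reference, the rule behind that: per POSIX, a non-empty LC_ALL
overrides every LC_* category and LANG, so LC_MESSAGES loses too. A
minimal Python model of that lookup, as a sketch only (it ignores GNU
gettext's LANGUAGE extension and locale aliasing, and the environments
are illustrative):

```python
# Sketch of POSIX locale-variable precedence for a single category.
# Assumption: GNU gettext's LANGUAGE variable and locale aliases are ignored.


def effective_locale(env: dict, category: str) -> str:
    """Return the locale a POSIX program would use for `category`."""
    # LC_ALL, when set and non-empty, overrides every other variable.
    if env.get("LC_ALL"):
        return env["LC_ALL"]
    # Otherwise the category-specific variable (e.g. LC_MESSAGES) wins.
    if env.get(category):
        return env[category]
    # LANG is the fallback for any category not set explicitly.
    if env.get("LANG"):
        return env["LANG"]
    # Nothing set: implementation-defined default, "C" on glibc.
    return "C"


# Illustrative environment: LANG=C plus Catalan ctype and collation.
workstation = {"LANG": "C", "LC_CTYPE": "ca_ES.UTF-8",
               "LC_COLLATE": "ca_ES.UTF-8"}
for cat in ("LC_CTYPE", "LC_COLLATE", "LC_MESSAGES"):
    print(cat, effective_locale(workstation, cat))

# A build tool exporting LC_ALL overrides even an explicit LC_MESSAGES.
forced = dict(workstation, LC_MESSAGES="ca_ES.UTF-8", LC_ALL="C.UTF-8")
print("LC_MESSAGES", effective_locale(forced, "LC_MESSAGES"))
```

With LC_ALL exported, the explicitly requested Catalan LC_MESSAGES is
simply never consulted.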

>    - If a mass-QA effort wants to assess whether the program is broken by
>      a particular locale, they can easily try running its build-time tests
>      in that locale, **if** the tests do not already force a different
>      locale. (But this comes with some serious limitations: it's likely
>      to have a significant number of false-positive situations where the
>      program is actually working perfectly but the **tests** make assumptions
>      that are not true in all locales, and as a result many upstream
>      projects set their build-time tests to force specific locales
>      anyway - often C, en_US.UTF-8 or C.UTF-8 - regardless of what we
>      might prefer in Debian.)

I consider locale-sensitive misbehavior a category of "upstream"
bugs (be that in the package upstream or in the native Debian tools)
that deserve to be spotted and fixed. I can understand the sentiment
of wanting to shrug this category of problems off and sweep it under
the carpet instead, but doing so has accessibility consequences.

>   The costs are:

>   - […] but if I'm expected to diagnose the
>     problem by reading Chinese error messages, as a non-Chinese-speaker I
>     am not going to get far.)

Just as an aside: while non-English messages do make bugs harder to
diagnose, I've never found those kinds of bug reports a big deal to
handle, as you can grep for (parts of) the translated message and then
get the original English string from the .po file, for example, or
translate the text back to know what it is talking about, or ask the
reporter to translate it for you.
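For illustration, a rough Python sketch of that grep-the-.po approach:
it collects (msgid, msgstr) pairs and returns the English originals
whose translation contains a given fragment. The parser is deliberately
simplified (no plural forms, msgctxt or escape decoding), the helper
names are made up, and so is the sample catalog; for a quick look,
plain grep on the .po is usually enough.

```python
# Minimal sketch: map a translated message back to its English msgid.
# Assumption: only simple msgid/msgstr entries (with continuation lines)
# are handled; plurals, msgctxt and escape decoding are ignored.
import re

_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_po(text: str) -> list:
    """Return (msgid, msgstr) pairs for simple, translated entries."""
    entries = []
    cur, field = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            # A new entry starts: keep the previous one if it was complete.
            if cur.get("msgid") and cur.get("msgstr"):
                entries.append((cur["msgid"], cur["msgstr"]))
            cur, field = {}, "msgid"
        elif line.startswith("msgstr "):
            field = "msgstr"
        elif not line.startswith('"'):
            # Comments, blank lines and unsupported keywords end the field.
            field = None
            continue
        match = _QUOTED.search(line)
        if field is not None and match:
            # Adjacent quoted strings are concatenated, as in gettext.
            cur[field] = cur.get(field, "") + match.group(1)
    if cur.get("msgid") and cur.get("msgstr"):
        entries.append((cur["msgid"], cur["msgstr"]))
    return entries


def find_original(po_text: str, fragment: str) -> list:
    """Return the English msgids whose translation contains fragment."""
    return [msgid for msgid, msgstr in parse_po(po_text) if fragment in msgstr]


# Made-up Catalan catalog excerpt, for illustration only.
SAMPLE_CA_PO = '''
#: src/main.c:42
msgid "cannot open file"
msgstr "no es pot obrir el fitxer"

#: src/main.c:57
msgid ""
"unknown option"
msgstr ""
"opció desconeguda"
'''

print(find_original(SAMPLE_CA_PO, "obrir el fitxer"))
```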

> 2½. Unwelcome compromise (increasingly the status quo):
> 
>    Whenever a package is non-reproducible, fails to build or fails tests
>    in certain locales (for example legacy non-UTF-8 locales like C or
>    en_GB.ISO-8859-15), we add `export LC_ALL=C.UTF-8` to debian/rules and
>    move on.
> 
>    This is just (2.) with extra steps, and has the same benefit and cost
>    for the affected packages as (2.) plus an additional cost (someone must
>    identify that the package is in this category and copy/paste the extra
>    line), and the same benefit and costs for unmodified packages as (1.).

I do agree, though, that if we end up with every debian/rules
unconditionally exporting LC_ALL, then there's not much point in not
making the build driver do it instead.


Related to this, dpkg-buildpackage 1.20.0 gained a --sanitize-env
option, which for now, on Debian and derivatives, sets
LC_COLLATE=C.UTF-8 and umask=0022.
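To make that effect concrete, a hypothetical Python sketch that applies
the same two settings to a child process. It is not dpkg's
implementation; run_sanitized is a made-up helper, and it assumes a
POSIX system with Python 3.9+ (for subprocess's umask argument).

```python
# Hypothetical sketch: run a command with the two settings described for
# dpkg-buildpackage --sanitize-env on Debian (LC_COLLATE and umask).
# This is NOT dpkg's implementation; run_sanitized is a made-up helper.
import os
import subprocess
import sys


def run_sanitized(cmd: list) -> subprocess.CompletedProcess:
    """Run cmd with LC_COLLATE=C.UTF-8 and umask 0022 (POSIX, Python 3.9+)."""
    # Copy the caller's environment and pin only the collation category.
    # An inherited LC_ALL is left alone and would still override it.
    env = dict(os.environ, LC_COLLATE="C.UTF-8")
    # subprocess sets the umask in the child just before exec.
    return subprocess.run(cmd, env=env, umask=0o022,
                          capture_output=True, text=True, check=True)


# Ask a child Python process to report what it inherited.
probe = ("import os; m = os.umask(0); "
         "print(os.environ['LC_COLLATE'], oct(m))")
result = run_sanitized([sys.executable, "-c", probe])
print(result.stdout.strip())
```

Pinning only LC_COLLATE rather than LC_ALL leaves LC_MESSAGES (and so
translated output) untouched.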

But _iff_ we end up with dpkg-buildpackage being declared the only
supported entry point, _and_ there is consensus that we'd want to set
some kind of locale variable from the build driver, then I guess this
could be done as a Debian vendor-specific thing, or via the
dpkg-build-api(7) interface.

Thanks,
Guillem
