On Thu, 06 Jun 2024 at 14:40:23 +0200, Daniel Gröber wrote:
> On Thu, Jun 06, 2024 at 11:32:33AM +0200, Simon Richter wrote:
> > If your package is not reproducible without it, then your package is
> > broken.
> 
> At build-time, if a program doesn't call setlocale before using locale
> dependent standard library functions it's probably a reproducibility
> hazard.

I think that's the wrong way round: if the program *does* call
setlocale(., "") then it's a potential reproducibility hazard, but
until/unless it calls setlocale or equivalent, it's documented in
setlocale(3) that it runs in the portable (but bad[1]) "C" locale.
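
As a minimal sketch of that startup behaviour (the helper name
current_locale_is_c is made up for illustration): querying the locale with
a NULL second argument, before any other setlocale call, should report "C"
whatever the environment variables say.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: reports whether the process is still in the
 * "C" locale. Passing NULL as the second argument only queries the
 * current locale; it does not change it. */
bool current_locale_is_c(void)
{
    const char *name = setlocale(LC_ALL, NULL);
    return name != NULL && strcmp(name, "C") == 0;
}
```

Called first thing in main(), this is expected to return true even with
LANG or LC_ALL set to something else, because nothing has asked for the
environment's locale yet.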

But if a program that is run during compilation does call setlocale, then
it's most likely doing so for a reason - most commonly so that it can emit
diagnostic messages in the user's locale, rather than in programmer-English
(and advocates of l10n would likely say that it's a bug for a program to
emit diagnostic messages *without* having called setlocale(., "") first).
It's only a reproducibility hazard if locale-dependent functions are
used to parse machine-readable input, or to emit output that ends up in
the .deb. Without further context, we cannot know whether locale-sensitive
functions are being used correctly or incorrectly, in the same way that we
can't tell without context whether a use of strcmp() is correct or
whether a related but different function like strcasecmp() was intended.
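
One way to keep the two uses apart, sketched here under the assumption
that the tool only wants translated diagnostics: adopt the environment's
locale for LC_MESSAGES (and LC_CTYPE, so that messages can be converted to
the terminal's encoding) and leave every other category in "C". Both
function names are hypothetical, and LC_MESSAGES is POSIX rather than
ISO C.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: adopt the user's locale for diagnostics only.
 * LC_NUMERIC, LC_COLLATE, LC_TIME and LC_MONETARY are not touched, so
 * they stay in the startup "C" locale. */
void init_diagnostic_locale(void)
{
    /* A NULL return means the requested locale is unavailable; the
     * category then simply stays as it was. */
    (void) setlocale(LC_MESSAGES, "");
    (void) setlocale(LC_CTYPE, "");
}

/* Hypothetical helper: format a value destined for a generated file.
 * Because LC_NUMERIC is still "C", the decimal separator is '.'. */
int format_ratio(char *buf, size_t len, double ratio)
{
    return snprintf(buf, len, "%.2f", ratio);
}
```

Whether this split is right for a given program is exactly the kind of
thing that needs context: LC_CTYPE still influences character
classification and multibyte conversion, so it is not automatically
harmless for parsing.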

If we want programs to be locale-insensitive during build, there is a
well-defined interface for that - namely, setting LC_ALL to (C or) C.UTF-8.
If we don't do that, but instead leave locale environment variables set
to whatever arbitrary value has been inherited from the caller, then we
are effectively saying "we want programs to remain locale-sensitive", and
arguably it would be a (wishlist?) bug for those programs to *not*
respect the locale environment variables (at least for their diagnostic
output). It seems to me that this applies equally to programs that are
or aren't typically used during compilation.
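
A rough sketch of that interface as seen from a build driver written in C
(both helper names are hypothetical; in practice this is more often a
one-line export in a makefile or shell script): set LC_ALL before running
any tools, and every child that calls setlocale(LC_ALL, "") gets the same
answer.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper for a build driver: override whatever locale was
 * inherited from the caller. LC_ALL takes precedence over LANG and the
 * individual LC_* variables. Returns 0 on success, -1 on failure. */
int force_build_locale(const char *name)
{
    return setenv("LC_ALL", name, 1);
}

/* What a locale-aware tool typically does at startup: adopt whatever
 * the environment asks for. Returns NULL if that locale is not
 * available (for example C.UTF-8 on a system that does not ship it). */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}
```

Calling force_build_locale("C.UTF-8") (or "C") before spawning tools is
the programmatic equivalent of exporting LC_ALL; a tool that then calls
adopt_environment_locale() is still locale-sensitive, it has just been
told which locale to be sensitive to.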

If a program uses locale-sensitive functions to parse its configuration
file or format its output or something like that, then that's often a
bug, but it might equally well be working as designed/documented - again,
we can't tell which without domain-specific knowledge of the program.

    smcv

[1] unable to output, or in some cases parse, any character outside the
    1-127 ASCII range
