On Thu, 06 Jun 2024 at 14:40:23 +0200, Daniel Gröber wrote:
> On Thu, Jun 06, 2024 at 11:32:33AM +0200, Simon Richter wrote:
> > If your package is not reproducible without it, then your package is
> > broken.
>
> At build-time, if a program doesn't call setlocale before using locale
> dependent standard library functions it's probably a reproducibility
> hazard.

I think that's the wrong way round: if the program *does* call
setlocale(., "") then it's a potential reproducibility hazard, but
until/unless it calls setlocale or equivalent, it's documented in
setlocale(3) that it runs in the portable (but bad[1]) "C" locale.

But if a program that is run during compilation does call setlocale,
then it's most likely doing so for a reason - most commonly so that it
can emit diagnostic messages in the user's locale, rather than in
programmer-English (and advocates of l10n would likely say that it's a
bug for a program to emit diagnostic messages *without* having called
setlocale(., "") first).

It's only a reproducibility hazard if locale-dependent functions are
used to parse machine-readable input, or to emit output that ends up in
the .deb. Without further context, we cannot know whether
locale-sensitive functions are being used correctly or incorrectly, in
the same way that we can't tell without context whether a use of
strcmp() is correct or whether a related but different function like
strcasecmp() was intended.

If we want programs to be locale-insensitive during build, there is a
well-defined interface for that - namely, setting LC_ALL to (C or)
C.UTF-8. If we don't do that, but instead leave locale environment
variables set to whatever arbitrary value has been inherited from the
caller, then we are effectively saying "we want programs to remain
locale-sensitive", and arguably it would be a (wishlist?) bug for those
programs to *not* respect the locale environment variables (at least
for their diagnostic output).

It seems to me that this applies equally to programs that are or aren't
typically used during compilation. If a program uses locale-sensitive
functions to parse its configuration file or format its output or
something like that, then that's often a bug, but it might equally well
be working as designed/documented - again, we can't tell which without
domain-specific knowledge of the program.
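
As a rough illustration of the three situations described above, here is
a minimal C sketch. It is not taken from any particular package: the
helper names (current_locale, adopt_environment_locale,
pin_build_locale) are made up for this example, and it assumes a POSIX
system where setenv() is available.

```c
#define _POSIX_C_SOURCE 200809L /* for setenv() */
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: name of the locale currently in effect.
 * Passing NULL only queries the locale; it changes nothing. */
const char *current_locale(void)
{
    return setlocale(LC_ALL, NULL);
}

/* Hypothetical helper: what a localized program does when it opts in
 * to the caller's locale environment variables (LC_ALL, LC_*, LANG).
 * Returns the resulting locale name, or NULL if the requested locale
 * is not available on this system. */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}

/* Hypothetical helper: what a build driver could do before running
 * any tools, so that programs which do call setlocale(., "") all see
 * the same locale. Returns 0 on success, like setenv(). */
int pin_build_locale(void)
{
    return setenv("LC_ALL", "C.UTF-8", 1);
}
```

A program that never calls adopt_environment_locale() stays in the "C"
locale that current_locale() reports at startup, whatever the
environment says. One that does call it follows LC_ALL and friends,
which is why pinning LC_ALL in the build environment is the point at
which locale-insensitivity can be requested for all tools at once.
Whether "C.UTF-8" is actually available depends on the system's libc
and installed locales, so the return value of setlocale should be
checked.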

    smcv

[1] unable to output, or in some cases parse, any character outside
    the 1-127 ASCII range