Hi,

On 6.06.2024 1:08 PM, Johannes Schauer Marin Rodrigues wrote:
Hi,

Quoting Simon Richter (2024-06-06 11:32:33)

Would it be possible to set in stone that packages are supposed to always be built in an environment where LC_ALL=C.UTF-8, or, in other words, that builders must set LC_ALL=C.UTF-8?

This would be the opposite of the current rule. Setting LC_ALL=C in debian/rules is a one-liner. If your package is not reproducible without it, then your package is broken. It can go in with the workaround, but the underlying problem should be fixed at some point.

The reproducible builds checker explicitly tests different locales to ensure reproducibility. Adding this requirement would require disabling this check, and thus hide an entire class of bugs from detection.

This is one facet of a much bigger discussion (which we've had before). You can argue both ways, depending on how you look at this problem. The question is which of these we want:

a) debian/rules is supposed to be runnable in a wide variety of environments. If your package FTBFS in one specific environment, it is the job of d/rules to normalize the environment to cater for the specific needs of the package.

b) debian/rules is supposed to be run in a well-defined environment. If your package FTBFS in this normalized environment, then it is the job of d/rules to add the specific needs of the package to d/rules.

So the question is whether you want d/rules to normalize heterogeneous environments (a) or whether you want d/rules to make a normalized environment specific to the build (b). This is of course a spectrum, and I think we are currently doing much more of (a).
I agree with Simon here. Neither C nor C.UTF-8 is a universal locale that works for everyone. While C.UTF-8 solves the character representation part of "The Turkish Test" [0], it doesn't solve the capitalization and sorting issues.
In short, Turkish is the reason why some English text ends up with "İ" and "ı" in it. Turkish has all four letters (ı, i, I, İ), and its capitalization rules differ from English: i becomes İ and ı becomes I, i.e. the dot is never lost or gained when the case changes.
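To make the case rules concrete, here is a minimal Python sketch. Python's built-in str.upper() and str.lower() follow the locale-independent Unicode default mappings, so they behave the same under C, C.UTF-8 or tr_TR.UTF-8. The helpers turkish_upper() and turkish_lower() are hypothetical names I made up for illustration; they are not part of any library, and they only handle the four dotted/dotless letters.

```python
# Default Unicode case mapping (what Python uses regardless of LC_ALL):
# the dot is lost or gained, which is wrong for Turkish text.
assert "i".upper() == "I"        # Turkish expects "İ"
assert "ı".upper() == "I"        # correct for Turkish, but now ambiguous
assert "I".lower() == "i"        # Turkish expects "ı"
assert len("İ".lower()) == 2     # "i" followed by U+0307 COMBINING DOT ABOVE

# Hypothetical helpers: remap the four special letters first,
# then let the default mapping handle everything else.
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER = str.maketrans({"İ": "i", "I": "ı"})


def turkish_upper(text: str) -> str:
    """Uppercase text using the Turkish dotted/dotless i rules."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish dotted/dotless i rules."""
    return text.translate(_TR_LOWER).lower()


if __name__ == "__main__":
    print(turkish_upper("title"))     # dotted capital İ is preserved
    print(turkish_lower("IŞIK"))      # dotless ı is preserved
    print(turkish_lower("İstanbul"))  # no stray combining dot
```

The point of the sketch is that the correct result depends on the language of the text, not on the encoding, which is why switching to C.UTF-8 alone cannot fix it.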
This creates tons of problems with software that is not aware of the issue (Kodi completely breaks, for example, and some software needs a forced/custom environment to run).
So, all in all, if your software is expected to run in an international environment, and its build/run behavior breaks in an environment that is not to its liking, I also argue that the software is broken to begin with. Once this problem takes hold in a codebase, it is nigh impossible to fix.
So, I think it's better to strive to evolve the software into a better international citizen than to give all the software we build an artificially sterile environment, which becomes progressively harder to build and maintain.
A question that goes in a similar direction is whether every d/rules that needs it should have to do this:

    export DPKG_EXPORT_BUILDFLAGS=y
    include /usr/share/dpkg/buildflags.mk

Or whether we should switch the default and require that d/rules is run in an environment (for example as set up by dpkg-buildpackage) where these variables are set?

Going back to the example of LC_ALL=C.UTF-8 and reproducibility: whether or not this "hides" problems depends on the definition of what things are allowed to change between two builds, and what constitutes these things has changed already in the past, for example for the build path, which is now *not* changed anymore but instead recorded in the buildinfo. The same could be argued for LC_ALL=C.UTF-8, and the environment variables already are part of the buildinfo.

So I do not think that there is an easy answer to this question.

Thanks!

cheers, josch
Cheers,

H.

[0]: https://blog.codinghorror.com/whats-wrong-with-turkey/