Hi!

On Tue, 2024-07-02 at 09:52:05 +0100, Simon McVittie wrote:
> On Tue, 02 Jul 2024 at 03:47:29 +0200, Guillem Jover wrote:
> > On Fri, 2024-06-07 at 15:40:07 +0200, Alexandre Detiste wrote:
> > > Maybe a compromise would be to at least mandate some UTF-8 locale.
> >
> > dpkg-buildpackage: Require an UTF-8 (or ASCII) locale when
> > building packages
> 
> Allowing ASCII seems counterproductive: that puts us in the code path
> where various tools and runtimes (especially Python) will refuse to
> process or output anything outside the 0-127 range, which I believe is
> exactly the problem that debhelper aims to solve by using C.UTF-8 for
> some categories of package (in particular those that build with Meson).
> 
> To get what Alexandre suggested, we'd need to allow UTF-8 but not allow
> ASCII (so for example fr_FR.UTF-8 or C.UTF-8 is fine, but in particular
> the C locale is not).

Err, you are right. I think I implemented this from my recollection of
the thread, trying to enforce as little as possible, and to try to let
users set "translations" to pure ASCII if desired, but that then defeats
the point brought up in the original mail, and the locale setting in
debhelper. I'll amend the PoC commit to only allow UTF-8.

(Also as long as LC_CTYPE is UTF-8 I think it should not matter whether
LC_MESSAGES is non-UTF-8 as the output codeset should still be UTF-8.)
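
To illustrate the point about categories being resolved independently,
here is a minimal sketch of the usual POSIX precedence (LC_ALL, then the
category variable, then LANG). It is not dpkg code: the function names
are hypothetical, and the codeset is only guessed from the locale name,
whereas a real check would more likely query nl_langinfo(CODESET) after
setlocale().

```python
import re


def effective_locale(category, env):
    """Resolve a category as POSIX does: LC_ALL, then LC_<category>, then LANG."""
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset and empty both count as "not specified"
            return value
    return "C"


def codeset_is_utf8(locale_name):
    """Guess from the locale name alone whether its codeset is UTF-8."""
    # Locale names look like language[_territory][.codeset][@modifier].
    match = re.match(r"[^.@]*\.([^@]+)", locale_name)
    return bool(match) and match.group(1).lower().replace("-", "") == "utf8"


# Hypothetical environment: UTF-8 character handling, untranslated messages.
env = {"LANG": "fr_FR.UTF-8", "LC_MESSAGES": "C"}
ctype = effective_locale("LC_CTYPE", env)
messages = effective_locale("LC_MESSAGES", env)
print(ctype, codeset_is_utf8(ctype))
print(messages, codeset_is_utf8(messages))
```

With such an environment LC_CTYPE resolves to a UTF-8 locale even though
LC_MESSAGES does not, which is the case described above.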

> Or perhaps this pseudocode?
> 
> if (charset != UTF-8) {
>     emit a warning
>     export LC_ALL=C.UTF-8
>     unset LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE (etc.)
> }

As it stands, I don't think this would be good enough: it would
introduce an implicit setting in dpkg-buildpackage, which is currently
not the only supported entry point, so packages could still not rely
on it always being set. It also still disables translated messages.

While erroring out (even when dpkg-buildpackage is still not the only
supported entry point) would not give a full guarantee that a package
build is always done in a UTF-8 locale, it at least forces the caller
(be that a tool or a human) to change the running environment, while
not forcing untranslated messages. I guess this could be made a stronger
guarantee if debhelper switched from unconditionally setting the locale
to performing a similar check and erroring out too (instead of simply
removing the locale setting).


But from your pseudocode, I now realize the check I implemented is
probably too naive, as it should probably at least also check whether
LC_COLLATE is UTF-8. So I'll try to think about how to make it more
robust.
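
A more robust check could look roughly like the following sketch, which
errors out instead of silently overriding the environment. This is only
an assumption about its shape, not the PoC commit: the function names,
the default category list, and the error message are all hypothetical,
and the codeset is again guessed from the variable values only.

```python
import sys


def non_utf8_categories(env, categories=("LC_CTYPE", "LC_COLLATE")):
    """Return 'CATEGORY=value' for each category not resolving to UTF-8."""
    bad = []
    for category in categories:
        # POSIX precedence: LC_ALL, then the category variable, then LANG.
        value = env.get("LC_ALL") or env.get(category) or env.get("LANG") or "C"
        # Drop any @modifier, then keep what follows the first dot.
        codeset = value.partition("@")[0].partition(".")[2]
        if codeset.lower().replace("-", "") != "utf8":
            bad.append(f"{category}={value}")
    return bad


def require_utf8(env):
    """Exit with an error when a checked category is not UTF-8."""
    bad = non_utf8_categories(env)
    if bad:
        sys.exit("error: build requires a UTF-8 locale, got: " + ", ".join(bad))
```

Calling require_utf8(os.environ) early in the build entry point would
make the caller fix the environment, while leaving LC_MESSAGES (and so
translated messages) alone.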

But I guess I can at least unconditionally set LC_CTYPE=C.UTF-8 right
away when using --sanitize-env.

Thanks,
Guillem
