On Wed, Aug 7, 2024 at 7:07 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Wed, Aug 7, 2024 at 10:23 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Jeff Davis <pg...@j-davis.com> writes:
> > > 2. I don't see a good way to canonicalize a locale name, like in
> > > check_locale(), which uses the result of setlocale().
> >
> > What I can tell you about that is that check_locale's expectation
> > that setlocale does any useful canonicalization is mostly wishful
> > thinking [1]. On a lot of platforms you just get the input string
> > back again. If that's the only thing keeping us on setlocale,
> > I think we could drop it. (Perhaps we should do some canonicalization
> > of our own instead?)
>
> +1
>
> I know it does something on Windows (we know the EDB installer gives
> it strings like "Language,Country" and it converts them to
> "Language_Country.Encoding", see various threads about it all going
> wrong), but I'm not sure it does anything we actually want to
> encourage. I'm hoping we can gradually screw it down so that we only
> have sane BCP 47 in the system on that OS, and I don't see why we
> wouldn't just use them verbatim.
Some more thoughts on check_locale() and canonicalisation:

I doubt the canonicalisation does anything useful on any Unix system,
as they're basically just file names. In the case of glibc, the
encoding part is munged before opening the file so it tolerates .utf8
or .UTF-8 or .u---T----f------8 on input, but it still returns
whatever you gave it, so the return value isn't cleaning the input or
anything.

"" is a problem however... the special value for "native environment"
is returned as a real locale name, which we probably still need in
places. We could change that to newlocale("") + query instead, but
there is a portability pipeline problem getting the name out of it:

1. POSIX only just added getlocalename_l() in 2024[1][2].
2. Glibc has non-standard nl_langinfo_l(NL_LOCALE_NAME(category), loc).
3. The <xlocale.h> systems (macOS/*BSD) have non-standard
querylocale(mask, loc).
4. AFAIK there is no way to do it on pure POSIX 2008 systems.
5. For Windows, there is a completely different thing to get the
user's default locale, see CF#3772.

The systems in category 4 would in practice be Solaris and (if it
comes back) AIX. Given that, we probably just can't go that way soon.

So I think the solution could perhaps be something like: in some early
startup phase before there are any threads, we nail down all the
locale categories to "C" (or whatever we decide on for the permanent
global locale), and also query the "" categories and make a copy of
them in case anyone wants them later, and then never call setlocale()
again.

[1] https://pubs.opengroup.org/onlinepubs/9799919799/functions/getlocalename_l.html
[2] https://www.austingroupbugs.net/view.php?id=1220
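
PS: To make the startup idea above concrete, here is a rough sketch
using only standard setlocale(). The names freeze_global_locale(),
native_locale_name() and saved_category are hypothetical, not existing
PostgreSQL code, and falling back to "C" when the environment names a
locale that can't be loaded is my assumption rather than a decided
policy.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not existing PostgreSQL code. */
typedef struct
{
    int   category;
    char *native_name;   /* malloc'd copy of what "" resolved to */
} saved_category;

static saved_category saved[] = {
    {LC_COLLATE, NULL},
    {LC_CTYPE, NULL},
    {LC_MONETARY, NULL},
    {LC_NUMERIC, NULL},
    {LC_TIME, NULL},
#ifdef LC_MESSAGES
    {LC_MESSAGES, NULL},   /* POSIX, not ISO C */
#endif
};

#define N_SAVED (sizeof(saved) / sizeof(saved[0]))

static int locale_frozen = 0;

/* Portable stand-in for strdup(). */
static char *
copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char  *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/*
 * Call once, early, before any threads exist.  Records what "" means
 * for each category, then pins the global locale to "C".
 * Returns 0 on success, -1 on failure.
 */
static int
freeze_global_locale(void)
{
    size_t i;

    assert(!locale_frozen);
    for (i = 0; i < N_SAVED; i++)
    {
        /* NULL means the environment names a locale we can't load. */
        const char *name = setlocale(saved[i].category, "");

        /* Copy now: later setlocale() calls may overwrite the buffer. */
        /* Assumption: fall back to "C" when the lookup fails. */
        saved[i].native_name = copy_string(name != NULL ? name : "C");
        if (saved[i].native_name == NULL)
            return -1;
    }
    if (setlocale(LC_ALL, "C") == NULL)
        return -1;
    locale_frozen = 1;
    return 0;
}

/* Saved native name for a category, or NULL if it isn't tracked. */
static const char *
native_locale_name(int category)
{
    size_t i;

    for (i = 0; i < N_SAVED; i++)
        if (saved[i].category == category)
            return saved[i].native_name;
    return NULL;
}
```

After freeze_global_locale() returns 0, anything that used to rely on
setlocale(cat, "") would call native_locale_name(cat) instead, and
could pass the saved string to newlocale() if it needs a locale_t.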