On Wed, Aug 14, 2024 at 1:05 PM Jeff Davis <pg...@j-davis.com> wrote: > The only alternative is to essentially ban the use of non-_l variants, > which is fine I suppose, but causes a fair amount of code churn.
Let's zoom out a bit and consider some ways we could set up the process, threads and individual calls: 1. The process global locale is always "C". If you ever call uselocale(), it can only be for short stretches, and you have to restore it straight after; perhaps it is only ever used in replacement _l() functions for systems that lack them. You need to use _l() functions for all non-"C" locales. The current database default needs to be available as a variable (in future: thread-local variable, or reachable from one), so you can use it in _l() functions. The "C" locale can be accessed implicitly with non-l() functions, or you could ban those to reduce confusion and use foo_l(..., LC_GLOBAL_LOCALE) for "C". Or a name like PG_C_LOCALE, which, in backend code could be just LC_GLOBAL_LOCALE, while in frontend/library code it could be the singleton mechanism I showed in CF#5166. XXX Note that nailing LC_ALL to "C" in the backend would extend our existing policy for LC_NUMERIC to all categories. That's why we can use strtod() in the backend and expect the radix character to be '.'. It's interesting to contemplate the strtod() calls in CF#5166: they are code copied-and-pasted from backend and frontend; in the backend we can use strtod() currently but in the frontend code I showed a change to strtod_l(..., PG_C_LOCALE), in order to be able to delete some ugly deeply nested uselocale()/setlocale() stuff of the exact sort that NetBSD hackers (and I) hate. It's obviously a bit of a code smell that it's copied-and-pasted in the first place, and should really share code. Supposing some of that stuff finishes up in src/common, then I think you'd want a strtod_l(..., PG_C_LOCALE) that could be allowed to take advantage of the knowledge that the global locale is "C" in the backend. Just thoughts... 2. The process global locale is always "C". Each backend process (in future: thread) calls uselocale() to set the thread-local locale to the database default, so it can keep using the non-_l() functions as a way to access the database default, and otherwise uses _l() functions if it wants something else (as we do already). The "C" locale can be accessed with foo_l(..., LC_GLOBAL_LOCALE) or PG_C_LOCALE etc. XXX This option is blocked by NetBSD's rejection of uselocale(). I guess if you really wanted to work around NetBSD's policy you could make your own wrapper for all affected functions, translating foo() to foo_l(..., pg_thread_current_locale), so you could write uselocale(), which is pretty much what every other libc does... But eughhh 3. The process global locale is inherited from the system or can be set by the user however they want for the benefit of extensions, but we never change it after startup or refer to it. Then we do the same as 1 or 2, except if we ever want "C" we'll need a locale_t for that, again perhaps using the PC_C_LOCALE mechanism. Non-_l() functions are effectively useless except in cases where you really want to use the system's settings inherited from startup, eg for messages, so they'd mostly be banned. What else? > > They're right that we really just want to use "C" in some places, and > > their LC_C_LOCALE is a very useful system-provided value to be able > > to > > pass into _l functions. It's a shame it's non-standard, because > > without it you have to allocate a locale_t for "C" and keep it > > somewhere to feed to _l functions... > > If we're going to do that, why not just have ascii-only variants of our > own? pg_ascii_isspace(...) is at least as readable as isspace_l(..., > LC_C_LOCALE). Yeah, I agree there are some easy things we should do that way. In fact we've already established that scanner_isspace() needs to be used in lots more places for that, even aside from thread-safety.