On Wed, 2025-07-16 at 08:29 +0200, Laurenz Albe wrote:
> I have a radical proposal: Rather than having "initdb" default to
> whatever locale is in the environment, make it default to the
> builtin provider and the C collation. Wherever people need a natural
> language collation, they can say so explicitly.

You bring up a good sub-point, which is that there are actually three
builtin locales[1]: C, C.UTF-8, and PG_UNICODE_FAST. All three have
exactly the same sorting and equality semantics (memcmp()), and
therefore any of them would solve the problems raised in this thread.

> Not that I want to present Oracle as an example to follow in general,
> but that's how they are doing it, and while I do hear complaints from
> Oracle users, I have yet to hear a complaint about the default binary
> collation.

My understanding was that, while it does binary sort order, it still
does Unicode-aware case mapping. If so, that would be closer to the
C.UTF-8 locale (Unicode Simple Case Mapping) or the PG_UNICODE_FAST
locale (Unicode Full Case Mapping, which includes multi-character
mappings like 'ß' to 'SS').

Note that the SQL standard seems to require Unicode Full Case Mapping.

Regards,
	Jeff Davis

[1] https://www.postgresql.org/docs/devel/locale.html#LOCALE-PROVIDERS
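
P.S. To make the distinction between sort order and case mapping
concrete, here is a minimal Python sketch. It is only an illustration
and not how PostgreSQL implements any of this: the helper name
binary_sort() and the sample words are made up, sorting by UTF-8 bytes
stands in for memcmp() ordering, and Python's str.upper() stands in for
Unicode Full Case Mapping.

```python
# Illustration only: this uses Python's own string handling, not
# PostgreSQL's builtin provider. binary_sort() is a hypothetical helper.

def binary_sort(strings):
    """Sort by UTF-8 byte sequence, analogous to a memcmp() comparison."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


words = ["zebra", "Apple", "apple", "Zebra", "äpfel"]

# Byte order puts all ASCII uppercase before ASCII lowercase, and any
# multi-byte character (here 'ä') after every ASCII character.
print(binary_sort(words))
# ['Apple', 'Zebra', 'apple', 'zebra', 'äpfel']

# Case mapping is independent of sort order. Python's str.upper()
# applies full case mapping, so one character can become two.
print("straße".upper())
# STRASSE
```

Under Simple Case Mapping, 'ß' has no single-character uppercase
equivalent, so I would expect it to be left unchanged, which is the
user-visible difference between the two Unicode-aware options.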