On Tue, Jul 23, 2024 at 1:03 PM Jeff Davis <pg...@j-davis.com> wrote:
> One of my strongest motivations for PG_C_UTF8 was that there was still
> a use case for libc in PG16: the "C.UTF-8" locale, which is not
> supported at all in ICU. Daniel Vérité made me aware of the importance
> of this locale, which offers code point order collation combined with
> Unicode ctype semantics.
>
> With PG17, between ICU and the builtin provider, there's little
> remaining reason to use libc (aside from legacy).

I was really interested to read Jeremy Schneider's slide deck, to which he linked earlier, wherein he explained that other major databases default to something more like C.UTF-8. Maybe we need to relitigate the debate about what our default should be in light of those findings (but, if so, on another thread with a clear subject line).

But even if we were to decide to change the default, there are lots and lots of existing databases out there that are using libc collations. I'm not in a good position to guess how many of those people actually truly care about language-specific collations. I'm positive it's not zero, but I can't really guess how much more than zero it is. Even if it were zero, though, the fact that so many upgrades are done using pg_upgrade means that this problem will still be around in a decade even if we changed the default tomorrow. (I do understand that you wrote "aside from legacy" so I'm not accusing you of ignoring the upgrade issues, just taking the opportunity to be more explicit about my own view.)

Also, Noah has pointed out that C.UTF-8 introduces some forward-compatibility hazards of its own, at least with respect to ctype semantics. I don't have a clear view of what ought to be done about that, but if we just replace a dependency on an unstable set of libc definitions with a dependency on an equally unstable set of PostgreSQL definitions, we're not really winning. Do we need to version the new ctype provider?

--
Robert Haas
EDB: http://www.enterprisedb.com