On Wed, 2023-03-01 at 11:09 +0100, Peter Eisentraut wrote:
> When collation support was added to PostgreSQL, we added UCS_BASIC,
> since that could easily be mapped to the C locale.

Sorting by codepoint should be encoding-independent (i.e. decode to
codepoint first); but the C collation is just strcmp, which is
encoding-dependent. So is UCS_BASIC wrong today?

(Aside: I wonder whether we should differentiate between the libc
provider, which uses strcoll(), and the provider of non-localized
comparisons that just use strcmp(). That would be a better reflection
of what the code actually does.)

> With ICU support, we can provide the UNICODE collation, since it's
> just the root locale.

+1

> I suppose one hesitation was that ICU was not a standard feature, so
> this would create variations in the default catalog contents, or
> something like that.

It looks like the way you've handled this is by inserting the collation
with collprovider=icu even if built without ICU support. I think that's
a new case, so we need to make sure it throws reasonable user-facing
errors.

I do like your approach though because, if someone is using a standard
collation, I think "not built with ICU" (feature not supported) is a
better error than "collation doesn't exist". It also effectively
reserves the name "unicode".

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS
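
A minimal sketch of the encoding question raised above, not taken from
the PostgreSQL sources. It models strcmp as a plain unsigned byte-wise
comparison (an assumption), and the helper names codepoint_key and
bytewise_key are made up for illustration. In UTF-8, byte order and
codepoint order agree by design; in a single-byte encoding such as
ISO-8859-2 they can disagree:

```python
def codepoint_key(s: str) -> list[int]:
    """Sort key that compares strings by Unicode codepoint."""
    return [ord(ch) for ch in s]


def bytewise_key(s: str, encoding: str) -> bytes:
    """Sort key that compares the encoded bytes, as a byte-wise
    comparison in the style of strcmp would (assumption: unsigned bytes)."""
    return s.encode(encoding)


a = "\u00df"  # LATIN SMALL LETTER SHARP S, U+00DF
b = "\u0104"  # LATIN CAPITAL LETTER A WITH OGONEK, U+0104

# Codepoint order: U+00DF sorts before U+0104.
codepoint_order = codepoint_key(a) < codepoint_key(b)

# UTF-8 bytes: C3 9F vs C4 84, same relative order as the codepoints.
utf8_order = bytewise_key(a, "utf-8") < bytewise_key(b, "utf-8")

# ISO-8859-2 bytes: DF vs A1, the relative order is reversed.
latin2_order = bytewise_key(a, "iso-8859-2") < bytewise_key(b, "iso-8859-2")

print(codepoint_order, utf8_order, latin2_order)
```

If this reading is right, a byte-wise comparison gives codepoint order
only when the database encoding is UTF-8, which is the case the
"is UCS_BASIC wrong today?" question turns on.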