On Thu, 11 Nov 2021 at 14:42, Robert Haas <robertmh...@gmail.com> wrote:


> diacritical marks. I know I've seen collation changes on Macs that
> changed the order in which en_US.UTF8 strings sorted. But it wasn't
> that the rules about English sorting have actually changed. It was
> that somebody somewhere decided that the algorithm should be more or
> less case-sensitive, or that we ought to ignore the amount of
> whitespace between words instead of not ignoring it, or I don't know
> exactly, but not anything that people universally agree on. Tinkering
> with obscure rules that actual human beings wouldn't agree on and
> prioritizing that over a stable algorithm is, IMHO, ridiculous.
>

Yes, I thought the point here was to nail down each change as a separate
version. So for example maybe I'm running Universal Compare Everything
Collation v1.2435 while your database is running Universal Compare
Everything Collation v1.2436, with the only difference being whether e
diaeresis circumflex comes before or after e circumflex diaeresis. If I do a
system upgrade I won't just silently corrupt any indexes with those
characters; instead I'll be told that my collation is out of date and then
I can decide whether to stick with the old collation or rebuild my indexes
and upgrade.
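
To make that concrete, here is a rough sketch of the check I have in mind. It is not how PostgreSQL or any collation provider actually does it; the Index class, check_index, rebuild, and the "uce" collation name are all made up for illustration. The only idea it shows is that each index remembers the collation version it was built with, and a mismatch is reported instead of being silently ignored.

```python
from dataclasses import dataclass


class CollationVersionMismatch(Exception):
    """Raised when an index was built under a different collation version."""


@dataclass
class Index:
    # Hypothetical bookkeeping record, not a real catalog structure.
    name: str
    collation: str
    built_with_version: str  # version recorded when the index was built


def check_index(index: Index, current_versions: dict[str, str]) -> None:
    """Refuse to use an index whose recorded collation version is stale."""
    current = current_versions[index.collation]
    if index.built_with_version != current:
        raise CollationVersionMismatch(
            f"index {index.name!r} was built with {index.collation} "
            f"{index.built_with_version}, but the system now provides {current}; "
            "stick with the old collation or rebuild the index"
        )


def rebuild(index: Index, current_versions: dict[str, str]) -> Index:
    """Stand-in for a real rebuild: only records the new version here."""
    return Index(index.name, index.collation, current_versions[index.collation])
```

With an index recorded at v1.2435, check_index passes while the system still provides v1.2435 and raises once the system moves to v1.2436, at which point the user chooses between pinning the old version and calling rebuild.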

There is, however, at least one kind of change that I think can be made
safely: adding a new character in between existing characters. That
shouldn't affect any existing indexes.
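
A toy illustration of why, under a deliberately simplified model: a collation here is just a total order on single characters, and the existing data does not contain the new character. Real collations have multi-level weights and other complications, so treat this as a sketch of the intuition only. The names make_key, old_order, and new_order are invented for the example.

```python
def make_key(order):
    """Build a sort-key function from an ordered list of characters."""
    rank = {ch: i for i, ch in enumerate(order)}
    return lambda s: [rank[ch] for ch in s]


old_order = ["a", "b", "d"]
new_order = ["a", "b", "c", "d"]  # "c" is inserted between "b" and "d"

# Existing data was written before "c" existed, so it never contains it.
existing = ["dab", "bad", "abd", "add", "b"]

sorted_old = sorted(existing, key=make_key(old_order))
sorted_new = sorted(existing, key=make_key(new_order))

# The relative order of "a", "b", "d" is unchanged, so both sorts agree.
print(sorted_old == sorted_new)
```

Inserting "c" shifts the numeric rank of "d", but comparisons only depend on relative order, so anything sorted under the old order is still correctly sorted under the new one.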

> If the Unicode consortium introduces a new emoji for "annoyed
> PostgreSQL hacker," I really do not care whether that collates before
> or after the existing symbol for "floral heart bullet, reversed
> rotated." I care much more about whether it collates the same way
> after the next minor release as it does the day it's released. And I
> seriously doubt that I am alone in that.
>
