On Thu, 11 Nov 2021 at 14:42, Robert Haas <robertmh...@gmail.com> wrote:
> diacritical marks. I know I've seen collation changes on Macs that
> changed the order in which en_US.UTF8 strings sorted. But it wasn't
> that the rules about English sorting have actually changed. It was
> that somebody somewhere decided that the algorithm should be more or
> less case-sensitive, or that we ought to ignore the amount of
> whitespace between words instead of not ignoring it, or I don't know
> exactly, but not anything that people universally agree on. Tinkering
> with obscure rules that actual human beings wouldn't agree on and
> prioritizing that over a stable algorithm is, IMHO, ridiculous.
>

Yes, I thought the point here was to nail down each change as a separate version. So, for example, maybe I'm running Universal Compare Everything Collation v1.2435 while your database is running Universal Compare Everything Collation v1.2436, with the only difference being whether e diaeresis circumflex comes before or after e circumflex diaeresis. If I do a system upgrade I won't just silently corrupt any indexes with those characters; instead I'll be told that my collation is out of date, and then I can decide whether to stick with the old collation or rebuild my indexes and upgrade.

There is, however, at least one kind of change that I think can be made safely: adding a new character in between existing characters. That shouldn't affect any existing indexes.

> If the Unicode consortium introduces a new emoji for "annoyed
> PostgreSQL hacker," I really do not care whether that collates before
> or after the existing symbol for "floral heart bullet, reversed
> rotated." I care much more about whether it collates the same way
> after the next minor release as it does the day it's released. And I
> seriously doubt that I am alone in that.
>
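
To make the distinction between "adding a new character" and "reordering existing characters" concrete, here is a toy sketch in Python. It is not how PostgreSQL, ICU, or glibc implement collation: the weight tables, the version labels, and the helper names (sort_key, index_still_sorted, check_collation) are all made up for illustration, and it assumes a single-level collation in which each character has one integer weight. It also assumes the existing index contains no strings that use the newly added character.

```python
from typing import Dict, List


def sort_key(s: str, weights: Dict[str, int]) -> List[int]:
    """Map each character to its weight (toy single-level collation)."""
    return [weights[ch] for ch in s]


def index_still_sorted(index: List[str], weights: Dict[str, int]) -> bool:
    """Return True if the stored order of `index` is still valid under `weights`."""
    keys = [sort_key(s, weights) for s in index]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def check_collation(recorded: str, current: str) -> str:
    """Hypothetical upgrade check: report a mismatch instead of silently proceeding."""
    if recorded == current:
        return "ok"
    return "mismatch: keep the old collation or rebuild indexes"


# Hypothetical collation versions; the weights are illustrative only.
V1 = {"a": 10, "b": 20, "c": 30}
V2_INSERT = {"a": 10, "x": 15, "b": 20, "c": 30}  # new character "x" between a and b
V2_REORDER = {"a": 10, "b": 30, "c": 20}          # existing b and c swapped

# An index built while V1 was in effect.
index = sorted(["cab", "abc", "bca", "ba"], key=lambda s: sort_key(s, V1))

insert_safe = index_still_sorted(index, V2_INSERT)    # existing keys are unchanged
reorder_safe = index_still_sorted(index, V2_REORDER)  # existing keys now compare differently

print(insert_safe, reorder_safe)
print(check_collation("v1.2435", "v1.2436"))
```

In this toy model the index survives the insertion because none of the existing sort keys change, while the reordering leaves "bca" stored before "cab" even though it now compares greater. That second case is the one where a version check would have to stop and ask.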