Re: [18] Policy on IMMUTABLE functions and Unicode updates

Peter Eisentraut Tue, 23 Jul 2024 13:36:30 -0700

On 22.07.24 19:55, Robert Haas wrote:

Every other piece of software in the world has to deal with changes as
a result of the addition of new code points, and probably less
commonly, revisions to existing code points. Presumably, their stuff
breaks too, from time to time. I mean, I find it a bit difficult to
believe that web browsers or messaging applications on phones only
ever display emoji, and never try to do any sort of string sorting.

The sorting isn't the problem. We have a versioning mechanism for collations. What we do with the version information is clearly not perfect yet, but the mechanism exists and you can hack together queries that answer the question, did anything change here that would affect my indexes. And you could build more tooling around that and so on.

The problem being considered here is updates to Unicode itself, as distinct from the collation tables. A Unicode update can impact at least two things:

- Code points that were previously unassigned are now assigned. That's obviously a very common thing with every Unicode update. The new character will have new properties attached to it, so the result of various functions that use such properties (upper(), lower(), normalize(), etc.) could change, because previously the code point had no properties, and so those functions would not do anything interesting with the character.

- Certain properties of an existing character can change. Like, a character used to be a letter and now it's a digit. (This is an example; I'm not sure if that particular change would be allowed.) In the extreme case, this could have the same impact as the above, but in practice the kinds of changes that are allowed wouldn't affect typical indexes.
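
As a rough illustration of the first point, here is a minimal sketch in Python rather than PostgreSQL. It assumes Python's standard unicodedata module, which reflects whatever Unicode version that Python build ships, not the tables PostgreSQL uses. The helper names first_unassigned_code_point() and is_inert() are made up for this example. It looks up a code point that is currently unassigned (general category "Cn") and checks that case mapping and normalization leave it alone:

```python
import unicodedata


def first_unassigned_code_point() -> int:
    """Return the lowest code point whose general category is 'Cn'
    (unassigned) in the Unicode version bundled with this Python."""
    for cp in range(0x110000):
        if unicodedata.category(chr(cp)) == "Cn":
            return cp
    raise LookupError("no unassigned code point found")


def is_inert(ch: str) -> bool:
    """True if case mapping and all four normalization forms
    return the character unchanged."""
    if ch.upper() != ch or ch.lower() != ch:
        return False
    return all(
        unicodedata.normalize(form, ch) == ch
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


if __name__ == "__main__":
    cp = first_unassigned_code_point()
    print("Unicode version:", unicodedata.unidata_version)
    print(f"U+{cp:04X} category:", unicodedata.category(chr(cp)))
    print("inert today:", is_inert(chr(cp)))
```

If a later Unicode version assigns that code point and gives it, say, a case mapping, the same check on the same stored string could come out differently, which is the kind of change under discussion.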

I don't think this has anything in particular to do with the new builtin collation provider. That is just one new consumer of this.
