On Wed, 2025-01-22 at 19:03 +0100, Peter Eisentraut wrote:
> Building a collation provider on this came much later. It was possibly
> a mistake how that was done.
It wasn't a mistake. "Stability within a PG major version" was called a *benefit* near the top of the first email on the subject[1]. It was considered a benefit because it offered a level of stability that neither libc nor ICU could offer. As far as I know, more people still consider it a benefit today than not (e.g. [2]).

The concerns about Unicode updates come from a misunderstanding of the level of stability offered in the past:

* IMMUTABLE was initially a planner concept[3], which is why it didn't care much about dependence on GUCs, for instance.

* Expression and predicate indexes rely on immutability to mean something stricter, and for that, dependence on GUCs creates a problem[4]. (Also, partitioning.)

* It's hard to make an immutable UDF without a SET search_path clause, but until version 17 that clause was such a huge performance hit that it was not usable in an expression index. There will be a lot of not-truly-immutable UDFs used in expression indexes for a long time.

* Ordinary text indexes rely on the collation libraries being stable, which is hard to control because the OS can update them. Only recently has it become (barely) possible to freeze the version of libc[5] without freezing the whole OS version. And if you do manage to freeze both libc and ICU, you risk missing security fixes.

* pg_upgrade implicitly relies on IMMUTABLE to mean something even stricter: stability across major versions. That's a problem for expression indexes on functions like NORMALIZE(). And if you use the optional built-in provider, it is also a problem for expression indexes on LOWER(), etc.

At each point we took steps that made sense at the time and in context, and I am not criticizing any of those steps. The biggest practical problem was unforeseen, dramatic changes in glibc that broke a lot of text indexes.
The rest of the problems are a mix of design issues, feature interactions, and implementation details that were not resolved before the built-in provider existed and are still not resolved today.

I do not accept the premise that there is a problem with the built-in provider. I didn't throw caution to the wind, and neither did the reviewers: you, Daniel, Jeremy, and I did a ton of work to understand, mitigate, and document the risks (along with a lot of help from Thomas's earlier work). Users who opt in to the built-in provider opt in to occasional, controlled changes governed by the rather strict Unicode stability policies[6]. These policies mitigate the risks dramatically, especially for those using only assigned code points, which can be checked with the SQL function unicode_assigned().

Regards,
	Jeff Davis

[1] https://www.postgresql.org/message-id/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.ca...@j-davis.com
[2] https://www.postgresql.org/message-id/3729436.1721322211%40sss.pgh.pa.us
[3] https://www.postgresql.org/message-id/3428810.1721160969%40sss.pgh.pa.us
[4]
  CREATE TABLE t(f float4);
  CREATE UNIQUE INDEX t_idx ON t((f::text));
  SET extra_float_digits = 0;
  INSERT INTO t VALUES (1.23456789);
  INSERT INTO t VALUES (1.23456789); -- error
  SET extra_float_digits = 1;
  INSERT INTO t VALUES (1.23456789); -- success
[5] https://github.com/awslabs/compat-collation-for-glibc
[6] https://www.unicode.org/policies/stability_policy.html