[18] separate collation and ctype versions, and cleanup of pg_database locale fields

Jeff Davis Thu, 25 Jul 2024 13:29:27 -0700

Definitions:

  - collation is text ordering and comparison
  - ctype affects case mapping (e.g. LOWER()) and pattern
    matching/regexes
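
A small illustration of the distinction, as a sketch only: Python's
built-in string ordering and case methods stand in for collation and
ctype here, and are not what PostgreSQL uses internally.

```python
# Collation: how strings are ordered and compared.
# Plain code point order is used here, which for ASCII matches what
# the "C" collation does.
words = ["b", "A", "a", "B"]
by_code_point = sorted(words)

# Ctype: character classification and case mapping, which is
# independent of the ordering above.
upper_eszett = "ß".upper()        # full case mapping expands to two letters
is_alpha = "é".isalpha()          # classification, as used by regex classes

print(by_code_point, upper_eszett, is_alpha)
```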


Currently, there is only one version field, and it represents the
version of the collation. So, if your provider is libc and datcollate
is "C" and datctype is "en_US.utf8", then datcollversion will always
be NULL: the "C" collation has no version, and nothing records the
version of the ctype behavior. Other providers use datlocale, which is
a single field covering both, so the mismatch cannot arise there.

Given the discussion here:

https://www.postgresql.org/message-id/1078884.1721762...@sss.pgh.pa.us

it seems like it may be a good idea to version collation and ctype
separately. The ctype version is, more or less, the Unicode version,
and we know what that is for the builtin provider as well as ICU.
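
A minimal sketch of what tracking the two versions separately could
look like, in Python rather than backend C. LocaleVersions and
changed_parts are hypothetical names for illustration, not existing
catalog fields or functions, and the version strings are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LocaleVersions:
    """Hypothetical pair of versions recorded for a database."""
    collation: Optional[str]  # None models a NULL version, e.g. collate "C"
    ctype: Optional[str]      # roughly the Unicode version


def changed_parts(recorded: LocaleVersions,
                  actual: LocaleVersions) -> List[str]:
    """Return which of the two behaviors changed since it was recorded."""
    changed = []
    if recorded.collation != actual.collation:
        changed.append("collation")
    if recorded.ctype != actual.ctype:
        changed.append("ctype")
    return changed


# Collate "C" has no collation version, but a ctype change is still visible.
recorded = LocaleVersions(collation=None, ctype="15.1")
actual = LocaleVersions(collation=None, ctype="16.0")
print(changed_parts(recorded, actual))
```

With a single version field, the case above would compare NULL to
NULL and report nothing.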

(Aside: ICU could theoretically report the same Unicode version and
still make some change that would affect us, but I have not observed
that to be the case. I use exhaustive code point coverage to test that
our Unicode functions return the same results as the corresponding ICU
functions when the Unicode version matches.)
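
The shape of that exhaustive comparison can be sketched as follows.
This is not the actual test code, which is written in C; the function
names are hypothetical, and str.lower stands in for both
implementations being compared.

```python
import sys
from typing import Callable, Iterator, List


def code_points() -> Iterator[int]:
    """Yield every Unicode code point except the surrogate range."""
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not valid scalar values
        yield cp


def find_mismatches(ours: Callable[[str], str],
                    reference: Callable[[str], str]) -> List[int]:
    """Return the code points where the two mapping functions disagree."""
    return [cp for cp in code_points()
            if ours(chr(cp)) != reference(chr(cp))]


def broken_lower(s: str) -> str:
    """A deliberately wrong lowercase mapping, to show a detected mismatch."""
    return "x" if s == "A" else s.lower()


print(find_mismatches(str.lower, str.lower))
print([hex(cp) for cp in find_mismatches(broken_lower, str.lower)])
```

Covering every code point is cheap enough that sampling is not
needed, which is what makes a same-version comparison meaningful.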

Adding more collation fields is getting to be messy, though, because
they all have to be present in pg_database, as well. It's hard to move
those fields into pg_collation, because that's not a shared catalog, so
that could cause problems with CREATE/ALTER DATABASE. Is it worth
thinking about how we can clean this up, or should we just put up with
the idea that almost half the fields in pg_database will be locale-
related?

Regards,
        Jeff Davis
