Definitions:

  - collation is text ordering and comparison
  - ctype affects case mapping (e.g. LOWER()) and pattern
    matching/regexes
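
As a rough illustration of why these are separate concerns, here is a small sketch. It is a Python analogy only, not PostgreSQL code: Python's built-in sorting, case mapping, and regex engine stand in for the database's locale machinery, and all variable names are made up for the example.

```python
import re

# Analogy only: Python built-ins stand in for PostgreSQL locale behavior.
words = ["apple", "Banana", "cherry"]

# Collation side: ordering and comparison.
# Plain sorted() compares by code point, similar in spirit to collation "C",
# so the uppercase "B" sorts before the lowercase "a".
by_code_point = sorted(words)

# The same strings under a different ordering rule (case-insensitive here),
# standing in for a linguistic collation.
by_folded_key = sorted(words, key=str.casefold)

# ctype side: case mapping. The result does not depend on which
# ordering rule was used above.
lowered = "ÉCOLE".lower()

# ctype side: character classification in pattern matching.
# With Unicode-aware classes, "é" counts as a word character;
# with ASCII-only classes (re.ASCII), it does not.
unicode_match = re.fullmatch(r"\w+", "école") is not None
ascii_match = re.fullmatch(r"\w+", "école", flags=re.ASCII) is not None

print(by_code_point)
print(by_folded_key)
print(lowered, unicode_match, ascii_match)
```

The point of the sketch is that the ordering rule and the case-mapping/classification rule can be chosen independently, which is the situation with datcollate = "C" and datctype = "en_US.utf8".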

Currently, there is only one version field, and it represents the version of the collation. So, if your provider is libc and datcollate is "C" and datctype is "en_US.utf8", then datcollversion will always be NULL. Other providers use datlocale, which is a single field covering both collation and ctype, so the distinction doesn't matter there.

Given the discussion here:

https://www.postgresql.org/message-id/1078884.1721762...@sss.pgh.pa.us

it seems like it may be a good idea to version collation and ctype separately. The ctype version is, more or less, the Unicode version, and we know what that is for the builtin provider as well as ICU.

(Aside: ICU could theoretically report the same Unicode version and still make some change that would affect us, but I have not observed that to be the case. I use exhaustive code point coverage to test that our Unicode functions return the same results as the corresponding ICU functions when the Unicode version matches.)

Adding more collation fields is getting to be messy, though, because they all have to be present in pg_database as well. It's hard to move those fields into pg_collation, because that's not a shared catalog, so doing so could cause problems with CREATE/ALTER DATABASE.

Is it worth thinking about how we can clean this up, or should we just put up with the idea that almost half the fields in pg_database will be locale-related?

Regards,
	Jeff Davis