On Thu, Sep 13, 2018 at 7:03 PM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > On 12/09/2018 13:25, Christoph Berg wrote: > > Re: Peter Eisentraut 2018-09-12 > > <0447ec7b-cdb6-7252-7943-88a4664e7...@2ndquadrant.com> > >>> Naive idea: make that catalog shared? Collations are system-wide after > >>> all. > >> > >> By the same argument, extensions should be shared, but they are not. > > > > But extensions put a lot of visible stuff into a database, whereas a > > collation is just a line in some table that doesn't get into the way. > > How about C functions? They are just a system catalog representation of > something that exists on the OS. > > Anyway, we also want to support application-specific collation > definitions, so that users can CREATE COLLATION > "my_specific_requirements" and use that that in their application, so > global collations wouldn't be appropriate for that. > > Moreover, the fix for a collation version mismatch is, in the simplest > case, to go around and REINDEX everything. Making the collation or > collation version global doesn't fix that. It would actually make it > harder because you couldn't run ALTER COLLATION REFRESH VERSION until > after you have rebuilt all affected objects *in all databases*.
Here's one idea I came up with. It involves a new kind of magic. The goals are: 1. Support versioning for the libc provider, including for the default collation. 2. Support ICU for the default collation. 3. Fix the tracking of when reindexes need to be rebuilt, so that you can't get it wrong (as you're alluding to above). Changes: 1. Drop the datcollate and datctype columns from pg_database. 2. In CheckMyDatabase() or elsewhere in backend initialisation, get that information instead by loading the pg_collation row with OID = DEFAULT_COLLATION_OID. 3. Don't put COLLPROVIDER_DEFAULT into the default collation collprovider column, instead give it a concrete provider value, ie COLLPROVIDER_LIBC. 4. After creating a new database, update that row as appropriate in the new database (!). Or find some other way to write a new table out and switch it around, or something like that. That is, if you say CREATE DATABASE foo LC_COLLATE = 'xx_XX', COLLATION_PROVIDER = libc then those values somehow get written into the default pg_collation row in the *new* database (so at that point it's not a simple copy of the template database). 5. Drop the collversion column from pg_collation. Get rid of the REFRESH VERSION command. Instead, add a new column indcollversion to pg_index. It needs to be an array of text (not sure if that is a problem in a catalog), with elements that correspond to the elements of indcollation. 6. Do the check and log warnings when we first open each index. 7. Update indcollversion at index creation and whenever we REINDEX. I haven't actually tried any of this so I'm not sure if I'm missing something other than the inherent difficulty of updating a row in a table in a database you're not connected to... -- Thomas Munro http://www.enterprisedb.com