Greetings, * Thomas Munro (thomas.mu...@enterprisedb.com) wrote: > On Mon, Sep 17, 2018 at 6:13 AM Douglas Doole <dougdo...@gmail.com> wrote: > > On Sun, Sep 16, 2018 at 1:20 AM Thomas Munro > > <thomas.mu...@enterprisedb.com> wrote: > >> 3. Fix the tracking of when reindexes need to be rebuilt, so that you > >> can't get it wrong (as you're alluding to above). > > > > I've mentioned this in the past, but didn't seem to get any traction, so > > I'll try it again ;-) > > Probably because we agree with you, but don't have all the answers :-)
Agreed. > > The focus on indexes when a collation changes is, in my opinion, the least > > of the problems. You definitely have to worry about indexes, but they can > > be easily rebuilt. What about other places where collation is hardened into > > the system, such as constraints? > > We have to start somewhere and indexes are the first thing that people > notice, and are much likely to actually be a problem (personally I've > encountered many cases of index corruption due to collation changes in > the wild, but never a constraint corruption, though I fully understand > the theoretical concern). Several of us have observed specifically > that the same problems apply to CHECK constraints and PARTITION > boundaries, and there may be other things like that. You could > imagine tracking collation dependencies on those, requiring a RECHECK > or REPARTITION operation to update them after a depended-on collation > version changes. > > Perhaps that suggests that there should be a more general way to store > collation dependencies -- something more like pg_depend, rather than > bolting something like indcollversion onto indexes and every other > kind of catalog that might need it. I don't know. Agreed. If we start thinking about pg_depend then maybe we realize that this all comes back to pg_attribute as the holder of the column-level information and maybe what we should be thinking about is a way to encode version information into the typmod for text-based types... > > And constraints problems are even easier than triggers. Consider a database > > with complex BI rules that are implemented through triggers that fire when > > values are/are not equal. If the equality of strings change, there could be > > bad data throughout the tables. (At least with constraints the inter-column > > dependencies are explicit in the catalogs. With triggers anything goes.) > > Once you get into downstream effects of changes (whether they are > recorded in the database or elsewhere), I think it's basically beyond > our event horizon. Why and when did the collation definition change > (bug fix in CLDR, decree by the Académie Française taking effect on 1 > January 2019, ...)? We could all use bitemporal databases and > multi-version ICU, but at some point it all starts to look like an > episode of Dr Who. I think we should make a clear distinction between > things that invalidate the correct working of the database, and more > nebulous effects that we can't possibly track in general. I tend to agree in general, but I don't think it's beyond us to consider multi-version ICU and being able to perform online reindexing (such that a given system could be migrated from one collation to another over a time while the system is still online, instead of having to take a potentially long downtime hit to rebuild indexes after an upgrade, or having to rebuild the entire system using some kind of logical replication...). Thanks! Stephen
signature.asc
Description: PGP signature