Re: Collation versioning

Stephen Frost Sun, 16 Sep 2018 14:02:46 -0700

Greetings,

* Thomas Munro (thomas.mu...@enterprisedb.com) wrote:
> On Mon, Sep 17, 2018 at 6:13 AM Douglas Doole <dougdo...@gmail.com> wrote:
> > On Sun, Sep 16, 2018 at 1:20 AM Thomas Munro 
> > <thomas.mu...@enterprisedb.com> wrote:
> >> 3.  Fix the tracking of when reindexes need to be rebuilt, so that you
> >> can't get it wrong (as you're alluding to above).
> >
> > I've mentioned this in the past, but didn't seem to get any traction, so 
> > I'll try it again ;-)
> 
> Probably because we agree with you, but don't have all the answers :-)


Agreed.

> > The focus on indexes when a collation changes is, in my opinion, the least 
> > of the problems. You definitely have to worry about indexes, but they can 
> > be easily rebuilt. What about other places where collation is hardened into 
> > the system, such as constraints?
> 
> We have to start somewhere and indexes are the first thing that people
> notice, and are much likely to actually be a problem (personally I've
> encountered many cases of index corruption due to collation changes in
> the wild, but never a constraint corruption, though I fully understand
> the theoretical concern).  Several of us have observed specifically
> that the same problems apply to CHECK constraints and PARTITION
> boundaries, and there may be other things like that.  You could
> imagine tracking collation dependencies on those, requiring a RECHECK
> or REPARTITION operation to update them after a depended-on collation
> version changes.
> 
> Perhaps that suggests that there should be a more general way to store
> collation dependencies -- something more like pg_depend, rather than
> bolting something like indcollversion onto indexes and every other
> kind of catalog that might need it.  I don't know.

Agreed.  If we start thinking about pg_depend then maybe we realize
that this all comes back to pg_attribute as the holder of the
column-level information and maybe what we should be thinking about is a
way to encode version information into the typmod for text-based
types...

> > And constraints problems are even easier than triggers. Consider a database 
> > with complex BI rules that are implemented through triggers that fire when 
> > values are/are not equal. If the equality of strings change, there could be 
> > bad data throughout the tables. (At least with constraints the inter-column 
> > dependencies are explicit in the catalogs. With triggers anything goes.)
> 
> Once you get into downstream effects of changes (whether they are
> recorded in the database or elsewhere), I think it's basically beyond
> our event horizon.  Why and when did the collation definition change
> (bug fix in CLDR, decree by the Académie Française taking effect on 1
> January 2019, ...)?  We could all use bitemporal databases and
> multi-version ICU, but at some point it all starts to look like an
> episode of Dr Who.  I think we should make a clear distinction between
> things that invalidate the correct working of the database, and more
> nebulous effects that we can't possibly track in general.

I tend to agree in general, but I don't think it's beyond us to consider
multi-version ICU and being able to perform online reindexing (such that
a given system could be migrated from one collation to another over a
time while the system is still online, instead of having to take a
potentially long downtime hit to rebuild indexes after an upgrade, or
having to rebuild the entire system using some kind of logical
replication...).

Thanks!

Stephen

signature.asc
Description: PGP signature

Re: Collation versioning

Reply via email to