On Thu, Nov 7, 2019 at 10:20 PM Julien Rouhaud <rjuju...@gmail.com> wrote:
> Attached 4th patch handles default collation.  I went with an
> ignore_systempin flag in recordMultipleDependencies.
Thanks for working on this!  I haven't looked closely or tested yet,
but this seems reasonable.  Obviously it assumes that the default
provider is really "libc" in disguise for now, but Peter's other patch
will extend that to cover ICU later.

> > > * Andres mentioned off-list that pg_depend rows might get blown away
> > > and recreated in some DDL circumstances.  We need to look into that.
>
> I tried various flavours of DDL but I couldn't wipe out the pg_depend
> rows without having an index rebuild triggered (like changing the
> underlying column datatype).  Do you have any scenario where the index
> rebuild wouldn't be triggered?

Ah, OK, if we only do that when the old index contents will also be
destroyed, that's great news.

> > > * Another is that pg_upgrade won't preserve pg_depend rows, so you'd
> > > need some catalog manipulation (direct or via new DDL) to fix that.
>
> Attached 5th patch adds a new "ALTER INDEX idx_name DEPENDS ON
> COLLATION coll_oid VERSION coll_version_text" that can only be
> executed in binary upgrade mode, and teaches pg_dump to generate such
> commands (including for indexes created for constraints).

It's nice that you were able to make up a reasonable grammar out of
existing keywords.  I wonder if we should make this user-accessible...
it could be useful for expert users.  If so, maybe it should use
collation names, not OIDs?

> One issue is that older versions don't have pg_depend information, so
> pg_dump can't find the dependencies to generate such commands and
> override the version with anything else.  It means that the upgraded
> cluster will show all indexes as depending on the current collation
> provider version.  I'm not sure if that's the best thing to do, or if
> we should change all pg_depend rows to mention "unknown" version or
> something like that.  It would generate so much noise that it's
> probably not worth it.

Right, so this is basically a policy decision: do we assume that all
pre-13 indexes that depend on collations are potentially corrupted, or
assume that they are not?  The "correct" thing to do would be to
assume they are potentially corrupted and complain until the user
reindexes, but I think the pragmatic thing to do would be to assume
that they're not and just let them adopt the current versions, even
though it's a lie.  I lean towards the pragmatic choice; we're trying
to catch future problems, not give the entire user base a load of
extra work to do on their next pg_upgrade for mostly theoretical
reasons.  (That said, given the new glibc versioning, we'll
effectively be giving most of our user base a load of extra work to do
on their next OS upgrade, and that'll be a characteristic of
PostgreSQL going forward, once the versioning-for-default-provider
patch goes in.)  Any other opinions?

> I didn't do anything for the spurious warning when running a reindex,
> and kept the original approach of the pg_depend catalog.

I see three options:

1.  Change all or some of index_open(), relation_open(),
RelationIdGetRelation(), RelationBuildDesc() and
RelationInitIndexAccessInfo() to take some kind of flag so we can say
NO_COLLATION_VERSION_CHECK_PLEASE, and then have ReindexIndex() pass
that flag down when opening the index for the purpose of rebuilding
it.

2.  Use a global state to suppress these warnings while opening that
index.  Perhaps ReindexIndex() would call RelCacheWarnings(false)
before index_open(), and use PG_FINALLY to make sure it restores
warnings with RelCacheWarnings(true).  (This is a less-code-churn,
bad-programming-style version of #1; there's a sketch of the pattern
at the end of this mail.)
3.  Move the place where we run the collation version check.  Instead
of doing it in RelationInitIndexAccessInfo(), where it runs when the
relation cache first loads the index, put it into a new routine
RelationCheckValidity() and call that from ... I don't know, some
other place that runs whenever we access indexes but not when we
rebuild them.

Option 3 is probably a better idea, if you can find a reasonable place
to call it from.  I'm thinking of somewhere around the time the
executor acquires locks, but with a flag in the relcache entry to make
sure it doesn't run the check again if there were no warnings last
time (so one successful version check turns the extra work off for the
rest of this relcache entry's lifetime).  I think it'd be a feature,
not a bug, if it gave you the warning every single time you executed a
query using an index that has a mismatch... but a practical
alternative would be to check only once per index, and that probably
makes more sense (also sketched below).
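
To make option 2 a bit more concrete, the shape I have in mind is
roughly the following, inside ReindexIndex().  RelCacheWarnings() is a
made-up name for whatever flips the global flag, and the existing
rebuild logic is elided:

    Relation    irel;

    RelCacheWarnings(false);
    PG_TRY();
    {
        /*
         * The relcache code would skip the version-mismatch WARNING
         * while the flag is off.
         */
        irel = index_open(indOid, AccessExclusiveLock);
        /* ... existing rebuild work ... */
        index_close(irel, NoLock);
    }
    PG_FINALLY();
    {
        /* Restore the warnings even if the rebuild errors out. */
        RelCacheWarnings(true);
    }
    PG_END_TRY();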
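
And a similarly rough sketch of option 3 with the relcache-entry flag.
RelationCheckValidity(), rd_version_checked and
index_check_collation_versions() are all invented names here; the
point is just the once-per-entry behaviour:

    void
    RelationCheckValidity(Relation rel)
    {
        /* One clean check disables further checks for this entry. */
        if (rel->rd_version_checked)
            return;

        if (rel->rd_rel->relkind != RELKIND_INDEX)
            return;

        /*
         * Hypothetical helper: compare the collation versions recorded
         * in pg_depend for this index with the current provider
         * versions, emitting a WARNING for each mismatch, and return
         * true if everything matched.
         */
        if (index_check_collation_versions(rel))
            rel->rd_version_checked = true;
    }

The executor (or wherever else we settle on) would call that after
opening the index, so the pg_depend lookups happen once per relcache
entry rather than on every query.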