On Mon, Feb 17, 2020 at 5:58 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > > On Thu, Feb 13, 2020 at 8:13 AM Julien Rouhaud <rjuju...@gmail.com> wrote: > > Hearing no complaints on the suggestions, I'm attaching v8 to address that: > > > > - pg_dump is now using a binary_upgrade_set_index_coll_version() function > > rather than plain DDL > > - the additional DDL is now of the form: > > ALTER INDEX name ALTER COLLATION name REFRESH VERSION > > +1 > > A couple of thoughts: > > @@ -1115,21 +1117,44 @@ index_create(Relation heapRelation, > ... > + /* > + * Get required distinct dependencies on collations > for all index keys. > + * Collations of directly referenced column in hash > indexes can be > + * skipped is they're deterministic. > + */ > for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++) > { > - if (OidIsValid(collationObjectId[i]) && > - collationObjectId[i] != DEFAULT_COLLATION_OID) > + Oid colloid = collationObjectId[i]; > + > + if (OidIsValid(colloid)) > { > - referenced.classId = CollationRelationId; > - referenced.objectId = collationObjectId[i]; > - referenced.objectSubId = 0; > + if ((indexInfo->ii_Am != HASH_AM_OID) || > + > !get_collation_isdeterministic(colloid)) > > I still don't like the way catalog/index.c has hard-coded knowledge of > HASH_AM_OID here. Although it errs on the side of the assuming that > there *is* a version dependency (good)
Oh, but it also means that it fails to create a versionless dependency, which is totally wrong. What we should do is instead setup a "track_version" flag to pass down. It also means that the current way I handled unknown version (empty string) vs unknown collation lib version (null) will be problematic, both for runtime check and pg_upgrade. I think we should record an empty string for both cases, and keep NULL for when explicitly no version has to be recorded (whether it's not a dependency on collation, or because the depender doesn't care about version). It also mean that I'm missing regression tests using such an access method. > there is already another AM in > the tree that could safely skip it for deterministic collations AFAIK: > Bloom indexes. I suppose that any extension AM that is doing some > kind of hashing would also like to be able to be able to opt out of > collation version checking, when that is safe. The question is how to > model that in our system... Oh indeed. > One way would be for each AM to declare whether it is affected by > collations; the answer could be yes/maybe (default), no, > only-non-deterministic-ones. But that still feels like the wrong > level, not taking advantage of knowledge about operators. On the other hand, would it be possible that some AM only supports collation-dependency-free operators while still internally relying on a stable sort order? > A better way might be to make declarations about that sort of thing in > the catalog, somewhere in the vicinity of the operator classes, or > maybe just to have hard coded knowledge about operator classes (ie > making declarations in the manual about what eg hash functions are > allowed to consult and when), and then check which of those an index > depends on. I am not sure what would be best, I'd need to spend some > time studying the am operator system. I think this will be required at some point anyway, if we want to eventually avoid warning about possible corruption when an expression/where clause isn't depending on the collation ordering. > Perhaps for the first version of this feature, we should just add a > new local function > index_can_skip_collation_version_dependency(indexInfo, colloid) to > encapsulate your existing logic, but add a comment that in future we > might be able to support skipping in more cases through analysis of > the catalogs. That'd be convenient, but would also break extensibility as bloom would get a preferential treatment (even if such AM doesn't already exist). > > + <varlistentry> > + <term><literal>ALTER COLLATION</literal></term> > + <listitem> > + <para> > + This command declares that the index is compatible with the currently > + installed version of the collations that determine its order. It is > used > + to silence warnings caused by collation version incompatibilities and > + should be called after rebuilding the index or otherwise verifying its > + consistency. Be aware that incorrect use of this command can hide > index > + corruption. > + </para> > + </listitem> > + </varlistentry> > > This sounds like something that you need to do after you reindex, but > that's not true, is it? This is something you can do *instead* of > reindex, to make it shut up about versions. Shouldn't it be something > like "... should be issued only if the ordering is known not to have > changed since the index was built"? Indeed. We should also probably explicitly mention that if the situation is unknown, REINDEX is the safe alternative to choose. > +-- Test ALTER INDEX name ALTER COLLATION name REFRESH VERSION > +UPDATE pg_depend SET refobjversion = 'not a version' > +WHERE refclassid = 'pg_collation'::regclass > +AND objid::regclass::text = 'icuidx17_part' > +AND refobjversion IS NOT NULL; > +SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version'; > + objid > +--------------- > + icuidx17_part > +(1 row) > + > +ALTER INDEX icuidx17_part ALTER COLLATION "en-x-icu" REFRESH VERSION; > +SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version'; > + objid > +------- > +(0 rows) > + > > Would it be better to put refobjversion = 'not a version' in the > SELECT list, instead of the WHERE clause, with a WHERE clause that > hits that one row, so that we can see that the row still exists after > the REFRESH VERSION (while still hiding the unstable version string)? Agreed, I'll change that.