Re: Collation versioning

Julien Rouhaud Tue, 18 Feb 2020 05:54:15 -0800

On Mon, Feb 17, 2020 at 5:58 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
>
> On Thu, Feb 13, 2020 at 8:13 AM Julien Rouhaud <rjuju...@gmail.com> wrote:
> > Hearing no complaints on the suggestions, I'm attaching v8 to address that:
> >
> > - pg_dump is now using a binary_upgrade_set_index_coll_version() function
> >   rather than plain DDL
> > - the additional DDL is now of the form:
> >   ALTER INDEX name ALTER COLLATION name REFRESH VERSION
>
> +1
>
> A couple of thoughts:
>
> @@ -1115,21 +1117,44 @@ index_create(Relation heapRelation,
> ...
> +               /*
> +                * Get required distinct dependencies on collations
> for all index keys.
> +                * Collations of directly referenced column in hash
> indexes can be
> +                * skipped is they're deterministic.
> +                */
>                 for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
>                 {
> -                       if (OidIsValid(collationObjectId[i]) &&
> -                               collationObjectId[i] != DEFAULT_COLLATION_OID)
> +                       Oid colloid = collationObjectId[i];
> +
> +                       if (OidIsValid(colloid))
>                         {
> -                               referenced.classId = CollationRelationId;
> -                               referenced.objectId = collationObjectId[i];
> -                               referenced.objectSubId = 0;
> +                               if ((indexInfo->ii_Am != HASH_AM_OID) ||
> +
> !get_collation_isdeterministic(colloid))
>
> I still don't like the way catalog/index.c has hard-coded knowledge of
> HASH_AM_OID here.  Although it errs on the side of the assuming that
> there *is* a version dependency (good)


Oh, but it also means that it fails to create a versionless
dependency, which is totally wrong.  What we should do is instead
setup a "track_version" flag to pass down.

It also means that the current way I handled unknown version (empty
string) vs unknown collation lib version (null) will be problematic,
both for runtime check and pg_upgrade.  I think we should record an
empty string for both cases, and keep NULL for when explicitly no
version has to be recorded (whether it's not a dependency on
collation, or because the depender doesn't care about version).  It
also mean that I'm missing regression tests using such an access
method.

> there is already another AM in
> the tree that could safely skip it for deterministic collations AFAIK:
> Bloom indexes.  I suppose that any extension AM that is doing some
> kind of hashing would also like to be able to be able to opt out of
> collation version checking, when that is safe.  The question is how to
> model that in our system...

Oh indeed.

> One way would be for each AM to declare whether it is affected by
> collations; the answer could be yes/maybe (default), no,
> only-non-deterministic-ones.  But that still feels like the wrong
> level, not taking advantage of knowledge about operators.

On the other hand, would it be possible that some AM only supports
collation-dependency-free operators while still internally relying on
a stable sort order?

> A better way might be to make declarations about that sort of thing in
> the catalog, somewhere in the vicinity of the operator classes, or
> maybe just to have hard coded knowledge about operator classes (ie
> making declarations in the manual about what eg hash functions are
> allowed to consult and when), and then check which of those an index
> depends on.  I am not sure what would be best, I'd need to spend some
> time studying the am operator system.

I think this will be required at some point anyway, if we want to
eventually avoid warning about possible corruption when an
expression/where clause isn't depending on the collation ordering.

> Perhaps for the first version of this feature, we should just add a
> new local function
> index_can_skip_collation_version_dependency(indexInfo, colloid) to
> encapsulate your existing logic, but add a comment that in future we
> might be able to support skipping in more cases through analysis of
> the catalogs.

That'd be convenient, but would also break extensibility as bloom
would get a preferential treatment (even if such AM doesn't already
exist).

>
> +   <varlistentry>
> +    <term><literal>ALTER COLLATION</literal></term>
> +    <listitem>
> +     <para>
> +      This command declares that the index is compatible with the currently
> +      installed version of the collations that determine its order.  It is 
> used
> +      to silence warnings caused by collation version incompatibilities and
> +      should be called after rebuilding the index or otherwise verifying its
> +      consistency.  Be aware that incorrect use of this command can hide 
> index
> +      corruption.
> +     </para>
> +    </listitem>
> +   </varlistentry>
>
> This sounds like something that you need to do after you reindex, but
> that's not true, is it?  This is something you can do *instead* of
> reindex, to make it shut up about versions.  Shouldn't it be something
> like "... should be issued only if the ordering is known not to have
> changed since the index was built"?

Indeed.  We should also probably explicitly mention that if the
situation is unknown, REINDEX is the safe alternative to choose.

> +-- Test ALTER INDEX name ALTER COLLATION name REFRESH VERSION
> +UPDATE pg_depend SET refobjversion = 'not a version'
> +WHERE refclassid = 'pg_collation'::regclass
> +AND objid::regclass::text = 'icuidx17_part'
> +AND refobjversion IS NOT NULL;
> +SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
> +     objid
> +---------------
> + icuidx17_part
> +(1 row)
> +
> +ALTER INDEX icuidx17_part ALTER COLLATION "en-x-icu" REFRESH VERSION;
> +SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
> + objid
> +-------
> +(0 rows)
> +
>
> Would it be better to put refobjversion = 'not a version' in the
> SELECT list, instead of the WHERE clause, with a WHERE clause that
> hits that one row, so that we can see that the row still exists after
> the REFRESH VERSION (while still hiding the unstable version string)?

Agreed, I'll change that.

Re: Collation versioning

Reply via email to