Hello Thomas, Thanks for looking at it!
On Thu, Dec 12, 2019 at 5:01 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > > We create duplicate pg_depend records: > > [...] > > I wondered if that was harmless That's the assumed behavior of recordMultipleDependencies: /* * Record the Dependency. Note we don't bother to check for * duplicate dependencies; there's no harm in them. */ We could add a check to skip duplicates for the "track_version == true" path, or switch to flags if we want to also skip duplicates in other cases, but it'll make recordMultipleDependencies a little bit more specialised. > but for one thing it causes duplicate warnings: Yes, that should be avoided. > Here's another way to get a duplicate, and in this example you also > get an unnecessary dependency on 100 "default" for this index: > > postgres=# create index on t(x collate "fr_FR") where x > 'helicopter' > collate "fr_FR"; > CREATE INDEX > postgres=# select * from pg_depend where objid = 't_x_idx'::regclass > and refobjversion != ''; > classid | objid | objsubid | refclassid | refobjid | refobjsubid | > refobjversion | deptype > ---------+-------+----------+------------+----------+-------------+---------------+--------- > 1259 | 16460 | 0 | 3456 | 12603 | 0 | > 0:34.0 | n > 1259 | 16460 | 0 | 3456 | 12603 | 0 | > 0:34.0 | n > 1259 | 16460 | 0 | 3456 | 100 | 0 | > 0:34.0 | n > (3 rows) > > Or... maybe 100 should be there, by simple analysis of the x in the > WHERE clause, but it's the same if you write x collate "fr_FR" > > 'helicopter' collate "fr_FR", and in that case there are no > expressions of collation "default" anywhere. Ah good point. That's because expression_tree_walker() will dig into CollateExpr->args and eventually reach the underlying Var. I don't see an easy way to avoid that while still properly recording the required dependency for an even more realistic index such as CREATE INDEX ON t(x COLLATE "fr_FR") WHERE x > ((x COLLATE "en_US" > 'helicopter' COLLATE "en_US")::text) collate "fr_FR"; and for instance not for CREATE INDEX ON t(x COLLATE "fr_FR") WHERE x > ((x COLLATE "en_US" || 'helicopter' COLLATE "en_US")) collate "fr_FR"; > The indirection through composite types works nicely: > > postgres=# create type foo_t as (en text collate "en_CA", fr text > collate "fr_CA"); > CREATE TYPE > postgres=# create table t (foo foo_t); > CREATE TABLE > postgres=# create index on t(foo); > CREATE INDEX > postgres=# select * from pg_depend where objid = 't_foo_idx'::regclass > and refobjversion != ''; > classid | objid | objsubid | refclassid | refobjid | refobjsubid | > refobjversion | deptype > ---------+-------+----------+------------+----------+-------------+---------------+--------- > 1259 | 16444 | 0 | 3456 | 12554 | 0 | > 0:34.0 | n > 1259 | 16444 | 0 | 3456 | 12597 | 0 | > 0:34.0 | n > (2 rows) > > ... but again it shows the extra and technically unnecessary > dependencies (only 12603 "fr_FR" is really needed): > > postgres=# create index on t(((foo).fr collate "fr_FR")); > CREATE INDEX > postgres=# select * from pg_depend where objid = 't_fr_idx'::regclass > and refobjversion != ''; > classid | objid | objsubid | refclassid | refobjid | refobjsubid | > refobjversion | deptype > ---------+-------+----------+------------+----------+-------------+---------------+--------- > 1259 | 16445 | 0 | 3456 | 12603 | 0 | > 0:34.0 | n > 1259 | 16445 | 0 | 3456 | 12597 | 0 | > 0:34.0 | n > 1259 | 16445 | 0 | 3456 | 12554 | 0 | > 0:34.0 | n > (3 rows) Yes :( > I check that nested types are examined recursively, as appropriate. I > also tested domains, arrays, arrays of domains, expressions extracting > an element from an array of a domain with an explicit collation, and > the only problem I could find was more ways to get duplicates. Hmm... > what else is there that can contain a collatable type...? Ranges! > > postgres=# create type myrange as range (subtype = text); > CREATE TYPE > postgres=# drop table t; > DROP TABLE > postgres=# create table t (x myrange); > CREATE TABLE > postgres=# create index on t(x); > CREATE INDEX > postgres=# select * from pg_depend where objid = 't_x_idx'::regclass > and refobjversion != ''; > classid | objid | objsubid | refclassid | refobjid | refobjsubid | > refobjversion | deptype > ---------+-------+----------+------------+----------+-------------+---------------+--------- > (0 rows) > > ... or perhaps, more realistically, a GIST index might actually be > useful for range queries, and we're not capturing the dependency: > > postgres=# create index t_x_idx on t using gist (x); > CREATE INDEX > postgres=# select * from pg_depend where objid = 't_x_idx'::regclass > and refobjversion != ''; > classid | objid | objsubid | refclassid | refobjid | refobjsubid | > refobjversion | deptype > ---------+-------+----------+------------+----------+-------------+---------------+--------- > (0 rows) Good catch :) I fixed it locally and checked that a gist index on a range with a subtype being a composite type does record the required dependencies. > The new syntax "ALTER INDEX i_name DEPENDS ON ANY COLLATION UNKNOWN > VERSION" doesn't sound good to me, it's not "ANY" collation, it's a > specific set of collations that we aren't listing. "ALTER INDEX > i_name DEPENDS ON COLLATION * VERSION UNKNOWN", hrmph, no that's > terrible... I'm not sure what would be better. Mmm, indeed. With a 3rd round in the existing keyword, how about "DEPENDS ON [ ANY ] REFERENCING COLLATION"? The ANY is mostly to avoid the need for plural. > I'm not sure if I like the idea of VACUUM reporting warnings or not. Hmm. Even if I add this in a IsAutoVacuumWorkerProcess? > To state more explicitly what's happening here, we're searching the > expression trees for subexpresions that have a collation as part of > their static type. We don't know which functions or operators are > actually affected by the collation, though. For example, if an > expression says "x IS NOT NULL" and x happens to be a subexpression of > a type with a particular collation, we don't now that this > expression's value can't possibly be affected by the collation version > changing. So, the system will nag you to rebuild an index just > because you mentioned it, even though the index can't be corrupted. > To do better than that, I suppose we'd need declarations in the > catalog to say which functions/operators are collation sensitive. Wouldn't that still be a problem for an absurd expression like WHERE length((val collate "en_US" > 'uh' collate "en_US")::text) > 0 And since we would still have to record a dependency on the collation in such case, we would need to have another magic value to distinguish "unknown" from "cannot cause corruption" collation version.