On Mon, Dec 30, 2019 at 9:45 AM Robert Haas <robertmh...@gmail.com> wrote:
> > For example, float and numeric types are "never bitwise equal", while array,
> > text, and other container types are "maybe bitwise equal". An array of integers
> > or text with C collation can be treated as bitwise equal attributes, and it
> > would be too harsh to restrict them from deduplication.
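For instance, float is "never bitwise equal" because its equality operator treats 0.0 and -0.0 as equal even though their sign bits differ. Merging such duplicates into one posting-list tuple would keep only one of the two representations, so an index-only scan could hand back the wrong one. A minimal sketch, assuming IEEE 754 doubles and ignoring how a real float opclass handles NaN; both function names are hypothetical:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: equality as a float opclass sees it (IEEE 754 comparison). */
static bool float8_opclass_equal(double a, double b)
{
    return a == b;
}

/* Hypothetical: equality as deduplication needs it (identical bytes). */
static bool float8_image_equal(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}
```

Calling both with 0.0 and -0.0, the first reports equal and the second reports different, which is exactly the gap that makes deduplication unsafe for this type.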
We might as well support container types (like array) in the first Postgres version that has nbtree deduplication, I suppose. Even still, I don't think that it actually matters much to users. B-Tree indexes on arrays are probably very rare.

Note that I don't consider text to be a container type here -- obviously btree/text_ops is a very important opclass for the deduplication feature. It may be the most important opclass overall.

Recursively invoking a support function for the "contained" data type in the btree/array_ops support function seems like it might be messy. Not sure about that, though.

> > What bothers me is that this option will unlikely be helpful on its own and we
> > should also provide some kind of recheck function along with opclass, which
> > complicates this idea even further and doesn't seem very clear.
>
> It seems like the simplest thing might be to forget about the 'char'
> column and just have a support function which can be used to assess
> whether a given opclass's notion of equality is bitwise.

I like the idea of relying only on a support function. This approach makes collations a problem that the opclass author has to deal with directly, as is the case within a SortSupport support function. It also seems like it would make life easier for third party data types that want to make use of these optimizations (if in fact there are any).

I also see little downside to this approach. The extra cycles shouldn't be noticeable. As far as the B-Tree deduplication logic is concerned, the final boolean value (is deduplication safe?) comes from the index metapage -- we pass that down through an insertion scankey. We only need to determine whether or not the optimization is safe at CREATE INDEX time.

(Actually, I don't want to commit to the idea that nbtree should only call this support function at CREATE INDEX time right now. I'm sure that it will hardly ever need to be called, though.)

--
Peter Geoghegan
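A rough sketch of the support-function-only design: each opclass supplies a callback that, given the column's collation, says whether its equality is bitwise, and the per-index answer is computed once (at CREATE INDEX time) by requiring every key column to say yes. Everything here is hypothetical and standalone: the `Oid` stand-in, the collation constants, the callback names, and `index_dedup_is_safe()`. The text callback conservatively accepts only the C collation, following the example quoted earlier in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid type */

/* Hypothetical collation identifiers, for illustration only. */
enum { COLL_NONE = 0, COLL_C = 1, COLL_EN_US = 2 };

/* Hypothetical support function: is this opclass's equality bitwise? */
typedef bool (*equal_is_bitwise_fn)(Oid collation);

static bool int4_equal_is_bitwise(Oid collation)
{
    (void) collation;       /* integers ignore collation */
    return true;
}

static bool text_equal_is_bitwise(Oid collation)
{
    /* The opclass author handles collations; here only "C" is trusted. */
    return collation == COLL_C;
}

static bool float8_equal_is_bitwise(Oid collation)
{
    (void) collation;
    return false;           /* 0.0 = -0.0, yet the bytes differ */
}

/* One index key column: its opclass's support function and collation. */
typedef struct
{
    equal_is_bitwise_fn equal_is_bitwise;   /* NULL if opclass lacks one */
    Oid collation;
} KeyColumn;

/*
 * Decide once whether deduplication is safe for the whole index. In the
 * design described above, this result would be stored in the metapage
 * and passed down through the insertion scankey.
 */
static bool index_dedup_is_safe(const KeyColumn *cols, size_t ncols)
{
    for (size_t i = 0; i < ncols; i++)
    {
        if (cols[i].equal_is_bitwise == NULL ||
            !cols[i].equal_is_bitwise(cols[i].collation))
            return false;
    }
    return true;
}
```

Treating a missing support function as "unsafe" keeps third party opclasses that never opt in on the old, non-deduplicating path.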