On 14.02.2020 05:57, Peter Geoghegan wrote:
Attached is v33, which adds the last piece we need: opclass
infrastructure that tells nbtree whether or not deduplication can be
applied safely. This is based on work by Anastasia that was shared
with me privately.
Thank you for this work. I've looked through the patches and they seem
to be ready for commit.
I haven't yet read recent documentation and readme changes, so maybe
I'll send some more feedback tomorrow.
New opclass proc
================
In general, supporting deduplication is the rule for B-Tree opclasses,
rather than the exception. Most can use the generic
btequalimagedatum() routine as their support function 4, which
unconditionally indicates that deduplication is safe. There is a new
test that tries to catch opclasses that omitted to do this. Here is
the opr_sanity.out changes added by the first patch:
-- Almost all Btree opclasses can use the generic btequalimagedatum function
-- as their equalimage proc (support function 4). Look for opclasses that
-- don't do so; newly added Btree opclasses will usually be able to support
-- deduplication with little trouble.
SELECT amproc::regproc AS proc, opf.opfname AS opfamily_name,
opc.opcname AS opclass_name, opc.opcintype::regtype AS opcintype
FROM pg_am am
JOIN pg_opclass opc ON opc.opcmethod = am.oid
JOIN pg_opfamily opf ON opc.opcfamily = opf.oid
LEFT JOIN pg_amproc ON amprocfamily = opf.oid AND
amproclefttype = opcintype AND
amprocnum = 4
WHERE am.amname = 'btree' AND
amproc IS DISTINCT FROM 'btequalimagedatum'::regproc
ORDER BY amproc::regproc::text, opfamily_name, opclass_name;
proc | opfamily_name | opclass_name | opcintype
-------------------+------------------+------------------+------------------
bpchar_equalimage | bpchar_ops | bpchar_ops | character
btnameequalimage | text_ops | name_ops | name
bttextequalimage | text_ops | text_ops | text
bttextequalimage | text_ops | varchar_ops | text
| array_ops | array_ops | anyarray
| enum_ops | enum_ops | anyenum
| float_ops | float4_ops | real
| float_ops | float8_ops | double precision
| jsonb_ops | jsonb_ops | jsonb
| money_ops | money_ops | money
| numeric_ops | numeric_ops | numeric
| range_ops | range_ops | anyrange
| record_image_ops | record_image_ops | record
| record_ops | record_ops | record
| tsquery_ops | tsquery_ops | tsquery
| tsvector_ops | tsvector_ops | tsvector
(16 rows)
Is there any specific reason, why we need separate btnameequalimage,
bpchar_equalimage and bttextequalimage functions?
As far as I see, they have the same implementation.
Since using deduplication is supposed to pretty much be the norm from
now on, it seemed like it might make sense to add a NOTICE about it
during CREATE INDEX -- a notice letting the user know that it isn't
being used due to a lack of opclass support:
regression=# create table foo(bar numeric);
CREATE TABLE
regression=# create index on foo(bar);
NOTICE: index "foo_bar_idx" cannot use deduplication
CREATE INDEX
Note that this NOTICE isn't seen with an INCLUDE index, since that's
expected to not support deduplication.
I have a feeling that not everybody will like this, which is why I'm
pointing it out.
Thoughts?
I would simply move it to debug level for all cases. Since from user's
perspective it doesn't differ that much from the case where
deduplication is applicable in general, but not very efficient due to
data distribution.
I also noticed that this is not consistent with ALTER index. For
example, alter index idx_n set (deduplicate_items =true); won't show any
message about deduplication.
I've tried several combinations with an index on a numeric column:
1) postgres=# create index idx_nd on tbl (n) with (deduplicate_items =
true);
NOTICE: index "idx_nd" cannot use deduplication
CREATE INDEX
Here the message seems appropriate. I don't think, we should restrict
creation of the index even when deduplicate_items parameter is set
explicitly, rather we may warn the user that it won't be efficient.
2) postgres=# create index idx_n on tbl (n) with (deduplicate_items =
false);
NOTICE: index "idx_n" cannot use deduplication
CREATE INDEX
In this case the message seems slightly strange to me.
Why should we show a notice about the fact that deduplication is not
possible if that is exactly what was requested?
3)
postgres=# create index idx on tbl (n);
NOTICE: index "idx" cannot use deduplication
In my opinion, this message is too specific for default behavior. It
exposes internal details without explanation and may look to user like
something went wrong.