On Wed, Nov 11, 2020 at 8:34 PM Georgios Kokolatos <gkokola...@protonmail.com> wrote: > > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: not tested > Documentation: not tested > > Hi, > > this patch implements a useful and missing feature. Thank you. > > It includes documentation, which to a non-native speaker as myself seems > appropriate. > It includes comprehensive tests that cover the implemented cases. > > In the thread Alexander has pointed out, quote: > "It would be more efficient to generate trigrams for equal operator > using generate_trgm() instead of generate_wildcard_trgm()" > > I will echo the sentiment, though from a slightly different and possibly not > as important point of view. The method used to extract trigrams from the query > should match the method used to extract trigrams from the values when they > get added to the index. This is gin_extract_value_trgm() and is indeed using > generate_trgm(). > > I have no opinion over Alexander's second comment regarding costing. > > I change the status to 'Waiting on Author', but please feel free to override > my opinion if you feel I am wrong and reset it to 'Needs review'.
Thanks for the reminder Georgios! Thanks a lot Alexander for the review! Indeed, I should have used generate_trgm() rather than generate_wildcard_trgm(). IIUC, the rest of the code should still be doing the same as [I]LikeStrategyNumber. I attach a v3 with that modification. For the costing, I tried this naive dataset: CREATE TABLE t1 AS select md5(random()::text) AS val from generate_series(1, 100000); CREATE INDEX t1_btree ON t1 (val); CREATE INDEX t1_gist ON t1 USING gist (val gist_trgm_ops); Cost are like this (all default configuration, using any random existing entry): # EXPLAIN ANALYZE SELECT * FROM t1 where val = '8dcf324ce38428e4d27a363953ac1c51'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Index Only Scan using t1_btree on t1 (cost=0.42..4.44 rows=1 width=33) (actual time=0.192..0.194 rows=1 loops=1) Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text) Heap Fetches: 0 Planning Time: 0.133 ms Execution Time: 0.222 ms (5 rows) # EXPLAIN ANALYZE SELECT * FROM t1 where val = '8dcf324ce38428e4d27a363953ac1c51'; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Index Scan using t1_gist on t1 (cost=0.28..8.30 rows=1 width=33) (actual time=0.542..2.359 rows=1 loops=1) Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text) Planning Time: 0.189 ms Execution Time: 2.382 ms (4 rows) # EXPLAIN ANALYZE SELECT * FROM t1 where val = '8dcf324ce38428e4d27a363953ac1c51'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on t1 (cost=400.01..404.02 rows=1 width=33) (actual time=2.486..2.488 rows=1 loops=1) Recheck Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text) Heap Blocks: exact=1 -> Bitmap Index Scan on t1_gin (cost=0.00..400.01 rows=1 width=0) (actual time=2.474..2.474 rows=1 loops=1) Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text) Planning Time: 0.206 ms Execution Time: 2.611 ms So assuming that this dataset is representative enough, costing indeed prefers a btree index over a gist/gin index, which should avoid regression with this change. Gin is however quite off, likely because it's a bitmap index scan rather than an index scan, so gist is preferred in this scenario. That's not ideal, but I'm not sure that there are many people having both gin_trgm_ops and gist_trgm_ops.
v3-0001-Handle-operator-in-pg_trgm.patch
Description: Binary data