Julien Rouhaud <rjuju...@gmail.com> writes: > On Thu, Aug 1, 2019 at 4:37 PM Tom Lane <t...@sss.pgh.pa.us> wrote: >> Eyeing this a bit further ... doesn't scanPendingInsert also need >> to honor so->forcedRecheck? Something along the lines of
> I think so. Yeah, it does --- the updated pg_trgm test attached fails if it doesn't. Also, I found that Alexander's concern upthread: >> What would happen when two-columns index have GIN_SEARCH_MODE_DEFAULT >> scan on first column and GIN_SEARCH_MODE_ALL on second? I think even >> if triconsistent() for second column returns GIN_TRUE, we still need >> to recheck to verify second columns is not NULL. is entirely on-point. This patch generates the wrong answer in the case I added to gin.sql below. (The expected output was generated with HEAD and seems correct, but with these code changes, we incorrectly report the row with NULL as matching. So I expect the cfbot is going to complain about the patch in this state.) While I've not attempted to fix that here, I wonder whether we shouldn't fix it by just forcing forcedRecheck to true in any case where we discard an ALL qualifier. That would get rid of all the ugliness around ginFillScanKey, which I'd otherwise really want to refactor to avoid this business of adding and then removing a scan key. It would also get rid of the bit about "XXX Need to use ALL mode instead of EVERYTHING to skip NULLs if ALL mode has been seen", which aside from being ugly seems to be dead wrong for multi-column-index cases. BTW, it's not particularly the fault of this patch, but: what does it even mean to specify GIN_SEARCH_MODE_ALL with a nonzero number of keys? Should we decide to treat that as an error? It doesn't look to me like any of the in-tree opclasses will return such a case, and I'm not at all convinced what the GIN scan code would actually do with it, except that I doubt it matches the documentation. Setting this back to Waiting on Author. regards, tom lane
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out index b3e709f..3e5ba9b 100644 --- a/contrib/pg_trgm/expected/pg_trgm.out +++ b/contrib/pg_trgm/expected/pg_trgm.out @@ -3498,6 +3498,68 @@ select count(*) from test_trgm where t ~ '[qwerty]{2}-?[qwerty]{2}'; 1000 (1 row) +-- check handling of indexquals that generate no searchable conditions +explain (costs off) +select count(*) from test_trgm where t like '%99%' and t like '%qwerty%'; + QUERY PLAN +----------------------------------------------------------------------------- + Aggregate + -> Bitmap Heap Scan on test_trgm + Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text)) + -> Bitmap Index Scan on trgm_idx + Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text)) +(5 rows) + +select count(*) from test_trgm where t like '%99%' and t like '%qwerty%'; + count +------- + 19 +(1 row) + +explain (costs off) +select count(*) from test_trgm where t like '%99%' and t like '%qw%'; + QUERY PLAN +------------------------------------------------------------------------- + Aggregate + -> Bitmap Heap Scan on test_trgm + Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qw%'::text)) + -> Bitmap Index Scan on trgm_idx + Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qw%'::text)) +(5 rows) + +select count(*) from test_trgm where t like '%99%' and t like '%qw%'; + count +------- + 19 +(1 row) + +-- ensure that pending-list items are handled correctly, too +create temp table t_test_trgm(t text COLLATE "C"); +create index t_trgm_idx on t_test_trgm using gin (t gin_trgm_ops); +insert into t_test_trgm values ('qwerty99'), ('qwerty01'); +explain (costs off) +select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%'; + QUERY PLAN +----------------------------------------------------------------------------- + Aggregate + -> Bitmap Heap Scan on t_test_trgm + Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text)) + -> Bitmap Index Scan on t_trgm_idx + Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text)) +(5 rows) + +select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%'; + count +------- + 1 +(1 row) + +select count(*) from t_test_trgm where t like '%99%' and t like '%qw%'; + count +------- + 1 +(1 row) + create table test2(t text COLLATE "C"); insert into test2 values ('abcdef'); insert into test2 values ('quark'); diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql index 08459e6..dcfd3c2 100644 --- a/contrib/pg_trgm/sql/pg_trgm.sql +++ b/contrib/pg_trgm/sql/pg_trgm.sql @@ -55,6 +55,22 @@ select t,similarity(t,'gwertyu0988') as sml from test_trgm where t % 'gwertyu098 select t,similarity(t,'gwertyu1988') as sml from test_trgm where t % 'gwertyu1988' order by sml desc, t; select count(*) from test_trgm where t ~ '[qwerty]{2}-?[qwerty]{2}'; +-- check handling of indexquals that generate no searchable conditions +explain (costs off) +select count(*) from test_trgm where t like '%99%' and t like '%qwerty%'; +select count(*) from test_trgm where t like '%99%' and t like '%qwerty%'; +explain (costs off) +select count(*) from test_trgm where t like '%99%' and t like '%qw%'; +select count(*) from test_trgm where t like '%99%' and t like '%qw%'; +-- ensure that pending-list items are handled correctly, too +create temp table t_test_trgm(t text COLLATE "C"); +create index t_trgm_idx on t_test_trgm using gin (t gin_trgm_ops); +insert into t_test_trgm values ('qwerty99'), ('qwerty01'); +explain (costs off) +select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%'; +select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%'; +select count(*) from t_test_trgm where t like '%99%' and t like '%qw%'; + create table test2(t text COLLATE "C"); insert into test2 values ('abcdef'); insert into test2 values ('quark'); diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c index b18ae2b..65ed8b2 100644 --- a/src/backend/access/gin/ginget.c +++ b/src/backend/access/gin/ginget.c @@ -1814,7 +1814,7 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids) * consistent functions. */ oldCtx = MemoryContextSwitchTo(so->tempCtx); - recheck = false; + recheck = so->forcedRecheck; match = true; for (i = 0; i < so->nkeys; i++) @@ -1888,9 +1888,14 @@ gingetbitmap(IndexScanDesc scan, TIDBitmap *tbm) { CHECK_FOR_INTERRUPTS(); + /* Get next item ... */ if (!scanGetItem(scan, iptr, &iptr, &recheck)) break; + /* ... apply forced recheck if required ... */ + recheck |= so->forcedRecheck; + + /* ... and transfer it into bitmap */ if (ItemPointerIsLossyPage(&iptr)) tbm_add_page(tbm, ItemPointerGetBlockNumber(&iptr)); else diff --git a/src/backend/access/gin/ginscan.c b/src/backend/access/gin/ginscan.c index 74d9821..3d5c7e3 100644 --- a/src/backend/access/gin/ginscan.c +++ b/src/backend/access/gin/ginscan.c @@ -140,8 +140,12 @@ ginFillScanKey(GinScanOpaque so, OffsetNumber attnum, uint32 nUserQueryValues = nQueryValues; uint32 i; - /* Non-default search modes add one "hidden" entry to each key */ - if (searchMode != GIN_SEARCH_MODE_DEFAULT) + /* + * Non-default search modes add one "hidden" entry to each key, unless ALL + * key with no entries as those only need to call triConsistentFn + */ + if (searchMode != GIN_SEARCH_MODE_DEFAULT && + !(searchMode == GIN_SEARCH_MODE_ALL && (nQueryValues <= 0))) nQueryValues++; key->nentries = nQueryValues; key->nuserentries = nUserQueryValues; @@ -265,6 +269,7 @@ ginNewScanKey(IndexScanDesc scan) GinScanOpaque so = (GinScanOpaque) scan->opaque; int i; bool hasNullQuery = false; + bool hasSearchAllMode = false; MemoryContext oldCtx; /* @@ -286,6 +291,7 @@ ginNewScanKey(IndexScanDesc scan) palloc(so->allocentries * sizeof(GinScanEntry)); so->isVoidRes = false; + so->forcedRecheck = false; for (i = 0; i < scan->numberOfKeys; i++) { @@ -371,6 +377,18 @@ ginNewScanKey(IndexScanDesc scan) skey->sk_argument, nQueryValues, queryValues, categories, partial_matches, extra_data); + + if (searchMode == GIN_SEARCH_MODE_ALL && nQueryValues <= 0) + { + /* + * Don't emit ALL key with no entries, check only whether + * unconditional recheck is needed. + */ + GinScanKey key = &so->keys[--so->nkeys]; + + hasSearchAllMode = true; + so->forcedRecheck |= key->triConsistentFn(key) != GIN_TRUE; + } } /* @@ -384,6 +402,13 @@ ginNewScanKey(IndexScanDesc scan) InvalidStrategy, GIN_SEARCH_MODE_EVERYTHING, (Datum) 0, 0, NULL, NULL, NULL, NULL); + + /* + * XXX Need to use ALL mode instead of EVERYTHING to skip NULLs if ALL + * mode has been seen. + */ + if (hasSearchAllMode) + so->keys[so->nkeys - 1].scanEntry[0]->searchMode = GIN_SEARCH_MODE_ALL; } /* diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 7eba59e..1a9d76d 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -6326,6 +6326,16 @@ gincost_pattern(IndexOptInfo *index, int indexcol, return false; } + if (nentries <= 0 && searchMode == GIN_SEARCH_MODE_ALL) + { + /* + * GIN does not emit scan entries for empty GIN_SEARCH_MODE_ALL keys, + * and it can avoid full index scan if there are entries from other + * keys, so we can skip setting of 'haveFullScan' flag. + */ + return true; + } + for (i = 0; i < nentries; i++) { /* @@ -6709,7 +6719,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count, return; } - if (counts.haveFullScan || indexQuals == NIL) + if (counts.haveFullScan || indexQuals == NIL || counts.searchEntries <= 0) { /* * Full index scan will be required. We treat this as if every key in diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h index afb3e15..b0251f7 100644 --- a/src/include/access/gin_private.h +++ b/src/include/access/gin_private.h @@ -359,6 +359,7 @@ typedef struct GinScanOpaqueData MemoryContext keyCtx; /* used to hold key and entry data */ bool isVoidRes; /* true if query is unsatisfiable */ + bool forcedRecheck; /* must recheck all returned tuples */ } GinScanOpaqueData; typedef GinScanOpaqueData *GinScanOpaque; diff --git a/src/test/regress/expected/gin.out b/src/test/regress/expected/gin.out index a3911a6..fb0d29c 100644 --- a/src/test/regress/expected/gin.out +++ b/src/test/regress/expected/gin.out @@ -1,7 +1,7 @@ -- -- Test GIN indexes. -- --- There are other tests to test different GIN opclassed. This is for testing +-- There are other tests to test different GIN opclasses. This is for testing -- GIN itself. -- Create and populate a test table with a GIN index. create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off); @@ -35,3 +35,31 @@ insert into gin_test_tbl select array[1, 2, g] from generate_series(1, 1000) g; insert into gin_test_tbl select array[1, 3, g] from generate_series(1, 1000) g; delete from gin_test_tbl where i @> array[2]; vacuum gin_test_tbl; +-- Test optimization of empty queries +create temp table t_gin_test_tbl(i int4[], j int4[]); +create index on t_gin_test_tbl using gin (i, j); +insert into t_gin_test_tbl select array[100,g], array[200,g] +from generate_series(1, 10) g; +insert into t_gin_test_tbl values(array[0,0], null); +set enable_seqscan = off; +explain +select * from t_gin_test_tbl where array[0] <@ i; + QUERY PLAN +-------------------------------------------------------------------------------------- + Bitmap Heap Scan on t_gin_test_tbl (cost=12.03..20.49 rows=4 width=64) + Recheck Cond: ('{0}'::integer[] <@ i) + -> Bitmap Index Scan on t_gin_test_tbl_i_j_idx (cost=0.00..12.03 rows=4 width=0) + Index Cond: (i @> '{0}'::integer[]) +(4 rows) + +select * from t_gin_test_tbl where array[0] <@ i; + i | j +-------+--- + {0,0} | +(1 row) + +select * from t_gin_test_tbl where array[0] <@ i and '{}'::int4[] <@ j; + i | j +---+--- +(0 rows) + diff --git a/src/test/regress/sql/gin.sql b/src/test/regress/sql/gin.sql index c566e9b..aaf9c19 100644 --- a/src/test/regress/sql/gin.sql +++ b/src/test/regress/sql/gin.sql @@ -1,7 +1,7 @@ -- -- Test GIN indexes. -- --- There are other tests to test different GIN opclassed. This is for testing +-- There are other tests to test different GIN opclasses. This is for testing -- GIN itself. -- Create and populate a test table with a GIN index. @@ -34,3 +34,15 @@ insert into gin_test_tbl select array[1, 3, g] from generate_series(1, 1000) g; delete from gin_test_tbl where i @> array[2]; vacuum gin_test_tbl; + +-- Test optimization of empty queries +create temp table t_gin_test_tbl(i int4[], j int4[]); +create index on t_gin_test_tbl using gin (i, j); +insert into t_gin_test_tbl select array[100,g], array[200,g] +from generate_series(1, 10) g; +insert into t_gin_test_tbl values(array[0,0], null); +set enable_seqscan = off; +explain +select * from t_gin_test_tbl where array[0] <@ i; +select * from t_gin_test_tbl where array[0] <@ i; +select * from t_gin_test_tbl where array[0] <@ i and '{}'::int4[] <@ j;