On Wed, 2025-01-15 at 01:40 +0900, Yugo NAGATA wrote: > > > For example, "t ~~ '123foo%'" is converted to "(t >= '123foo' AND > > > t < '123fop')" > > > and index scan can be used for this condition. On the other hand, > > > "t ~~* '123foo'" > > > cannot be converted and sequential scan is used. > > > > > > Even in this case, we can use a bitmap index scan for the > > > condition > > > "(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')" > > > followed by > > > recheck by the original condition "t ~~* '123foo'", and this > > > could be faster > > > than seqscan.
In theory, there could be many OR clauses: (t >= '123foo' AND t < '123fop') OR (t >= '123foo' AND t < '123fop') OR (t >= '123foo' AND t < '123fop') OR (t >= '123foo' AND t < '123fop') OR (t >= '123foo' AND t < '123fop') OR (t >= '123foo' AND t < '123fop') OR > > > The patch 0001 enables get_index_clause_from_support to receive > > > OR clauses > > > from support functions and use them to build bitmap index scan > > > later. OR clauses > > > returned by support functions are collected in the new argument > > > 'orclauses" of > > > match_restriction_clauses_to_index(), and they are passed to > > > generate_bitmap_or_paths() later to build bitmap scan paths. If I understand correctly, this > > > The patch 0002 modifies the support function to return OR clauses > > > as an example > > > above when ILIKE's pattern has case-varying characters in forward > > > matching. The > > > OR clauses contains two index conditions for the upper and the > > > lower letter of > > > the first case-varying character in the pattern respectively. > > > Although the > > > subsequent characters are not considered in the index scans, it > > > could enough be > > > faster then sequent scan. > > > > > > Here is an example. > > > > > > 1. Create a table with random text records > > > > > > =# CREATE TABLE tbl (t text); > > > =# INSERT INTO tbl SELECT CASE WHEN i%2=1 THEN upper(x) ELSE x > > > END > > > FROM (SELECT i, md5(i::text) x FROM > > > generate_series(1,5000000) i); > > > > > > 2. Create an index > > > =# CREATE INDEX ON tbl (t text_pattern_ops); > > > > > > 3. Before applying patch: Seq Scan was selected. It takes about 4 > > > sec. > > > > > > =# EXPLIN ANALYZE SELECT * FROM tbl WHERE t ILIKE '12ab%'; > > > > I am sorry, the example in my previous main was wrong. I showed the > > plan > > with enable_index_scan = off. Indead, if the pattern starts with > > case-insensitive characters like '12', an index scan can be used > > even with > > ILIKE. > > > > postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%'; > > QUERY > > PLAN > > ------------------------------------------------------------------- > > ------------------------------------------------------ > > Gather (cost=1000.00..69129.61 rows=500 width=33) (actual > > time=36.317..4034.770 rows=1188 loops=1) > > Workers Planned: 2 > > Workers Launched: 2 > > Buffers: shared hit=99 read=41939 > > -> Parallel Seq Scan on tbl (cost=0.00..68079.61 rows=208 > > width=33) (actual time=19.908..4023.668 rows=396 loops=3) > > Filter: (t ~~* 'abc%'::text) > > Rows Removed by Filter: 1666271 > > Buffers: shared hit=99 read=41939 > > Planning Time: 0.726 ms > > Execution Time: 4035.101 ms > > (10 rows) > > > > 4. After applying patch: Bitmap Index Scan was selected. > > > > postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%'; > > > > QUERY > > PLAN > > > > ------------------------------------------------------------------- > > ------------------------------------------------------------------- > > ------------------------ > > Bitmap Heap Scan on tbl (cost=12563.66..58314.53 rows=500 > > width=33) (actual time=156.485..1266.789 rows=1188 loops=1) > > Recheck Cond: (((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text) AND (t > > ~~* 'abc%'::text)) OR ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text) AND > > (t ~~* 'abc%'::text))) > > Filter: (t ~~* 'abc%'::text) > > Rows Removed by Filter: 311473 > > Heap Blocks: exact=42010 > > Buffers: shared hit=1 read=44600 > > -> BitmapOr (cost=12563.66..12563.66 rows=297029 width=0) > > (actual time=136.979..136.980 rows=0 loops=1) > > Buffers: shared hit=1 read=2590 > > -> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 > > rows=148515 width=0) (actual time=80.548..80.549 rows=158375 > > loops=1) > > Index Cond: ((t ~>=~ 'A'::text) AND (t ~<~ > > 'B'::text)) > > Buffers: shared read=1272 > > -> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 > > rows=148515 width=0) (actual time=56.427..56.427 rows=157042 > > loops=1) > > Index Cond: ((t ~>=~ 'a'::text) AND (t ~<~ > > 'b'::text)) > > Buffers: shared hit=1 read=1318 > > Planning Time: 0.592 ms > > Execution Time: 1267.162 ms > > (16 rows) > > > > > > > What do you think about it? > > > > > > I think another application of OR-clause returning support > > > function might be > > > allowing to use an index for NOT LIKE (!~~) expression because, > > > for example, > > > "t !~~ '123foo%'" could be converted to "(t < '123foo' OR t >= > > > '123fooz')". > > > (The second condition is a bit loose but this would be safe and > > > not problem > > > since tuples are filtered by the original condition after the > > > index scan.) > > > However, it would not very useful unless the distribution is much > > > skew so that > > > NOT LIKE expression's selectivity is enough small. > > I added a new patch 0003 that enables ILIKE forward matching to use a > SP-Gist index. > Similar to 0002, this generates BitmapOr Index Scan for two index > clauses that use > "^@" operator for upper letter and lower letter pattern respectively. > > * Before applying the patch: > > postgres=# explain analyze select* from tbl where t ilike 'abc%'; > QUERY > PLAN > --------------------------------------------------------------------- > ------------------------------------------------- > Gather (cost=1000.00..18899.52 rows=101 width=33) (actual > time=41.799..930.382 rows=253 loops=1) > Workers Planned: 2 > Workers Launched: 2 > Buffers: shared hit=96 read=12533 > -> Parallel Seq Scan on tbl (cost=0.00..17889.42 rows=42 > width=33) (actual time=26.041..917.570 rows=84 loops=3) > Filter: (t ~~* 'abc%'::text) > Rows Removed by Filter: 336582 > Buffers: shared hit=96 read=12533 > Planning Time: 0.690 ms > Execution Time: 930.545 ms > (10 rows) > > > * After applying the patch: > > postgres=# explain analyze select* from tbl where t ilike 'abc%'; > QUERY > PLAN > --------------------------------------------------------------------- > ---------------------------------------------------------------- > Bitmap Heap Scan on tbl (cost=3307.96..16702.11 rows=101 width=33) > (actual time=69.449..215.636 rows=253 loops=1) > Recheck Cond: (((t ^@ 'A'::text) AND (t ~~* 'abc%'::text)) OR ((t > ^@ 'a'::text) AND (t ~~* 'abc%'::text))) > Filter: (t ~~* 'abc%'::text) > Rows Removed by Filter: 62992 > Heap Blocks: exact=12437 > Buffers: shared hit=18529 > -> BitmapOr (cost=3307.96..3307.96 rows=61212 width=0) (actual > time=62.567..62.568 rows=0 loops=1) > Buffers: shared hit=6092 > -> Bitmap Index Scan on tbl_t_idx (cost=0.00..1653.96 > rows=30606 width=0) (actual time=41.893..41.893 rows=31536 loops=1) > Index Cond: (t ^@ 'A'::text) > Buffers: shared hit=2461 > -> Bitmap Index Scan on tbl_t_idx (cost=0.00..1653.96 > rows=30606 width=0) (actual time=20.671..20.671 rows=31709 loops=1) > Index Cond: (t ^@ 'a'::text) > Buffers: shared hit=3631 > Planning Time: 1.414 ms > Execution Time: 215.903 ms > (16 rows) > >