This looks interesting. Looking at the costs, clearly bloom index is too costly as per planner's estimate. Just skimming through the code, this line in blcostestimate,
/* We have to visit all index tuples anyway */ costs.numIndexTuples = index->tuples; looks like one of the reasons for this behaviour. On Tue, 8 Oct 2024 at 13:56, Daniel Westermann (DWE) < daniel.westerm...@dbi-services.com> wrote: > Hi, > > previous discussion at [1]. > > Initially I wanted to send this to the docs mailing list, but while > testing this further I think this is more an issue with the planner. > > The example(s) in the docs for bloom filters ( > https://www.postgresql.org/docs/current/bloom.html) cannot be reproduced > at all (and the size of the indexes are far off the truth as well). > > postgres=# select version(); > version > > > ---------------------------------------------------------------------------------------------- > PostgreSQL 18devel on x86_64-pc-linux-gnu, compiled by gcc (Debian > 12.2.0-14) 12.2.0, 64-bit > (1 row) > > postgres=# CREATE TABLE tbloom AS > SELECT > (random() * 1000000)::int as i1, > (random() * 1000000)::int as i2, > (random() * 1000000)::int as i3, > (random() * 1000000)::int as i4, > (random() * 1000000)::int as i5, > (random() * 1000000)::int as i6 > FROM > generate_series(1,10000000); > SELECT 10000000 > postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, > i5, i6); > CREATE INDEX > postgres=# analyze tbloom ; > ANALYZE > postgres=# SELECT pg_size_pretty(pg_relation_size('bloomidx')); > pg_size_pretty > ---------------- > 153 MB > (1 row) > > The following statement (from the docs as well) will never use the bloom > index but will go for parallel execution: > > postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = > 123451; > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------- > Gather (cost=1000.00..127244.10 rows=1 width=24) (actual > time=155.618..155.652 rows=0 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Parallel Seq Scan on tbloom (cost=0.00..126244.00 rows=1 width=24) > (actual time=145.353..145.354 rows=0 loops=3) > Filter: ((i2 = 898732) AND (i5 = 123451)) > Rows Removed by Filter: 3333333 > Planning Time: 0.241 ms > Execution Time: 155.673 ms > (8 rows) > > The index will be used if we make parallel execution a lot more costly: > > postgres=# set parallel_setup_cost = 100000; > SET > postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = > 123451; > QUERY PLAN > > > --------------------------------------------------------------------------------------------------------------------------- > Bitmap Heap Scan on tbloom (cost=178436.00..178440.02 rows=1 width=24) > (actual time=31.364..31.365 rows=0 loops=1) > Recheck Cond: ((i2 = 898732) AND (i5 = 123451)) > Rows Removed by Index Recheck: 2424 > Heap Blocks: exact=2372 > -> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=1 > width=0) (actual time=26.387..26.387 rows=2424 loops=1) > Index Cond: ((i2 = 898732) AND (i5 = 123451)) > Planning Time: 0.210 ms > Execution Time: 31.511 ms > (8 rows) > > postgres=# reset parallel_setup_cost ; > RESET > postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = > 123451; > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------- > Gather (cost=1000.00..127244.10 rows=1 width=24) (actual > time=150.802..150.835 rows=0 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Parallel Seq Scan on tbloom (cost=0.00..126244.00 rows=1 width=24) > (actual time=143.902..143.903 rows=0 loops=3) > Filter: ((i2 = 898732) AND (i5 = 123451)) > Rows Removed by Filter: 3333333 > Planning Time: 0.172 ms > Execution Time: 150.865 ms > (8 rows) > > The reason I am sending this to this list is the execution time. The plan > using the index is faster than the plan using parallel execution, but it is > not the one executed in the default configuration. Am I missing something? > > Here is another example where the index should clearly win: > > postgres=# create table tdummy as select i as a, i as b, i as c, i as d, i > as e, i as f, i as g, i as h from generate_series (1, 20000000) i; > SELECT 20000000 > postgres=# CREATE INDEX bloomidx2 ON tdummy USING bloom (a,b,c,d,e,f,g,h); > CREATE INDEX > postgres=# analyze tdummy ; > ANALYZE > postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and > c = 1 and d = 1 and e = 1; > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------- > Gather (cost=1000.00..335576.41 rows=1 width=32) (actual > time=0.573..437.776 rows=1 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Parallel Seq Scan on tdummy (cost=0.00..334576.31 rows=1 width=32) > (actual time=285.408..430.800 rows=0 loops=3) > Filter: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)) > Rows Removed by Filter: 6666666 > Planning Time: 0.552 ms > Execution Time: 437.817 ms > (8 rows) > > postgres=# set parallel_setup_cost = 10000; > SET > postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and > c = 1 and d = 1 and e = 1; > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------- > Gather (cost=10000.00..344576.41 rows=1 width=32) (actual > time=0.439..316.780 rows=1 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Parallel Seq Scan on tdummy (cost=0.00..334576.31 rows=1 width=32) > (actual time=206.356..311.511 rows=0 loops=3) > Filter: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)) > Rows Removed by Filter: 6666666 > Planning Time: 0.197 ms > Execution Time: 316.807 ms > (8 rows) > > postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and > c = 1 and d = 1 and e = 1; > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------- > Gather (cost=10000.00..344572.10 rows=1 width=32) (actual > time=1.284..291.015 rows=1 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Parallel Seq Scan on tdummy (cost=0.00..334572.00 rows=1 width=32) > (actual time=188.019..284.259 rows=0 loops=3) > Filter: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)) > Rows Removed by Filter: 6666666 > Planning Time: 0.311 ms > Execution Time: 291.055 ms > (8 rows) > > postgres=# set parallel_setup_cost = 100000; > SET > postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and > c = 1 and d = 1 and e = 1; > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------- > Gather (cost=100000.00..434572.10 rows=1 width=32) (actual > time=0.547..292.466 rows=1 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Parallel Seq Scan on tdummy (cost=0.00..334572.00 rows=1 width=32) > (actual time=188.160..285.308 rows=0 loops=3) > Filter: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)) > Rows Removed by Filter: 6666666 > Planning Time: 0.275 ms > Execution Time: 292.500 ms > (8 rows) > > postgres=# set parallel_setup_cost = 1000000; > SET > postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and > c = 1 and d = 1 and e = 1; > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------- > Bitmap Heap Scan on tdummy (cost=506868.00..506872.02 rows=1 width=32) > (actual time=54.085..54.092 rows=1 loops=1) > Recheck Cond: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)) > Rows Removed by Index Recheck: 1 > Heap Blocks: exact=2 > -> Bitmap Index Scan on bloomidx2 (cost=0.00..506868.00 rows=1 > width=0) (actual time=54.047..54.047 rows=2 loops=1) > Index Cond: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = > 1)) > Planning Time: 0.165 ms > Execution Time: 54.847 ms > (8 rows) > > Regards > Daniel > > [1] > https://www.postgresql.org/message-id/flat/ZR0P278MB0122119FAE78721A694C30C8D2340%40ZR0P278MB0122.CHEP278.PROD.OUTLOOK.COM > > > -- Regards, Rafia Sabih CYBERTEC PostgreSQL International GmbH