Bloom index cost model seems to be wrong

Thomas Kellerer Tue, 12 Feb 2019 07:09:46 -0800

I stumbled upon this question:

    https://dba.stackexchange.com/questions/229413


in a nutshell: the bloom index is not used with the example from the manual. 

The bloom index is only used if either Seq Scan is disabled or if the 
random_page_cost is set to 1 (anything about 1 triggers a Seq Scan on my 
Windows laptop). 

If parallel execution is disabled, then the bloom index is only used if the 
random_page_cost is lower than 4. 

This does not use the index:

  set random_page_cost = 4; 
  set max_parallel_workers_per_gather=0;
  explain (analyze, buffers) 
  select * 
  from tbloom 
  where i2 = 898732 
    and i5 = 123451;

This uses the bloom index:

  set random_page_cost = 3.5; 
  set max_parallel_workers_per_gather=0;
  explain (analyze, buffers) 
  select * 
  from tbloom 
  where i2 = 898732 
    and i5 = 123451;

And this uses the index also: 

  set random_page_cost = 1; 
  explain (analyze, buffers) 
  select * 
  from tbloom 
  where i2 = 898732 
    and i5 = 123451;

This is the plan with when the index is used (either through "enable_seqscan = 
off" or "random_page_cost = 1")

Bitmap Heap Scan on tbloom  (cost=138436.69..138440.70 rows=1 width=24) (actual 
time=42.444..42.444 rows=0 loops=1)      
  Recheck Cond: ((i2 = 898732) AND (i5 = 123451))                               
                                         
  Rows Removed by Index Recheck: 2400                                           
                                         
  Heap Blocks: exact=2365                                                       
                                         
  Buffers: shared hit=21973                                                     
                                         
  ->  Bitmap Index Scan on bloomidx  (cost=0.00..138436.69 rows=1 width=0) 
(actual time=40.756..40.756 rows=2400 loops=1)
        Index Cond: ((i2 = 898732) AND (i5 = 123451))                           
                                         
        Buffers: shared hit=19608                                               
                                         
Planning Time: 0.075 ms                                                         
                                         
Execution Time: 42.531 ms                                                       
                                         

And this is the plan when everything left at default settings:

Seq Scan on tbloom  (cost=0.00..133695.80 rows=1 width=24) (actual 
time=1220.116..1220.116 rows=0 loops=1)
  Filter: ((i2 = 898732) AND (i5 = 123451))                                     
                          
  Rows Removed by Filter: 10000000                                              
                          
  Buffers: shared hit=4697 read=58998                                           
                          
  I/O Timings: read=354.670                                                     
                          
Planning Time: 0.075 ms                                                         
                          
Execution Time: 1220.144 ms                                                     
                          

Can this be considered a bug in the cost model of the bloom index 
implementation? 
Or is it expected that this is only used if random access is really cheap? 

Thomas

Bloom index cost model seems to be wrong

Reply via email to