Re: WIP: BRIN multi-range indexes

John Naylor Wed, 09 Sep 2020 09:05:26 -0700

On Sat, Sep 5, 2020 at 7:21 PM Tomas Vondra
<[email protected]> wrote:
>
> OK, here is a rebased version. Most of the breakage was due to changes
> to the BRIN sgml docs.


Hi Tomas,

I plan on trying some different queries on different data
distributions to get a sense of when the planner chooses a
multi-minmax index, and whether the choice is good.

Just to start, I used the artificial example in [1], but scaled down a
bit to save time. Config is at the default except for:
shared_buffers = 1GB
random_page_cost = 1.1;
effective_cache_size = 4GB;

create table t (a bigint, b int) with (fillfactor=95);

insert into t select i + 1000*random(), i+1000*random()
  from generate_series(1,10000000) s(i);

update t set a = 1, b = 1 where random() < 0.001;
update t set a = 10000000, b = 10000000 where random() < 0.001;

analyze t;

create index on t using brin (a);
CREATE INDEX
Time: 1631.452 ms (00:01.631)

explain analyze select * from t
  where a between 1923300::int and 1923600::int;

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=16.10..43180.43 rows=291 width=12)
(actual time=217.770..1131.366 rows=288 loops=1)
   Recheck Cond: ((a >= 1923300) AND (a <= 1923600))
   Rows Removed by Index Recheck: 9999712
   Heap Blocks: lossy=56819
   ->  Bitmap Index Scan on t_a_idx  (cost=0.00..16.03 rows=22595
width=0) (actual time=3.054..3.055 rows=568320 loops=1)
         Index Cond: ((a >= 1923300) AND (a <= 1923600))
 Planning Time: 0.328 ms
 Execution Time: 1131.411 ms
(8 rows)

Now add the multi-minmax:

create index on t using brin (a int8_minmax_multi_ops);
CREATE INDEX
Time: 6521.026 ms (00:06.521)

The first interesting thing is, with both BRIN indexes available, the
planner still chooses the conventional BRIN index. Only when I disable
it, does it choose the multi-minmax index:

explain analyze select * from t
  where a between 1923300::int and 1923600::int;

                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=68.10..43160.86 rows=291 width=12)
(actual time=1.835..4.196 rows=288 loops=1)
   Recheck Cond: ((a >= 1923300) AND (a <= 1923600))
   Rows Removed by Index Recheck: 22240
   Heap Blocks: lossy=128
   ->  Bitmap Index Scan on t_a_idx1  (cost=0.00..68.03 rows=22523
width=0) (actual time=0.691..0.691 rows=1280 loops=1)
         Index Cond: ((a >= 1923300) AND (a <= 1923600))
 Planning Time: 0.250 ms
 Execution Time: 4.244 ms
(8 rows)

I wonder if this is a clue that something in the costing unfairly
penalizes a multi-minmax index. Maybe not enough to matter in
practice, since I wouldn't expect a user to put different kinds of
index on the same column.

The second thing is, with parallel seq scan, the query is faster than
a BRIN bitmap scan, with this pathological data distribution, but the
planner won't choose it unless forced to:

set enable_bitmapscan = 'off';
explain analyze select * from t
  where a between 1923300::int and 1923600::int;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..120348.10 rows=291 width=12) (actual
time=372.766..380.364 rows=288 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t  (cost=0.00..119319.00 rows=121
width=12) (actual time=268.326..366.228 rows=96 loops=3)
         Filter: ((a >= 1923300) AND (a <= 1923600))
         Rows Removed by Filter: 3333237
 Planning Time: 0.089 ms
 Execution Time: 380.434 ms
(8 rows)

And just to compare size:

BRIN         32kB
BRIN multi  136kB
Btree       188MB

[1] 
https://www.postgresql.org/message-id/459eef3e-48c7-0f5a-8356-992442a78bb6%402ndquadrant.com

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: BRIN multi-range indexes

Reply via email to