On Tue, Dec 6, 2022 at 1:22 PM Ronan Dunklau <ronan.dunk...@aiven.io> wrote:
> On Friday, December 2, 2022 at 13:58:27 CET, Ronan Dunklau wrote:
> > On Friday, December 2, 2022 at 12:33:33 CET, Alexander Korotkov wrote:
> > > Hi, Ronan!
> > > Thank you for your patch. A couple of quick questions.
> > > 1) What does the magic number 50.0 stand for? I think we should at
> > > least make it a macro.
> >
> > This is what is used in the other tree-descending estimation functions,
> > so I used it here too. Maybe a DEFAULT_PAGE_CPU_COST macro would work
> > for both? If so, I'll separate this into two patches: one introducing
> > the macro for the other estimation functions, and this patch for GIN.
>
> The 0001 patch does this.
>
> > > 2) "We only charge one data page for the startup cost" – should this
> > > be dependent on the number of search entries?
>
> In fact there was another problem. The current code estimates two
> different paths for fetching data pages: in the case of a partial match,
> it takes into account that all the data pages will have to be fetched.
> This is now taken into account for the CPU cost as well.
>
> For a regular search, we scale the number of data pages by the number of
> search entries.
Now the patch looks good to me. I made some tests.

# create extension pg_trgm;
# create extension btree_gin;
# create table test1 as (select random() as val from generate_series(1,1000000) i);
# create index test1_gin_idx on test1 using gin (val);
# explain select * from test1 where val between 0.1 and 0.2;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test1  (cost=1186.21..7089.57 rows=98557 width=8)
   Recheck Cond: ((val >= '0.1'::double precision) AND (val <= '0.2'::double precision))
   ->  Bitmap Index Scan on test1_gin_idx  (cost=0.00..1161.57 rows=98557 width=0)
         Index Cond: ((val >= '0.1'::double precision) AND (val <= '0.2'::double precision))
(4 rows)

# create index test1_btree_idx on test1 using btree (val);
# explain select * from test1 where val between 0.1 and 0.2;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Only Scan using test1_btree_idx on test1  (cost=0.42..3055.57 rows=98557 width=8)
   Index Cond: ((val >= '0.1'::double precision) AND (val <= '0.2'::double precision))
(2 rows)

Looks reasonable. In this case GIN is much more expensive, because it
can't handle a range query properly and overlaps two partial matches.
# create table test2 as (select 'samplestring' || i as val from generate_series(1,1000000) i);
# create index test2_gin_idx on test2 using gin (val);
# explain select * from test2 where val = 'samplestring500000';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=20.01..24.02 rows=1 width=18)
   Recheck Cond: (val = 'samplestring500000'::text)
   ->  Bitmap Index Scan on test2_gin_idx  (cost=0.00..20.01 rows=1 width=0)
         Index Cond: (val = 'samplestring500000'::text)
(4 rows)

# create index test2_btree_idx on test2 using btree (val);
# explain select * from test2 where val = 'samplestring500000';
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Index Only Scan using test2_btree_idx on test2  (cost=0.42..4.44 rows=1 width=18)
   Index Cond: (val = 'samplestring500000'::text)
(2 rows)

This also looks reasonable. GIN is not terribly bad for this case, but
B-tree is much cheaper.

I'm going to push this and backpatch to all supported versions if no
objections.

------
Regards,
Alexander Korotkov