On Tue, Dec 6, 2022 at 1:22 PM Ronan Dunklau <ronan.dunk...@aiven.io> wrote:
> On Friday, December 2, 2022 at 13:58:27 CET, Ronan Dunklau wrote:
> > On Friday, December 2, 2022 at 12:33:33 CET, Alexander Korotkov wrote:
> > > Hi, Ronan!
> > > Thank you for your patch. A couple of quick questions.
> > > 1) What does the magic number 50.0 stand for? I think we should at
> > > least make it a macro.
> >
> > This is what is used in the other tree-descending estimation functions,
> > so I used it here too. Maybe a DEFAULT_PAGE_CPU_COST macro would work
> > for both? If so, I'll separate this into two patches: one introducing
> > the macro for the other estimation functions, and this patch for GIN.
>
> The 0001 patch does this.
>
> > > 2) "We only charge one data page for the startup cost" – should this
> > > be dependent on the number of search entries?
>
> In fact there was another problem. The current code estimates two
> different paths for fetching data pages: in the case of a partial match,
> it takes into account that all the data pages will have to be fetched.
> This is now taken into account for the CPU cost as well.
>
> For a regular search, we scale the number of data pages by the number of
> search entries.
Now the patch looks good to me. I made some tests.

# create extension pg_trgm;
# create extension btree_gin;
# create table test1 as (select random() as val from generate_series(1,1000000) i);
# create index test1_gin_idx on test1 using gin (val);
# explain select * from test1 where val between 0.1 and 0.2;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test1  (cost=1186.21..7089.57 rows=98557 width=8)
   Recheck Cond: ((val >= '0.1'::double precision) AND (val <= '0.2'::double precision))
   ->  Bitmap Index Scan on test1_gin_idx  (cost=0.00..1161.57 rows=98557 width=0)
         Index Cond: ((val >= '0.1'::double precision) AND (val <= '0.2'::double precision))
(4 rows)

# create index test1_btree_idx on test1 using btree (val);
# explain select * from test1 where val between 0.1 and 0.2;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Only Scan using test1_btree_idx on test1  (cost=0.42..3055.57 rows=98557 width=8)
   Index Cond: ((val >= '0.1'::double precision) AND (val <= '0.2'::double precision))
(2 rows)

Looks reasonable. In this case GIN is much more expensive, because it
can't handle a range query properly and overlaps two partial matches.
# create table test2 as (select 'samplestring' || i as val from generate_series(1,1000000) i);
# create index test2_gin_idx on test2 using gin (val);
# explain select * from test2 where val = 'samplestring500000';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=20.01..24.02 rows=1 width=18)
   Recheck Cond: (val = 'samplestring500000'::text)
   ->  Bitmap Index Scan on test2_gin_idx  (cost=0.00..20.01 rows=1 width=0)
         Index Cond: (val = 'samplestring500000'::text)
(4 rows)

# create index test2_btree_idx on test2 using btree (val);
# explain select * from test2 where val = 'samplestring500000';
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Index Only Scan using test2_btree_idx on test2  (cost=0.42..4.44 rows=1 width=18)
   Index Cond: (val = 'samplestring500000'::text)
(2 rows)

This also looks reasonable. GIN is not terribly bad for this case, but
B-tree is much cheaper.

I'm going to push this and backpatch to all supported versions if no
objections.

------
Regards,
Alexander Korotkov