Hi! During my work on GSoC project, I found bad perfomace of GiST for point datatype. It appears even on uniform random data.
test=# create table test as (select point(random(), random()) from generate_series(1, 1000000)); SELECT 1000000 test=# create index test_idx on test using gist(point); CREATE INDEX test=# explain (analyze, buffers) select * from test where point <@ box(point(0.5,0.5), point(0.505,0.505)); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on test (cost=48.40..2593.73 rows=1000 width=16) (actual time=97.479..97.551 rows=24 loops=1) Recheck Cond: (point <@ '(0.505,0.505),(0.5,0.5)'::box) Buffers: shared hit=5126 -> Bitmap Index Scan on test_idx (cost=0.00..48.15 rows=1000 width=0) (actual time=97.462..97.462 rows=24 loops=1) Index Cond: (point <@ '(0.505,0.505),(0.5,0.5)'::box) Buffers: shared hit=5102 Total runtime: 97.644 ms (7 rows) Search for the cause takes relatively long time from me, but finally I did. In gist_box_penalty function floating point error in line *result = (float) (size_box(ud) - size_box(origentry->key)); sometimes makes *result a very small negative number. I beleive that best place to fix it is gistpenalty function. The attached patch makes this function treating negative number from user's penalty as zero. I didn't find mention of negative penalty value in documentation. So, AFAICS such behaviour shouldn't break anything. After the patch index performance is ok. test=# explain (analyze, buffers) select * from test where point <@ box(point(0.5,0.5), point(0.505,0.505)); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on test (cost=44.35..2589.68 rows=1000 width=16) (actual time=0.988..1.116 rows=24 loops=1) Recheck Cond: (point <@ '(0.505,0.505),(0.5,0.5)'::box) Buffers: shared hit=44 -> Bitmap Index Scan on test_idx (cost=0.00..44.10 rows=1000 width=0) (actual time=0.966..0.966 rows=24 loops=1) Index Cond: (point <@ '(0.505,0.505),(0.5,0.5)'::box) Buffers: shared hit=20 Total runtime: 1.313 ms (7 rows) ------ With best regards, Alexander Korotkov.
*** a/src/backend/access/gist/gistutil.c --- b/src/backend/access/gist/gistutil.c *************** *** 526,536 **** gistpenalty(GISTSTATE *giststate, int attno, --- 526,540 ---- if (giststate->penaltyFn[attno].fn_strict == FALSE || (isNullOrig == FALSE && isNullAdd == FALSE)) + { FunctionCall3Coll(&giststate->penaltyFn[attno], giststate->supportCollation[attno], PointerGetDatum(orig), PointerGetDatum(add), PointerGetDatum(&penalty)); + if (penalty < 0.0) + penalty = 0.0; + } else if (isNullOrig && isNullAdd) penalty = 0.0; else
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers