Hi all,
NULL handling always surprises me in unpleasant ways; my brain simply can't internalize the logic, so please tell me if this is one of the cases where I should just put more effort into fixing my understanding instead.
I have a table with a "null_frac" of 0.5, and I have verified that a row for which the WHERE clause evaluates to NULL is not included in the result:
testdb=# select id from testtable where null @@ to_tsquery('testterm80');
id
----
(0 rows)
I would therefore expect null_frac to be taken into account when computing the row estimate for this query:
testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80');
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..1985.03 rows=1966 width=4)
   Filter: (fts @@ to_tsquery('testterm80'::text))
(2 rows)
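For scale: 1966 out of roughly 10000 rows is about 20% of the whole table, NULLs included, so the estimate seems to be computed over all rows. A quick cross-check against pg_class (purely a sanity check on the numbers above):

select 1966 / reltuples as est_selectivity from pg_class where relname = 'testtable';
-- roughly 0.20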
The null_frac is taken into account, however, if I explicitly add an "fts is not null" clause to the query:
testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80') and fts is not null;
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on testtable  (cost=130.34..1735.19 rows=983 width=4)
   Recheck Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
   ->  Bitmap Index Scan on testtable_fts_idx  (cost=0.00..130.09 rows=983 width=0)
         Index Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
(4 rows)
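The two estimates line up exactly with the null fraction: scaling the first estimate by (1 - null_frac) yields the second one. Trivial arithmetic, using the numbers from the plans above:

select 1966 * (1 - 0.5) as expected_rows;
-- 983, exactly the estimate in the plan with the explicit IS NOT NULL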
When rows for which the clause evaluates to NULL are excluded from the result, shouldn't the query planner take null_frac into account when computing the estimate?
Doing the same thing with integers and the < operator does seem to take null_frac into account (see the session further below).
The snippet below reproduces the dataset:
create table testtable (id serial primary key, document text, fts tsvector);
create index on testtable using gist(fts);

CREATE OR REPLACE FUNCTION public.filltable(rows integer)
 RETURNS boolean
 LANGUAGE plpgsql
AS $function$
DECLARE
    count integer;
BEGIN
    count := 0;
    LOOP
        EXIT WHEN count = rows;
        count := count + 1;
        -- each document is "testterm1 testterm2 ... testtermN" with N random in 0..99
        insert into testtable (document, fts)
            select document, to_tsvector('english', document)
              from (select string_agg(concat, ' ') as document
                      from (select concat('testterm' ||
                                 generate_series(1, floor(random() * 100)::integer))) as foo
                   ) as bar;
    END LOOP;
    RETURN TRUE;
END;
$function$;

select filltable(10000);
testdb=# update testtable set fts = null where id % 2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO: analyzing "public.testtable"
INFO: "testtable": scanned 1835 of 1835 pages, containing 10002 live
rows and 5001 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# select null_frac from pg_stats where attname = 'fts';
null_frac
-----------
0.5
(1 row)
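Just to double-check the statistic against the actual table contents (only a sanity check on top of the reproduction script):

select sum(case when fts is null then 1 else 0 end)::float8 / count(*) as actual_null_frac
from testtable;
-- should come out close to 0.5; a few extra NULLs are possible, since filltable
-- can produce an empty (NULL) document when the random length comes out as 0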
... trying with integers:
testdb=# ALTER TABLE testtable add column testint integer;
ALTER TABLE
testdb=# update testtable set testint = floor(random()*100);
UPDATE 10002
testdb=# ANALYZE verbose testtable;
INFO: analyzing "public.testtable"
INFO: "testtable": scanned 2186 of 2186 pages, containing 10002 live
rows and 10002 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# update testtable set testint = null where id %2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO: analyzing "public.testtable"
INFO: "testtable": scanned 2282 of 2282 pages, containing 10002 live
rows and 13335 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# explain select id from testtable where testint = 50;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on testtable (cost=0.00..2407.03 rows=64 width=4)
Filter: (testint = 50)
(2 rows)
testdb=# explain select id from testtable where testint = 1;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on testtable (cost=0.00..2407.03 rows=48 width=4)
Filter: (testint = 1)
(2 rows)
testdb=# explain select id from testtable where testint < 50;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on testtable (cost=0.00..2407.03 rows=2470 width=4)
Filter: (testint < 50)
(2 rows)
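Back-of-the-envelope figures for the integer column, assuming testint is roughly uniform over 0..99 and half the rows are NULL (round numbers only, just to show that null_frac does seem to be applied here):

select round(10002 * (1 - 0.5) / 100) as expected_eq;       -- ~50; the planner's 64 and 48
                                                            -- presumably come from the MCV list
select round(10002 * (1 - 0.5) * 50 / 100) as expected_lt;  -- ~2501 vs. the planner's 2470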
(Found on 8.4 and reproduced on git HEAD.)
The attached patch tries to align the behaviour with what the scalar operators already do.
Thanks.
--
Jesper
From 4b4be27864f683a9b3464b86d6cfa567f8ab6bd2 Mon Sep 17 00:00:00 2001
From: Jesper Krogh <j...@novozymes.com>
Date: Thu, 17 Feb 2011 22:21:52 +0100
Subject: [PATCH] Take null_frac into account in calculating selectivity for @@ operator. This makes behaviour align with what > operator does for integers.
---
src/backend/tsearch/ts_selfuncs.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/tsearch/ts_selfuncs.c b/src/backend/tsearch/ts_selfuncs.c
index 8ce9fb4..a4f3219 100644
--- a/src/backend/tsearch/ts_selfuncs.c
+++ b/src/backend/tsearch/ts_selfuncs.c
@@ -180,14 +180,14 @@ tsquerysel(VariableStatData *vardata, Datum constval)
* There is a most-common-elements slot for the tsvector Var, so
* use that.
*/
- selec = mcelem_tsquery_selec(query, values, nvalues,
+ selec = (1.0 - stats->stanullfrac) * mcelem_tsquery_selec(query, values, nvalues,
numbers, nnumbers);
free_attstatsslot(TEXTOID, values, nvalues, numbers, nnumbers);
}
else
{
/* No most-common-elements info, so do without */
- selec = tsquery_opr_selec_no_stats(query);
+ selec = (1.0 - stats->stanullfrac) * tsquery_opr_selec_no_stats(query);
}
}
else
--
1.7.0.4