Hi all,
NULL handling always surprises me in unpleasant ways; my brain simply can't internalize the logic, so please tell me if this is one of the cases where I should just put more effort into fixing my understanding instead.
I have a table with a "null_frac" of 0.5, and I have verified that a row for which the WHERE clause evaluates to NULL is not included in the result:
testdb=# select id from testtable where null @@ to_tsquery('testterm80');
id
----
(0 rows)
I would therefore expect null_frac to be taken into account when computing the row estimate for this query:
testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80');
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..1985.03 rows=1966 width=4)
   Filter: (fts @@ to_tsquery('testterm80'::text))
(2 rows)
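For scale: 1966 out of roughly 10000 rows is about 20% of the whole table, NULLs included, so the estimate seems to be computed over all rows. A quick cross-check against pg_class (purely a sanity check on the numbers above):

select 1966 / reltuples as est_selectivity from pg_class where relname = 'testtable';
-- roughly 0.20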
The null_frac is taken into account, however, if I explicitly add an "fts is not null" clause to the query:
testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80') and fts is not null;
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on testtable  (cost=130.34..1735.19 rows=983 width=4)
   Recheck Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
   ->  Bitmap Index Scan on testtable_fts_idx  (cost=0.00..130.09 rows=983 width=0)
         Index Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
(4 rows)
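The two estimates line up exactly with the null fraction: scaling the first estimate by (1 - null_frac) yields the second one. Trivial arithmetic, using the numbers from the plans above:

select 1966 * (1 - 0.5) as expected_rows;
-- 983, exactly the estimate in the plan with the explicit IS NOT NULL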
When rows for which the clause evaluates to NULL are excluded from the result, shouldn't the query planner take null_frac into account when computing the estimate?
Doing the same thing with integers and the < operator does seem to take null_frac into account (see the session further below).
The snippet below reproduces the dataset:
create table testtable (id serial primary key, document text, fts tsvector);
create index on testtable using gist(fts);

CREATE OR REPLACE FUNCTION public.filltable(rows integer)
 RETURNS boolean
 LANGUAGE plpgsql
AS $function$
DECLARE
    count integer;
BEGIN
    count := 0;
    LOOP
        EXIT WHEN count = rows;
        count := count + 1;
        -- each document is "testterm1 testterm2 ... testtermN" with N random in 0..99
        insert into testtable (document, fts)
            select document, to_tsvector('english', document)
              from (select string_agg(concat, ' ') as document
                      from (select concat('testterm' ||
                                 generate_series(1, floor(random() * 100)::integer))) as foo
                   ) as bar;
    END LOOP;
    RETURN TRUE;
END;
$function$;

select filltable(10000);
testdb=# update testtable set fts = null where id % 2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO: analyzing "public.testtable"
INFO: "testtable": scanned 1835 of 1835 pages, containing 10002 live
rows and 5001 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# select null_frac from pg_stats where attname = 'fts';
null_frac
-----------
0.5
(1 row)
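Just to double-check the statistic against the actual table contents (only a sanity check on top of the reproduction script):

select sum(case when fts is null then 1 else 0 end)::float8 / count(*) as actual_null_frac
from testtable;
-- should come out close to 0.5; a few extra NULLs are possible, since filltable
-- can produce an empty (NULL) document when the random length comes out as 0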
... trying with integers:
testdb=# ALTER TABLE testtable add column testint integer;
ALTER TABLE
testdb=# update testtable set testint = floor(random()*100);
UPDATE 10002
testdb=# ANALYZE verbose testtable;
INFO: analyzing "public.testtable"
INFO: "testtable": scanned 2186 of 2186 pages, containing 10002 live
rows and 10002 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# update testtable set testint = null where id %2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO: analyzing "public.testtable"
INFO: "testtable": scanned 2282 of 2282 pages, containing 10002 live
rows and 13335 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# explain select id from testtable where testint = 50;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on testtable (cost=0.00..2407.03 rows=64 width=4)
Filter: (testint = 50)
(2 rows)
testdb=# explain select id from testtable where testint = 1;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on testtable (cost=0.00..2407.03 rows=48 width=4)
Filter: (testint = 1)
(2 rows)
testdb=# explain select id from testtable where testint < 50;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on testtable (cost=0.00..2407.03 rows=2470 width=4)
Filter: (testint < 50)
(2 rows)
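Back-of-the-envelope figures for the integer column, assuming testint is roughly uniform over 0..99 and half the rows are NULL (round numbers only, just to show that null_frac does seem to be applied here):

select round(10002 * (1 - 0.5) / 100) as expected_eq;       -- ~50; the planner's 64 and 48
                                                            -- presumably come from the MCV list
select round(10002 * (1 - 0.5) * 50 / 100) as expected_lt;  -- ~2501 vs. the planner's 2470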
(Found on 8.4 and reproduced on git HEAD.)
The attached patch tries to align the behaviour with what the scalar operators already do.
Thanks.
--
Jesper
From 4b4be27864f683a9b3464b86d6cfa567f8ab6bd2 Mon Sep 17 00:00:00 2001
From: Jesper Krogh <j...@novozymes.com>
Date: Thu, 17 Feb 2011 22:21:52 +0100
Subject: [PATCH] Take null_frac into account in calculating selectivity for @@ operator. This makes behaviour align with what > operator does for integers.
---
src/backend/tsearch/ts_selfuncs.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/tsearch/ts_selfuncs.c b/src/backend/tsearch/ts_selfuncs.c
index 8ce9fb4..a4f3219 100644
--- a/src/backend/tsearch/ts_selfuncs.c
+++ b/src/backend/tsearch/ts_selfuncs.c
@@ -180,14 +180,14 @@ tsquerysel(VariableStatData *vardata, Datum constval)
* There is a most-common-elements slot for the tsvector Var, so
* use that.
*/
- selec = mcelem_tsquery_selec(query, values, nvalues,
+ selec = (1.0 - stats->stanullfrac) * mcelem_tsquery_selec(query, values, nvalues,
numbers, nnumbers);
free_attstatsslot(TEXTOID, values, nvalues, numbers, nnumbers);
}
else
{
/* No most-common-elements info, so do without */
- selec = tsquery_opr_selec_no_stats(query);
+ selec = (1.0 - stats->stanullfrac) * tsquery_opr_selec_no_stats(query);
}
}
else
--
1.7.0.4