Andrey Lepikhov <a.lepik...@postgrespro.ru> writes: > On 12/8/21 04:26, Tomas Vondra wrote: >> I wonder if we should teach clauselist_selectivity about UNIQUE indexes, >> and improve the cardinality estimates directly, not just costing for >> index scans.
> I tried to implement this in different ways. But it causes additional > overhead and code complexity - analyzing a list of indexes and match > clauses of each index with input clauses in each selectivity estimation. > I don't like that way and propose a new patch in attachment. I looked at this briefly. I do not think that messing with btcostestimate/genericcostestimate is the right response at all. The problem can be demonstrated with no index whatever, as in the attached shortened version of the original example. I get QUERY PLAN --------------------------------------------------- Seq Scan on a (cost=0.00..46.02 rows=1 width=12) Filter: ((x = 1) AND (y = 1) AND (z = 1)) (2 rows) before adding the extended stats, and QUERY PLAN ---------------------------------------------------- Seq Scan on a (cost=0.00..46.02 rows=28 width=12) Filter: ((x = 1) AND (y = 1) AND (z = 1)) (2 rows) afterwards. So the extended stats have made the rowcount estimate significantly worse, which seems like an indicator of a bug somewhere in extended stats. The more so because I can crank default_statistics_target all the way to 10000 without these estimates changing. If we can't get a dead-on estimate for a 2001-row table at that stats level, we're doing something wrong, surely? Also, I found that if I ask only for ndistinct stats, I still get rows=1. The fishiness seems to be directly a problem with dependencies stats. regards, tom lane
DROP TABLE IF EXISTS a CASCADE; DROP STATISTICS IF EXISTS aestat; CREATE TABLE a AS ( SELECT gs % 10 AS x, (gs % 10 + (gs/10::int4) % 10) % 10 AS y, (gs / 100)::int4 AS z FROM generate_series(1,1000) AS gs ); INSERT INTO a (SELECT gs,gs,gs FROM generate_series(1000,2000) AS gs); -- ALTER TABLE a ADD PRIMARY KEY (x,y,z); -- CREATE INDEX ON a(x); ANALYZE a; EXPLAIN SELECT * FROM a WHERE x=1 AND y=1 AND z=1; CREATE STATISTICS aestat(dependencies,ndistinct) ON x,y,z FROM a; ANALYZE a; EXPLAIN SELECT * FROM a WHERE x=1 AND y=1 AND z=1;