Hi, > On Jun 24, 2026, at 16:24, Anthonin Bonnefoy > <[email protected]> wrote: > > Hi, > > Currently, the row estimation of a query matching a multi-column > unique index only relies on combining columns' selectivity. This leads > to a row estimation >1 despite matching the unique index, and possibly > avoiding using said index. > > This issue is visible with the following queries: > > CREATE TABLE visitor (id BIGINT GENERATED ALWAYS AS IDENTITY, org INT, > name TEXT); > CREATE UNIQUE INDEX ON visitor(org, name); > INSERT INTO visitor(org, name) SELECT *, 'Support' FROM > generate_series(1, 500000); > INSERT INTO visitor(org, name) SELECT 500000, 'User_' || i FROM > generate_series(1, 2000) as g(i); > -- Test with only multi column index > ANALYZE visitor; > EXPLAIN SELECT * FROM visitor WHERE visitor.org = 500000 AND > visitor.name = 'Support' ORDER BY visitor.id LIMIT 1; > -- Test with index on id > CREATE UNIQUE INDEX ON visitor(id); > EXPLAIN SELECT * FROM visitor WHERE visitor.org = 500000 AND > visitor.name = 'Support' ORDER BY visitor.id LIMIT 1; > > Which gives the following plans: > QUERY PLAN > ------------------------------------------------------------------------------------------------------- > Limit (cost=2861.98..2861.98 rows=1 width=20) > -> Sort (cost=2861.98..2867.35 rows=2149 width=20) > Sort Key: id > -> Index Scan using visitor_org_name_idx on visitor > (cost=0.42..2851.24 rows=2149 width=20) > Index Cond: ((org = 500000) AND (name = 'Support'::text)) > > QUERY PLAN > -------------------------------------------------------------------------------------------- > Limit (cost=0.42..9.15 rows=1 width=20) > -> Index Scan using visitor_id_idx on visitor > (cost=0.42..18753.42 rows=2149 width=20) > Filter: ((org = 500000) AND (name = 'Support'::text)) > > Despite matching the unique index, the planner estimates 2149 rows. > Adding an index on id, the planner sees the new index as cheaper and > switches to it. > > There's already an unique index matching done in btcostestimate, but > it only adjusts numIndexTuples to 1. The attached patch improves this > behavior by: > - Adjusting the index's selectivity to match the numIndexTuples=1 value > - cost->indexSelectivity is used to propagate this new selectivity, > which is then used in genericcostestimate, similar to what's done with > numIndexTuples > - Index's path row is also adjusted in cost_index if index selectivity > is better than the initial row estimation > > With those changes, the query plan becomes: > Limit (cost=8.45..8.46 rows=1 width=20) > -> Sort (cost=8.45..8.46 rows=1 width=20) > Sort Key: id > -> Index Scan using visitor_org_name_idx on visitor > (cost=0.42..8.44 rows=1 width=20) > Index Cond: ((org = 500000) AND (name = 'Support'::text)) > > Regards, > Anthonin Bonnefoy > <v1-0001-Adjust-row-estimation-and-index-selectivity-with-.patch>
Thanks for working on this. I agree this is a real planner estimation problem: when equality clauses constrain all key columns of a multi-column unique index, the planner has a much stronger upper bound than what independent per-column selectivity can provide. I think the v1 patch identifies the right fact, but applies it at the wrong level. The optimizer README expects paths for the same relation and parameterization to have the same rowcount estimate. v1 instead lowers only the B-tree `IndexPath`'s `Path.rows`, using `indexSelectivity` after the AM cost hook has run. That makes the one-row estimate depend on choosing that access path. A full unique equality match is a relation-level one-row proof; it should update the relation row estimate, not use `indexSelectivity` as a rowcount channel. I put together a v2 that keeps those concerns separate: - row estimates are clamped where they are produced: `set_baserel_size_estimates()` for base restrictions, and `get_parameterized_baserel_size()` when outer-parameter clauses are needed; - B-tree costing still caps `indexSelectivity` for the matching lookup, so heap fetch costing also sees the one-tuple bound; - the proof reuses the existing `relation_has_unique_index_for()` machinery, via a small wrapper for base and parameterized clauses. I also added regression coverage for the positive base and parameterized cases, a missing-key case that only constrains part of a multi-column unique key, and the two dangerous negative cases: `IS NULL` with a regular unique index, and deferrable unique constraints. The latter two must not be treated as one-row proofs. I also made the B-tree unique lookup shortcut require immediate uniqueness for the same reason. I attached the patch. make check passed here. I'd be interested in your thoughts on whether this split addresses the issue you were targeting in v1. -- Best regards, Chengpeng Yan
v2-0001-Improve-estimates-for-multicolumn-unique-indexes.patch
Description: v2-0001-Improve-estimates-for-multicolumn-unique-indexes.patch
