On Fri, Feb 2, 2024 at 11:26 AM David Rowley <dgrowle...@gmail.com> wrote:
> In light of this, do you still think it's worthwhile making this change?
>
> For me, I think all it's going to result in is extra planner work
> without any performance gains.

Hmm, with the query below, I can see that the new plan is cheaper than
the old plan, and the cost difference exceeds STD_FUZZ_FACTOR.

create table t (a int, b int);
insert into t select i%100000, i from generate_series(1,10000000)i;
analyze t;

-- on master
explain (costs on) select distinct a from t order by a limit 1;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Limit  (cost=120188.50..120188.51 rows=1 width=4)
   ->  Sort  (cost=120188.50..120436.95 rows=99379 width=4)
         Sort Key: a
         ->  HashAggregate  (cost=118697.82..119691.61 rows=99379 width=4)
               Group Key: a
               ->  Gather  (cost=97331.33..118200.92 rows=198758 width=4)
                     Workers Planned: 2
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=4166667 width=4)
(10 rows)

-- on patched
explain (costs on) select distinct a from t order by a limit 1;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Limit  (cost=106573.93..106574.17 rows=1 width=4)
   ->  Unique  (cost=106573.93..130260.88 rows=99379 width=4)
         ->  Gather Merge  (cost=106573.93..129763.98 rows=198758 width=4)
               Workers Planned: 2
               ->  Sort  (cost=105573.91..105822.35 rows=99379 width=4)
                     Sort Key: a
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=4166667 width=4)
(9 rows)

It seems that including a LIMIT clause can potentially favor the new plan.

Thanks
Richard
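P.S. A quick numeric sketch of the fuzz-factor claim above, using the two
Limit-node total costs from the plans (STD_FUZZ_FACTOR is 1.01 in
src/backend/optimizer/util/pathnode.c; the snippet is just illustrative
arithmetic, not planner code):

```python
# Compare the two Limit-node total costs against PostgreSQL's
# STD_FUZZ_FACTOR (1.01): one path is considered cheaper than another
# only if it wins by more than this margin.
STD_FUZZ_FACTOR = 1.01

master_total = 120188.51   # Limit total cost on master
patched_total = 106574.17  # Limit total cost with the patch

# The patched plan clears the fuzz margin comfortably.
assert patched_total * STD_FUZZ_FACTOR < master_total
print(master_total / patched_total)  # ratio well above 1.01
```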