Hi, When looking at [1], I realized we may have a side effect when removing redundant columns in the GROUP BY clause. Suppose we have a query with ORDER BY 'b', and meanwhile column 'b' is also a group key. If we decide that 'b' is redundant due to being functionally dependent on other GROUP BY columns, we would remove it from group keys. This will make us lose the opportunity to leverage the index on 'b'.
Here is an example for illustration. # create table t (a int primary key, b int); # insert into t select i, i%1000 from generate_series(1,1000000)i; # create index on t(b); By default, we remove 'b' from group keys and generate a plan as below: # explain (costs off) select b from t group by a, b order by b limit 10; QUERY PLAN ------------------------------------------------ Limit -> Sort Sort Key: b -> Group Group Key: a -> Index Scan using t_pkey on t (6 rows) The index on 'b' is not being used and we'll have to retrieve all the data underneath to perform the sort work. On the other hand, if we keep 'b' as a group column, we can get such a plan as: # explain (costs off) select b from t group by a, b order by b limit 10; QUERY PLAN ------------------------------------------------- Limit -> Group Group Key: b, a -> Incremental Sort Sort Key: b, a Presorted Key: b -> Index Scan using t_b_idx on t (7 rows) With the help of 't_b_idx', we can avoid the full scan on 't' and it would run much faster. Any thoughts? [1] https://www.postgresql.org/message-id/flat/16869-26346b77d6ccaeec%40postgresql.org Thanks Richard