On Fri, 29 Nov 2024 at 01:33, jian he <jian.universal...@gmail.com> wrote: > > On Thu, Nov 28, 2024 at 2:11 PM David Rowley <dgrowle...@gmail.com> wrote: > > I think it'll make more sense to adjust some of the > > existing tests to use a unique constraint instead of a PK and then > > adjust a column's NOT NULL property to check that part of the code is > > working correctly. > > > looking around, i inserted some tests to indexing.sql, > create_index.sql, aggregates.sql > also added tests for sort by index oid.
I meant the tests added by d4c3a156c. I don't think there's any need to have tests in more than one file. > > Another issue with this is that the list of indexes being used is not > > sorted by Oid. > When there are multiple matches, we need a determined way to choose which > one. > so please check attached. Looking closer at get_relation_info(), it looks like the RelOptInfo.indexlist will be in descending Oid order, per: /* * We've historically used lcons() here. It'd make more sense to * use lappend(), but that causes the planner to change behavior * in cases where two indexes seem equally attractive. For now, * stick with lcons() --- few tables should have so many indexes * that the O(N^2) behavior of lcons() is really a problem. */ indexinfos = lcons(info, indexinfos); I'm starting to feel like we might be trying to include too much logic to mimic this behaviour. The main problem is that the RelOptInfos have not yet been created for the given relation when remove_useless_groupby_columns() is called. That's why we're having to query the catalogue tables like this. I do see a comment in preprocess_groupclauses() that mentions: * Note: we return a fresh List, but its elements are the same * SortGroupClauses appearing in parse->groupClause. This is important * because later processing may modify the processed_groupClause list. So there's already a warning that there may still be future changes to the processed_groupClause. It'd be nice if we could delay calling remove_useless_groupby_columns() until the RelOptInfos are built as we'd not have to query the catalogue tables to make this work. The soonest point is just after the call to add_base_rels_to_query() in query_planner(). That would mean doing more work on the processed_groupClause in query_planner() rather than in grouping_planner(), which is a little strange. It looks like the first time we look at processed_groupClause and make a decision based on the actual value of it is in standard_qp_callback(), i.e. after add_base_rels_to_query(). I've attached an updated patch that gets rid of the get_unique_not_null_attnos() function completely and uses the RelOptInfo.indexlist and RelOptInfo.notnullattnums. I also realised that there's no need to check for NOT NULL constraints with unique indexes defined with NULLS NOT DISTINCT as having NULLs unique means we can still determine the functional dependency. One other possibly good outcome of this is that it's much easier to do the pg_statistic.avgwidth thing I talked about as RelOptInfo stores those in attr_widths, however, looking at table_block_relation_estimate_size(), I see we might not always populate those. I ended up re-homing remove_useless_groupby_columns() in initsplan.c. I couldn't see anywhere else it fitted. David
v8-0001-remove-useless-group-by-columns-via-unique-not-nu.patch
Description: Binary data