On Tue, 19 Sept 2023 at 23:45, Richard Guo <guofengli...@gmail.com> wrote: > My first thought about the fix is that we artificially add resjunk > target entries to parse->targetList for the ordered aggregates' > arguments that are ORDER BY expressions, as attached. While this can > fix the given query, it would cause Assert failure for the query in > sql/triggers.sql.
> Any thoughts? Unfortunately, we can't do that as it'll lead to target entries existing in the GroupAggregate's target list that have been aggregated. postgres=# explain verbose SELECT a, COUNT(DISTINCT b) FROM rp GROUP BY a; QUERY PLAN -------------------------------------------------------------------------------------- Append (cost=88.17..201.39 rows=400 width=44) -> GroupAggregate (cost=88.17..99.70 rows=200 width=44) Output: rp.a, count(DISTINCT rp.b), rp.b Your patch adds rp.b as an output column of the GroupAggregate. Logically, that column cannot exist there as there is no correct single value of rp.b after aggregation. I think the fix needs to go into create_agg_path(). The problem is that for AGG_SORTED we do: if (aggstrategy == AGG_SORTED) pathnode->path.pathkeys = subpath->pathkeys; /* preserves order */ which assumes that all of the columns before the aggregate will be available after the aggregate. That likely used to work ok before 1349d2790 as the planner wouldn't have requested any Pathkeys for columns that were not available below the Agg node. We can no longer take the subpath pathkey's verbatim. We need to strip off pathkeys for columns that are not in pathnode's targetlist. I've attached a patch which adds a new function to pathkeys.c to strip off any PathKeys in a list that don't have a corresponding item in the given PathTarget and just return a prefix of the input pathkey list up until the first expr that can't be found. I'm concerned that this patch will be too much overhead when creating paths when a PathKey's EquivalenceClass has a large number of members from partitioned tables. I wondered if we should instead just check if the subpath's pathkeys match root->group_pathkeys and if they do set the AggPath's pathkeys to list_copy_head(subpath->pathkeys, root->num_groupby_pathkeys), that'll be much cheaper, but it just feels a bit too much like a special case. David
strip_aggregated_pathkeys_from_aggpaths.patch
Description: Binary data