On Tue, Jan 21, 2025 at 3:33 AM Richard Guo <guofengli...@gmail.com> wrote:
> I've been thinking about this proposal, and it's quite appealing. It
> would significantly reduce both the planning effort and implementation
> complexity, while still yielding reasonable planning results.
>
> One concern I have with this proposal is that, as we climb up higher
> and higher in the join tree, the assumption that a path with smaller
> row count and higher cost is better than one with larger row count and
> lower cost may gradually no longer hold. It's true that a path with a
> smaller row count is generally better for upper join nodes, as it
> feeds fewer rows to upper join nodes. However, as there are fewer and
> fewer upper join nodes left, the efficiency gained from the smaller
> row count could likely no longer justify the high cost of that path
> itself.
>
> Here's an example I found that can help illustrate what I mean.
Thanks for the example. What seems to be happening here is that each of the three joins increases the number of rows by a multiple of either 166 or 333. Aggregating reduces the number of rows to 3. I am not sure that we should be too concerned about this kind of case, because I don't think it will be common to have multiple joins that dramatically increase the row count. If you did have that, you would probably want to aggregate multiple times. We don't have the code for an IntermediateAggregate or CombineAggregate node right now, I believe, but in this query it would likely make sense to apply such a step after every join; then you'd never have more than three rows.

Honestly, I'm not sure how much we should worry about a case like this. I think that if a user is writing queries that use joins to vastly inflate the row count and then aggregate the result, perhaps they need to think about rewriting the queries. In this instance, it feels a bit like the user is emulating multiplication using an iterated SUM(), which is probably never going to work out all that well.

But I bet it's possible to construct an example using only row-reducing joins. Let's say we start with 10k rows that aggregate to 10 rows; after performing a join, we end up with 9k rows that aggregate to 9 rows. So if we partially aggregate first, we have to aggregate 1000 extra rows, but if we join first, we have to join 1000 extra rows. I don't think we can say a priori which will be cheaper, but my idea would make the path that partially aggregates after the join win unconditionally.

> Yeah, you're right that the join search process for grouped paths
> basically mirrors what we do for non-grouped paths, which indeed
> involves a lot of planner effort. I've been exploring potential
> heuristics to limit the search space for grouped paths, but so far, I
> haven't found any effective solutions. Currently, the heuristic used
> in the patch is to only consider grouped paths that dramatically
> reduce the number of rows. All others are just discarded. The
> rationale is that if a grouped path does not reduce the number of rows
> enough, it is highly unlikely to result in a competitive final plan
> during the upper planning stages, so it doesn't make much sense to
> consider it. The current threshold is set to 50%, meaning that if the
> number of rows returned by PartialAgg(t1 JOIN t2) is not less than 50%
> of the rows returned by (t1 JOIN t2), no Aggregate paths will be
> generated on top of the t1/t2 join. If we notice significant
> regressions in planning time, we might consider further increasing
> this threshold, say, to 80%, so that only grouped paths that reduce
> the rows by more than 80% will be considered. This heuristic also
> ensures that, once a plan with eager aggregation is chosen, it is
> highly likely to result in performance improvements, due to the
> significant data reduction before joins.

To be honest, I was quite surprised this was a percentage like 50% or 80% and not a multiple like 2 or 5. And I had thought the multiplier might even be larger, like 10 or more. The thing is, 50% means we only have to form 2-item groups in order to justify aggregating twice. Maybe SUM() is cheap enough to justify that treatment, but a more expensive aggregate might not be, especially things like string_agg() or array_agg(), where aggregation creates bigger objects.
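Just so we're all picturing the same plan shape, here's a rough SQL-level sketch of the transformation as I understand it. The tables and columns (t1, t2, t3, g, id, val) are invented for illustration, and the second query is only an approximation of what the planner would do internally with partial/finalize aggregate steps:

-- hypothetical query; names are made up for illustration
SELECT t1.g, SUM(t2.val)
  FROM t1
  JOIN t2 ON t2.t1_id = t1.id
  JOIN t3 ON t3.t1_id = t1.id
 GROUP BY t1.g;

-- the grouped path under discussion: PartialAgg(t1 JOIN t2),
-- join that to t3, then finalize the aggregation at the top
SELECT p.g, SUM(p.partial_sum)
  FROM (SELECT t1.g, t1.id, SUM(t2.val) AS partial_sum
          FROM t1
          JOIN t2 ON t2.t1_id = t1.id
         GROUP BY t1.g, t1.id) p
  JOIN t3 ON t3.t1_id = p.id
 GROUP BY p.g;

With a 50% cutoff, the subquery labelled p only has to average two input rows per (g, id) group for the grouped path to be considered at all; a multiple like 10 would mean we only spend cost-model effort on it when the groups are substantially larger.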
Another thing to consider is that when the number of groups is small enough that we don't need to do a Sort+GroupAggregate, it doesn't seem so bad to perform marginally-useful partial aggregation, but sometimes that won't be the case. For example, imagine that the user wants to join orders to order_lines and then compute SUM(order_lines.quantity) for each orders.customer_id. If the size of the order_lines table is large relative to work_mem, we're going to need to sort it in order to partially aggregate, which is expensive. If it turns out that the orders table is also quite big, then maybe we'll end up performing a merge join and the same sort order can be used for both operations, but if not, we could've just done a hash join with orders as the build table. In that kind of case, partial aggregation has to save quite a lot to justify itself.

Now, maybe we shouldn't worry about that when applying this heuristic cutoff; after all, it's the job of the cost model to understand that sorting is expensive, and this cutoff should just be there to make sure we don't even try the cost model in cases that are clearly unpromising. But I do suspect that in queries where the average group size is 2, this will often be a marginal technique. In addition to the problems already mentioned, it could be that the average group size is 2 but a lot of groups are actually of size 1 and then there are some larger groups. In such cases I'm even less sure that the partial aggregation technique will be a winner. Building many 1-element groups sounds inefficient.
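To spell out the comparison I have in mind, here's the query plus hand-drawn sketches of the two plan shapes. The schema is invented and these aren't real EXPLAIN output, just the rough structure:

SELECT o.customer_id, SUM(l.quantity)
  FROM orders o
  JOIN order_lines l ON l.order_id = o.order_id
 GROUP BY o.customer_id;

-- with eager aggregation: form partial groups of order_lines by
-- order_id before the join; if order_lines is much larger than
-- work_mem, that means a big Sort (or spilling hash) up front
Finalize Aggregate (by customer_id)
  -> Join
       -> orders
       -> PartialAgg(order_lines GROUP BY order_id)

-- without it: one hash join with orders as the build table, and a
-- single aggregation at the top
Aggregate (by customer_id)
  -> Hash Join (order_lines probing Hash(orders))

The first shape has to buy back the cost of that extra sort or hash pass before it can come out ahead.

--
Robert Haas
EDB: http://www.enterprisedb.com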