On Tue, Sep 24, 2024 at 11:20 PM Richard Guo <guofengli...@gmail.com> wrote:
> The reason is that it is very tricky to set the size estimates for a
> grouped join relation. For a non-grouped join relation, we know that
> all its paths have the same rowcount estimate (well, in theory). But
> this is not true for a grouped join relation. Suppose we have a
> grouped join relation for t1/t2 join. There might be two paths for
> it:
>
>     PartialAgg(t1 JOIN t2), or
>     t1 JOIN PartialAgg(t2)
What exactly do you mean by "well, in theory" here? My understanding of how things work today is that every relation is supposed to produce a specific set of rows and every unparameterized path must produce that set of rows. The order of the rows may vary but the set of rows may not.

With your proposed design here, that's no longer true. Instead, what's promised is that the row sets will become equivalent after a later FinalizeAggregate step. In a sense, this is like parameterization or partial paths. Suppose I have:

    SELECT * FROM foo, bar WHERE foo.x = bar.x;

While every unparameterized path for bar has the same row count, there's also the possibility of performing an index scan on bar.x parameterized by foo.x, and that path will have a far lower row count than the unparameterized paths. Instead of producing all the same rows as every other path, the parameterized path promises only that if run repeatedly, with all relevant values of foo.x, you'll eventually get all the same rows you would have gotten from the unparameterized path. Because of this difference, parameterized paths need special handling in many different parts of the code.

And the same thing is true of partial paths. They also do not promise to generate all the same rows -- instead, they promise that when run simultaneously across multiple workers, the total set of rows returned across all invocations will be equal to what a normal path would have produced. Here again, there's a need for special handling because these paths behave differently than standard paths.

I think what you're doing here is roughly equivalent to either of these two cases. It's more like the parameterized path case. Instead of having a path for a relation which is parameterized by some input parameter, you have a path for a relation, say bar, which is partially aggregated by some grouping column. But there's no guarantee of how much partial aggregation has been done. In your example, PartialAgg(t1 JOIN t2) is "more aggregated" than t1 JOIN PartialAgg(t2), so the row counts are different.

This makes me quite nervous. You can't compare a parameterized path to an unparameterized path, but you can compare it to another parameterized path if the parameterizations are the same. You can't compare a partial path to a non-partial path, but you can compare partial paths to each other. But with this work, unparameterized, non-partial paths in the same RelOptInfo don't seem like they are truly comparable. Maybe that's OK, but I'm not sure that it isn't going to break other things.

You might for example imagine a design where PartialAgg(t1 JOIN t2) and t1 JOIN PartialAgg(t2) get separate RelOptInfos. After all, there are probably multiple ways to generate paths for each of those things, and paths in each category can be compared to each other apples to apples. What's less clear is whether it's fair to compare across the two categories, and how many assumptions will be broken by doing so. I'm not sure that it's right to have separate RelOptInfos; we definitely don't want to create more RelOptInfos than necessary. At the same time, if we mix together all of those paths into a single RelOptInfo, we need to be confident that we're neither going to break anything nor introduce too many special cases into hot code paths. For instance, set_joinpath_size() represents an unwelcome complexity increase that could impact performance generally, even apart from the cases this patch intends to handle.
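To make that concrete, here's a minimal sketch of a query where both shapes could arise -- the tables and columns are invented for illustration, not taken from your patch:

    CREATE TABLE t1 (id int, x int);
    CREATE TABLE t2 (id int, v int);

    SELECT t1.x, sum(t2.v)
    FROM t1 JOIN t2 ON t1.id = t2.id
    GROUP BY t1.x;

    -- PartialAgg(t1 JOIN t2): perform the join, then partially group
    -- its output on t1.x, yielding at most one partial group per
    -- distinct value of t1.x.
    --
    -- t1 JOIN PartialAgg(t2): partially group t2 on its join key
    -- t2.id, then join; the result can still contain many rows per
    -- value of t1.x.
    --
    -- Both feed the same FinalizeAggregate on t1.x, yet their row
    -- counts can differ wildly, which is exactly why paths within one
    -- grouped RelOptInfo stop being comparable apples to apples.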
It's tempting to wonder if there's some way that we can avoid generating paths for both PartialAgg(t1 JOIN t2) and t1 JOIN PartialAgg(t2). Either the former has lower cardinality, or the latter does. It seems likely that the lower-cardinality set is the winning strategy. Even if such a path has a higher cost to generate, we save work at every subsequent join level and at the final aggregation step. Are there counterexamples where it's better to use a path from the higher-cardinality set?

By the way, the work of figuring out what target list should be produced by partial grouping is done by init_grouping_targets(), but the comments seem to take it for granted that I know what result we're trying to produce, and I don't. I think some more high-level explanation of the goals of this code would be useful.

It seems to me that if I'm looking at a path for an ungrouped relation and it produces a certain target list, then every column of that target list is needed somewhere. If those columns are group keys, cool: we pass those through. If they're inputs to the aggregates, cool: we feed them to the aggregates. But if they are neither, then what? In the patch, you either group on those columns or add them to the possibly_dependent list depending on the result of is_var_needed_by_join(). I can believe that there are some cases where we can group on such columns and others where we can't, but I find it difficult to believe that this test reliably distinguishes between those two cases; and if it does, I don't understand why it does. Don't I need to know something about how those columns are used in the upper joins? Like, if those columns are connected by a chain of binary-equality operators back to the user's choice of grouping columns, that sounds good, but this test doesn't distinguish between that case and an upper join on the < operator. create_grouping_expr_infos() does reason based on whether there's an equal-image operator available, but AIUI that's only reasoning about the group columns the user mentioned, not the sort of implicit grouping columns that init_grouping_targets() is creating.

I spent a lot of time thinking today about what makes it safe to push down grouping or not. I'm not sure that I have a solid answer to that question even yet. But it seems to me that there are at least two cases that clearly won't fly. One is the case where the partial aggregation would merge rows that need to be included in the final aggregation with rows that should be filtered out. If the partially-grouped relation simply has a filter qual, that's fine, because it will be evaluated before the aggregation. But there might be a qual that has to be evaluated later, either because (a) it involves another rel, like this_rel.x + that_rel.y > 10, or because (b) it appears in the ON clause of an outer join and thus needs to be deferred to the level of the OJ, e.g. A LEFT JOIN B ON a.x = b.x AND b.y = 42 (spelled out in the sketch below). I wonder if you can comment on how these cases are handled. Perhaps this coding around functional dependencies has something to do with it, but it isn't clear to me.
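To spell case (b) out concretely -- again with invented tables, just to illustrate the hazard I have in mind:

    CREATE TABLE a (x int);
    CREATE TABLE b (x int, y int, v int);

    SELECT a.x, count(b.v)
    FROM a LEFT JOIN b ON a.x = b.x AND b.y = 42
    GROUP BY a.x;

    -- If we build PartialAgg(b) grouped only on b.x below the join,
    -- b's rows with y = 42 and y <> 42 are merged into the same
    -- partial group, and the deferred join qual b.y = 42 can no longer
    -- filter out the rows that should have been excluded: they are
    -- already folded into the partial aggregates. The qual has to be
    -- evaluated at the level of the outer join, i.e. before any
    -- aggregation of b's rows (or b.y has to itself become a grouping
    -- column of the pushed-down aggregation).

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com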