Robert Haas <robertmh...@gmail.com> writes: > So I don't quite know which way to jump here. It now seems to me that > we have three similar features with three different designs. > Parameterization added non-comparable paths to the same path list; > parallel query added them to a different path list in the same > RelOptInfo; and this patch currently adds them a separate RelOptInfo.
Yeah, this. I don't think that either of those first two decisions was wrong, but it does seem annoying that this patch wants to do it yet a third way. Still, it may be the right thing. Bear with me a moment: We dealt with parameterized paths being in the same list as non-parameterized paths by treating the set of parameter rels as a figure-of-merit that add_path can compare. This works because if, say, a nonparameterized path dominates a parameterized one on every other figure of merit then there's no point in keeping the parameterized one. It is squirrely that the parameterized paths typically don't yield the same number of rows as others for the same RelOptInfo, but at least so far that hasn't broken anything. I think it's important that the parameterized paths do yield the same column set as other paths for the rel; and the rows they do yield will be a subset of the rows that nonparameterized paths yield. On the other hand, it's not sensible for partial paths to compete in an add_path tournament with non-partial ones. If they did, neither group could be allowed to dominate the other group, so add_path would just be wasting its time making those path comparisons. So I do think it was right to put them in a separate path list. Importantly, they generate the same column set and some subset of the same rows that the non-partial ones do, which I think is what justifies putting them into the same RelOptInfo. However, a partial-aggregation path does not generate the same data as an unaggregated path, no matter how fuzzy you are willing to be about the concept. So I'm having a very hard time accepting that it ought to be part of the same RelOptInfo, and thus I don't really buy that annotating paths with a GroupPathInfo is the way forward. What this line of analysis doesn't tell us though is whether paths that did their partial aggregations at different join levels can be considered as enough alike that they should compete on cost terms. If they are, we need to put them into the same RelOptInfo. So while I want to have separate RelOptInfos for partially aggregated paths, I'm unclear on how many of those we need or what their identifying property is. Also: we avoid generating parameterized partial paths, because combining those things would be too much of a mess. There's some handwaving in the comments for add_partial_path to the effect that it wouldn't be a win anyway, but I think the real reason is that it'd be far too complicated for the potential value. Can we make a similar argument for partial aggregation? I sure hope so. > I agree that creating an exponential number of RelOptInfos is not > going to work out well. FWIW, I'm way more concerned about the number of Paths considered than I am about the number of RelOptInfos. This relates to your question about whether we want to use some heuristics to limit the planner's search space. regards, tom lane