> -----Original Message----- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane > Sent: Monday, May 18, 2015 1:12 AM > To: Robert Haas > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] upper planner path-ification > > Robert Haas <robertmh...@gmail.com> writes: > > So, getting back to this part, what's the value of returning a list of > > Paths rather than a list of Plans? > > (1) less work, since we don't have to fill in details not needed for > costing purposes; > (2) paths carry info that the planner wants but the executor doesn't, > notably sort-order annotations. > > > target lists are normally computed when paths are converted to plans, > > but for the higher-level plan nodes adding by grouping_planner, the > > path list is typically just passed in. So would the new path types be > > expected to carry target lists of their own, or would they need to > > figure out the target list on the fly at plan generation time? > > Yeah, that is something I've been struggling with while thinking about > this. I don't really want to add tlists as such to Paths, but it's > not very clear how else to annotate a Path as to what it computes, > and that seems like an annotation we have to have in some form in order > to convert these planning steps into a Path universe. > > There are other cases where it would be useful to have some notion of > this kind. An example is that right now, if you have an expression index > on an expensive function and a query that wants the value of that function, > the planner is able to extract the value from the index --- but there is > nothing that gives any cost benefit to doing so, so it's just as likely > to choose some other index and eat the cost of recalculating the function. > It seems like the only way to fix that in a principled fashion is to have > some concept that the index-scan Path can produce the function value, > and then when we come to some appropriate costing step, penalize the other > paths for having to compute the value that's available for free from this > one. > > Rather than adding tlists per se to Paths, I've been vaguely toying with > a notion of identifying all the "interesting" subexpressions in a query > (expensive functions, aggregates, etc), giving them indexes 1..n, and then > marking Paths with bitmapsets showing which interesting subexpressions > they can produce values for. This would make things like "does this Path > compute all the needed aggregates" much cheaper to deal with than a raw > tlist representation would do. But maybe that's still not the best way. > Hmm.... it seems to me a little bit complicated than flat expression node.
> Another point is that a Path that computes aggregates is fundamentally > different from a Path that doesn't, because it doesn't even produce the > same number of rows. So I'm not at all sure how to visualize the idea > of a Path that computes only some aggregates, or whether it's even a > sensible thing to worry about supporting. > I expect partial aggregate shall be done per a particular input stream, not per aggregate function. In other words, once planner determined a relation scan/join has advantage to run partial aggregate, all the aggregate functions that consume rows produced by this scan/join have to have partial aggregate / final aggregate form, doesn't it? If so, number of rows to be returned is associated with a Path. For example, when we break down a query below using 2-phase aggregation, SELECT t1.cat, avg(t2.x) FROM t1 JOIN t2 ON t1.id_1 = t2.id_2 GROUP BY t1.cat; expected plan is as shown below, isn't it? FinalAgg (nrows=100) tlist: t1.cat, avg(nrows, sum(t2.x)) grouping key: t1.cat -> HashJoin (nrows=1000) tlist: t1.cat, count(t2.x) nrows, sum(t2.x) -> PartialAgg (nrows=1000) tlist: count(t2.x) nrows, sum(t2.x), t2.id_2 grouping key: t2.id_2 -> SeqScan on t2 (nrows=100000000) -> Hash -> SeqScan on t1 (nrows=100) It is clear that PartialAgg consumes 100000000 rows, then output 1000 rows because of partial reduction. All the partial aggregation on this node will work in a coordinated manner. Do you have another vision for the partial aggregation behavior? > > One thing that seems like it might complicate things here is that a > > lot of planner functions take PlannerInfo *root as an argument. But > > if we generate only paths in grouping_planner() and path-ify them > > later, the subquery's root will not be available when we're trying to > > do the Path -> Plan transformation. > > Ah, you're wrong there, because we hang onto the subquery's root already > (I forget why exactly, but see PlannerGlobal.subroots for SubPlans, and > RelOptInfo.subroot for subquery-in-FROM). So it would not be a > fundamental problem to postpone create_plan() for a subquery. > > > I think grouping_planner() is badly in need of some refactoring just > > to make it shorter. It's over 1000 lines of code, which IMHO is a > > fairly ridiculous length for a single function. > > Amen to that. But as I said to Andrew, I think this will be a side-effect > of path-ification in this area, and is probably not something to set out > to do first. > -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kai...@ak.jp.nec.com> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers