On Thu, May 14, 2015 at 7:09 AM, Robert Haas <robertmh...@gmail.com> wrote:
> Hi,
>
> I've been mulling over Tom's occasional remarks about redoing
> grouping_planner - and maybe further layers of the planner - to work
> with Paths instead of Plans.  I've had difficulty locating all of the
> relevant threads.  Here's everything I've found so far, which I'm
> pretty sure is not everything:
>
> http://www.postgresql.org/message-id/17400.1311716...@sss.pgh.pa.us
> http://www.postgresql.org/message-id/2614.1375730...@sss.pgh.pa.us
> http://www.postgresql.org/message-id/22721.1385048...@sss.pgh.pa.us
> http://www.postgresql.org/message-id/banlktindjjfhnozesg2j2u4gokqlu69...@mail.gmail.com
> http://www.postgresql.org/message-id/8479.1418420...@sss.pgh.pa.us
>
> I think there are two separate problems here.  First, there's the
> problem that grouping_planner() is complicated.  It's doing cost
> comparisons, but all in ad-hoc fashion rather than using a consistent
> mechanism the way add_path() does.  Generally, we can plan an aggregate
> using either (1) a hashed aggregate, (2) a sorted aggregate, or (3)
> for min or max, an index scan that just grabs the highest or lowest
> value in lieu of a full table scan.  Instead of generating a plan for
> each of these possibilities, we'd like to generate paths for each one,
> and then pick one to turn into a plan.  AIUI, the hope is that this
> would simplify the cost calculations, and also make it easier to
> inject other paths, such as a path where an FDW performs the
> aggregation step on the remote server.
>
> Second, there's the problem that we might like to order aggregates
> with respect to joins.  If we have something like SELECT DISTINCT ON
> (foo.x) foo.x, foo.y, bar.y FROM foo, bar WHERE foo.x = bar.x, then
> (a) if foo.x has many duplicates, it will be better to DISTINCT-ify
> foo and then join to bar afterwards, but (b) if foo.x = bar.x is
> highly selective, it will be better to join to bar first and then
> DISTINCT-ify the result.  Currently, aggregation is always last;
> that's not great.
> Hitoshi Harada's proposed strategy of essentially figuring out where
> the aggregation steps can go and then re-planning for each one is
> also not great, because each join problem will be a subtree of the
> one we use for the aggregate-last strategy, and thus we're wasting
> effort by planning the same subtrees multiple times.  Instead, we
> might imagine letting grouping_planner() fish out the best paths for
> the joinrels that represent possible aggregation points, generate
> aggregation paths for each of those, and then work out what
> additional rels need to be joined afterwards.  That sounds hard, but
> not as hard as doing something sensible with what we have today.

Another futuristic avenue for optimization would be commuting Append
and Join when joining two similarly partitioned tables.

> I'm inclined to think that it would be useful to solve the first
> problem even if we didn't solve the second one right away (but that
> might be wrong).  As a preparatory step, I'm thinking it would be
> sensible to split grouping_planner() into an outer function that
> would handle the addition of Limit and LockRows nodes and maybe
> planning of set operations, and an inner function that would handle
> GROUP BY, DISTINCT, and possibly window function planning.
>
> Thoughts?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company