Hi, I've been pulling over Tom's occasional remarks about redoing grouping_planner - and maybe further layers of the planner - to work with Paths instead of Plans. I've had difficulty locating all of the relevant threads. Here's everything I've found so far, which I'm pretty sure is not everything:
http://www.postgresql.org/message-id/17400.1311716...@sss.pgh.pa.us http://www.postgresql.org/message-id/2614.1375730...@sss.pgh.pa.us http://www.postgresql.org/message-id/22721.1385048...@sss.pgh.pa.us http://www.postgresql.org/message-id/banlktindjjfhnozesg2j2u4gokqlu69...@mail.gmail.com http://www.postgresql.org/message-id/8479.1418420...@sss.pgh.pa.us I think there are two separate problems here. First, there's the problem that grouping_planner() is complicated. It's doing cost comparisons, but all in ad-hoc fashion rather than using a consistent mechanic the way add_path() does. Generally, we can plan an aggregate using either (1) a hashed aggregate, (2) a sorted aggregate, or (3) for min or max, an index scan that just grabs the highest or lowest value in lieu of a full table scan. Instead of generating a plan for each of these possibilities, we'd like to generate paths for each one, and then pick one to turn into a plan. AIUI, the hope is that this would simplify the cost calculations, and also make it easier to inject other paths, such as a path where an FDW performs the aggregation step on the remote server. Second, there's the problem that we might like to order aggregates with respect to joins. If we have something like SELECT DISTINCT ON (foo.x) foo.x, foo.y, bar.y FROM foo, bar WHERE foo.x = bar.x, then (a) if foo.x has many duplicates, it will be better to DISTINCT-ify foo and then join to bar afterwards but (b) if foo.x = bar.x is highly selective, it will be better to join to bar first and then DISTINCT-ify the result. Currently, aggregation is always last; that's not great. Hitoshi Harada's proposed strategy of essentially figuring out where the aggregation steps can go and then re-planning for each one is also not great, because each join problem will be a subtree of the one we use for the aggregate-last strategy, and thus we're wasting effort by planning the same subtrees multiple times. Instead, we might imagine letting grouping planner fish out the best paths for the joinrels that represent possible aggregation points, generate aggregation paths for each of those, and then work out what additional rels need to be joined afterwards. That sounds hard, but not as hard as doing something sensible with what we have today. I'm inclined to think that it would be useful to solve the first problem even if we didn't solve the second one right away (but that might be wrong). As a preparatory step, I'm thinking it would be sensible to split grouping_planner() into an outer function that would handle the addition of Limit and LockRows nodes and maybe planning of set operations, and an inner function that would handle GROUP BY, DISTINCT, and possibly window function planning. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers