On Thu, May 14, 2015 at 7:09 AM, Robert Haas <robertmh...@gmail.com> wrote:

> Hi,
>
> I've been pulling over Tom's occasional remarks about redoing
> grouping_planner - and maybe further layers of the planner - to work
> with Paths instead of Plans.  I've had difficulty locating all of the
> relevant threads.  Here's everything I've found so far, which I'm
> pretty sure is not everything:
>
> http://www.postgresql.org/message-id/17400.1311716...@sss.pgh.pa.us
> http://www.postgresql.org/message-id/2614.1375730...@sss.pgh.pa.us
> http://www.postgresql.org/message-id/22721.1385048...@sss.pgh.pa.us
>
> http://www.postgresql.org/message-id/banlktindjjfhnozesg2j2u4gokqlu69...@mail.gmail.com
> http://www.postgresql.org/message-id/8479.1418420...@sss.pgh.pa.us
>
> I think there are two separate problems here.  First, there's the
> problem that grouping_planner() is complicated.  It's doing cost
> comparisons, but all in ad-hoc fashion rather than using a consistent
> mechanic the way add_path() does.  Generally, we can plan an aggregate
> using either (1) a hashed aggregate, (2) a sorted aggregate, or (3)
> for min or max, an index scan that just grabs the highest or lowest
> value in lieu of a full table scan.  Instead of generating a plan for
> each of these possibilities, we'd like to generate paths for each one,
> and then pick one to turn into a plan.  AIUI, the hope is that this
> would simplify the cost calculations, and also make it easier to
> inject other paths, such as a path where an FDW performs the
> aggregation step on the remote server.
>
> Second, there's the problem that we might like to order aggregates
> with respect to joins.  If we have something like SELECT DISTINCT ON
> (foo.x) foo.x, foo.y, bar.y FROM foo, bar WHERE foo.x = bar.x, then
> (a) if foo.x has many duplicates, it will be better to DISTINCT-ify
> foo and then join to bar afterwards but (b) if foo.x = bar.x is highly
> selective, it will be better to join to bar first and then
> DISTINCT-ify the result.  Currently, aggregation is always last;
> that's not great.  Hitoshi Harada's proposed strategy of essentially
> figuring out where the aggregation steps can go and then re-planning
> for each one is also not great, because each join problem will be a
> subtree of the one we use for the aggregate-last strategy, and thus
> we're wasting effort by planning the same subtrees multiple times.
> Instead, we might imagine letting grouping planner fish out the best
> paths for the joinrels that represent possible aggregation points,
> generate aggregation paths for each of those, and then work out what
> additional rels need to be joined afterwards.  That sounds hard, but
> not as hard as doing something sensible with what we have today.
>
>
Another futuristic avenue for optimization would be commuting Append and
Join when joining two similarly partitioned tables.


> I'm inclined to think that it would be useful to solve the first
> problem even if we didn't solve the second one right away (but that
> might be wrong).  As a preparatory step, I'm thinking it would be
> sensible to split grouping_planner() into an outer function that would
> handle the addition of Limit and LockRows nodes and maybe planning of
> set operations, and an inner function that would handle GROUP BY,
> DISTINCT, and possibly window function planning.
>
> Thoughts?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Reply via email to