[HACKERS] Removing [Merge]Append nodes which contain a single subpath

David Rowley Wed, 25 Oct 2017 15:00:07 -0700

It seems like a good idea for the planner not to generate Append or
MergeAppend paths when the node contains just a single subpath. If we
were able to remove these then the planner would have more flexibility
to build a better plan.

As of today, because we include this needless [Merge]Append node, we
cannot parallelise scans below the Append. We must also include a
Materialize node in Merge Joins since both MergeAppend and Append
don't support mark and restore. Also, as shown in [1], there's an
overhead to pulling tuples through nodes.

I've been looking into resolving this issue but the ways I've explored
so far seems to be bending the current planner a bit out of shape.

Method 1:

In set_append_rel_size() detect when just a single subpath would be
added to the Append path. Set a promotion_child RelOptInfo field in
the base rel's RelOptInfo, and during make_one_rel, after
set_base_rel_sizes() modify the joinlist to get rid of the parent and
use the child instead.

This pretty much breaks the assumption we have that the finalrel will
have all the relids to root->all_baserels. We have an Assert in
make_one_rel() which checks this is true.

There are also complications around set_rel_size() generating paths
for subqueries which may be parameterized paths with the Append parent
required for the parameterization to work.

Method 2:

Invent a ProxyPath concept that allows us to add Paths belonging to
one relation to another relation. The ProxyPaths can have
translation_vars to translate targetlists to reference the correct
Vars. These ProxyPaths could exist up as far as createplan, where we
could perform the translation and build the ProxyPath's subnode
instead.

This method is not exactly very clean either as there are various
places around the planner we'd need to give special treatment to these
ProxyPaths, such as is_projection_capable_path() would need to return
the projection capability of the subpath within the ProxyPath since we
never actually create a "Proxy" node.

Probably either of these two methods could be made to work. I prefer
method 2, since that infrastructure could one day be put towards
scanning a Materialized View instead of the relation. in order to
satisfy some query. However, method 2 appears to also require some Var
translation in Path targetlists which contain this Proxy path, either
that or some global Var replacement would need to be done during
setrefs.

I'm posting here in the hope that it will steer my development of this
in a direction that's acceptable to the community.

Perhaps there is also a "Method 3" which I've not thought about which
would be much cleaner.

[1]
https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Removing [Merge]Append nodes which contain a single subpath

Reply via email to