On 15.11.2011 19:16, Shigeru Hanada wrote:
This is the second effort for $SUBJECT. Attached patch requires
pgsql_fdw patches[1] to be applied previously. This patch provides:
* Changes for backend
* Add new planner node ForeignJoinPath and related routines. In
current design, planner consider all of possible join combinations
between foreign tables, similar to local joins such as nested loop,
hash join and merge join. And if foreign join is cheapest, planner
produces a ForeignScan plan node for a join. So executor is not
modified heavily since 9.1.
* Add new FDW callback for planning join push-down between foreign
tables on same server. This function is optional, and allowed to
return NULL to tell planner that that join can't be handled by the
FDW.
So the way a three-way join is planned, is that the planner first asks
the FDW to plan ForeignPaths of scanning the individual tables. Then it
asks the FDW to consider pairwise joins of those ForeignPaths. Then it
asks the FDW to consider joins of the constructed ForeignPaths and
ForeignJoinPaths. Ie. the plan involving a join of three or more remote
tables is built bottom-up, just like a join of local tables.
When the FDW recognizes it's being asked to join a ForeignJoinPath and a
ForeignPath, or two ForeignJoinPaths, it throws away the old SQL it
constructed to do the two-way join, and builds a new one to join all
three tables. That seems tedious, when there are a lot of tables
involved. A FDW like the pgsql_fdw that constructs an SQL query doesn't
need to consider pairs of joins. It could just as well build the SQL for
the three-way join directly. I think the API needs to reflect that.
I wonder if we should have a heuristic to not even consider doing a join
locally, if it can be done remotely. For a query like this:
SELECT * FROM remote1 a, remote2 b, remote3 c WHERE a.id = b.id AND c.id
= b.id
it's quite obvious that the best plan is to do the join remotely, rather
than pull all the rows from all tables, and do the join locally. In
theory, if the remote database is remarkably bad at performing joins, it
might be faster to pull in all the data and do it locally, but I can't
really imagine that happening in practice.
* Changes for pgsql_fdw
* Implemente PlanForeignJoin callback function.
A couple of basic bugs I bumped into:
* WHERE-clause building fails on a cartesian product ("SELECT * FROM
remote1, remote2")
* The join planning in pgsql_fdw seems to get confused and gives up if
there are any local tables also involved in the query (e.g "explain
SELECT * FROM remote1, remote2 LEFT OUTER JOIN local1 on (local1.a =
remote2.a) WHERE remote1.a = remote2.a;")
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers