On 2020-07-14 15:27, Ashutosh Bapat wrote:
On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
<a.kondra...@postgrespro.ru> wrote:
I built a simple two node multi-tenant schema for tests, which can be
easily set up with attached scripts. It creates three tables
(companies,
users, documents) distributed over two nodes. Everything can be found
in
this Gist [2] as well.
Some real-life test queries show, that all single-node queries aren't
pushed-down to the required node. For example:
SELECT
*
FROM
documents
INNER JOIN users ON documents.user_id = users.id
WHERE
documents.company_id = 5
AND users.company_id = 5;
There are a couple of things happening here
1. the clauses on company_id in WHERE clause are causing partition
pruning. Partition-wise join is disabled with partition pruning before
PG13. In PG13 we have added advanced partition matching algorithm
which will allow partition-wise join with partition pruning.
I forgot to mention that I use a recent master (991c444e7a) for tests
with
enable_partitionwise_join = 'on'
enable_partitionwise_aggregate = 'on'
of course. I've also tried postgres_fdw.use_remote_estimate = true
followed by ANALYSE on both nodes (it is still used in setup.sh script).
BTW, can you, please, share a link to commit / thread about allowing
partition-wise join and partition pruning to work together in PG13?
2. the query has no equality condition on the partition key of the
tables being joined. Partitionwise join is possible only when there's
an equality condition on the partition keys (company_id) of the
joining tables. PostgreSQL's optimizer is not smart enough to convert
the equality conditions in WHERE clause into equality conditions on
partition keys. So having those conditions just in WHERE clause does
not help. Instead please add equality conditions on partition keys in
JOIN .. ON clause or WHERE clause (only for INNER join).
With adding documents.company_id = users.company_id
SELECT *
FROM
documents
INNER JOIN users ON (documents.company_id = users.company_id
AND documents.user_id = users.id)
WHERE
documents.company_id = 5
AND users.company_id = 5;
query plan remains the same.
executed as following
QUERY PLAN
-------------------------------------------------------
Nested Loop
Join Filter: (documents.user_id = users.id)
-> Foreign Scan on users_node2 users
-> Materialize
-> Foreign Scan on documents_node2 documents
i.e. it uses two foreign scans and does the final join locally.
However,
once I specify target partitions explicitly, then the entire query is
pushed down to the foreign node:
QUERY PLAN
---------------------------------------------------------
Foreign Scan
Relations: (documents_node2) INNER JOIN (users_node2)
Execution time is dropped significantly as well — by more than 3 times
even for this small test database. Situation for simple queries with
aggregates or joins and aggregates followed by the sharding key filter
is the same. Something similar was briefly discussed in this thread
[3].
IIUC, it means that push-down of queries through the postgres_fdw
works
perfectly well, the problem is with partition-wise operation detection
at the planning time. Currently, partition-wise aggregate routines,
e.g., looks for a GROUP BY and checks whether sharding key exists
there
or not. After that PARTITIONWISE_AGGREGATE_* flag is set. However, it
doesn't look for a content of WHERE clause, so frankly speaking it
isn't
a problem, this functionality is not yet implemented.
Actually, sometimes I was able to push down queries with aggregate
simply by adding an additional GROUP BY with sharding key, like this:
SELECT
count(*)
FROM
documents
WHERE
company_id = 5
GROUP BY company_id;
This gets pushed down since GROUP BY clause is on the partition key.
Sure, but it only works *sometimes*, I've never seen most of such simple
queries with aggregates to be pushed down, e.g.:
SELECT
sum(id)
FROM
documents_node2
WHERE
company_id = 5
GROUP BY
company_id;
whether 'GROUP BY company_id' is used or not.
Although it seems that it will be easier to start with aggregates,
probably we should initially plan a more general solution? For
example,
check that all involved tables are filtered by partitioning key and
push
down the entire query if all of them target the same foreign server.
Any thoughts?
I think adding just equality conditions on the partition key will be
enough. No need for any code change.
So, it hasn't helped. Maybe I could modify some costs to verify that
push-down of such joins is ever possible?
Anyway, what about aggregates? Partition-wise aggregates work fine for
queries like
SELECT
count(*)
FROM
documents
GROUP BY
company_id;
but once I narrow it to a single partition with 'WHERE company_id = 5',
then it is being executed in a very inefficient way — takes all rows
from remote partition / node and performs aggregate locally. It doesn't
seem like a problem with query itself.
In my experience, both partition-wise joins and aggregates work well
with simple GROUP or JOIN by the partitioning key, which corresponds to
massive multi-partition OLAP queries. However, both stop working for a
single-partition queries with WHERE, when postgres_fdw and partitioning
are used. I'd be glad if you share any new guesses of how to make them
working without code modification.
Thanks
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company