Re: Partitioning and postgres_fdw optimisations for multi-tenancy

Alexey Kondratov Tue, 14 Jul 2020 08:13:21 -0700

On 2020-07-14 15:27, Ashutosh Bapat wrote:

On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
<a.kondra...@postgrespro.ru> wrote:

I built a simple two node multi-tenant schema for tests, which can be

easily set up with attached scripts. It creates three tables(companies,users, documents) distributed over two nodes. Everything can be foundin

this Gist [2] as well.


Some real-life test queries show, that all single-node queries aren't
pushed-down to the required node. For example:

SELECT
     *
FROM
     documents
     INNER JOIN users ON documents.user_id = users.id
WHERE
     documents.company_id = 5
     AND users.company_id = 5;


There are a couple of things happening here
1. the clauses on company_id in WHERE clause are causing partition
pruning. Partition-wise join is disabled with partition pruning before
PG13. In PG13 we have added advanced partition matching algorithm
which will allow partition-wise join with partition pruning.

I forgot to mention that I use a recent master (991c444e7a) for testswith


enable_partitionwise_join = 'on'
enable_partitionwise_aggregate = 'on'

of course. I've also tried postgres_fdw.use_remote_estimate = truefollowed by ANALYSE on both nodes (it is still used in setup.sh script).

BTW, can you, please, share a link to commit / thread about allowingpartition-wise join and partition pruning to work together in PG13?


2. the query has no equality condition on the partition key of the
tables being joined. Partitionwise join is possible only when there's
an equality condition on the partition keys (company_id) of the
joining tables. PostgreSQL's optimizer is not smart enough to convert
the equality conditions in WHERE clause into equality conditions on
partition keys. So having those conditions just in WHERE clause does
not help. Instead please add equality conditions on partition keys in
JOIN .. ON clause or WHERE clause (only for INNER join).


With adding documents.company_id = users.company_id

SELECT *
FROM
    documents
    INNER JOIN users ON (documents.company_id = users.company_id
            AND documents.user_id = users.id)
WHERE
    documents.company_id = 5
    AND users.company_id = 5;

query plan remains the same.


executed as following

                       QUERY PLAN
-------------------------------------------------------
  Nested Loop
    Join Filter: (documents.user_id = users.id)
    ->  Foreign Scan on users_node2 users
    ->  Materialize
          ->  Foreign Scan on documents_node2 documents

i.e. it uses two foreign scans and does the final join locally.However,

once I specify target partitions explicitly, then the entire query is
pushed down to the foreign node:

                        QUERY PLAN
---------------------------------------------------------
  Foreign Scan
    Relations: (documents_node2) INNER JOIN (users_node2)

Execution time is dropped significantly as well — by more than 3 times
even for this small test database. Situation for simple queries with
aggregates or joins and aggregates followed by the sharding key filter

is the same. Something similar was briefly discussed in this thread[3].

IIUC, it means that push-down of queries through the postgres_fdwworks

perfectly well, the problem is with partition-wise operation detection
at the planning time. Currently, partition-wise aggregate routines,

e.g., looks for a GROUP BY and checks whether sharding key existsthere

or not. After that PARTITIONWISE_AGGREGATE_* flag is set. However, it

doesn't look for a content of WHERE clause, so frankly speaking itisn't

a problem, this functionality is not yet implemented.

Actually, sometimes I was able to push down queries with aggregate
simply by adding an additional GROUP BY with sharding key, like this:

SELECT
     count(*)
FROM
     documents
WHERE
     company_id = 5
GROUP BY company_id;


This gets pushed down since GROUP BY clause is on the partition key.

Sure, but it only works *sometimes*, I've never seen most of such simplequeries with aggregates to be pushed down, e.g.:


SELECT
    sum(id)
FROM
    documents_node2
WHERE
    company_id = 5
GROUP BY
    company_id;

whether 'GROUP BY company_id' is used or not.

Although it seems that it will be easier to start with aggregates,
probably we should initially plan a more general solution? Forexample,check that all involved tables are filtered by partitioning key andpush
down the entire query if all of them target the same foreign server.

Any thoughts?
I think adding just equality conditions on the partition key will be
enough. No need for any code change.

So, it hasn't helped. Maybe I could modify some costs to verify thatpush-down of such joins is ever possible?

Anyway, what about aggregates? Partition-wise aggregates work fine forqueries like


SELECT
    count(*)
FROM
    documents
GROUP BY
    company_id;

but once I narrow it to a single partition with 'WHERE company_id = 5',then it is being executed in a very inefficient way — takes all rowsfrom remote partition / node and performs aggregate locally. It doesn'tseem like a problem with query itself.

In my experience, both partition-wise joins and aggregates work wellwith simple GROUP or JOIN by the partitioning key, which corresponds tomassive multi-partition OLAP queries. However, both stop working for asingle-partition queries with WHERE, when postgres_fdw and partitioningare used. I'd be glad if you share any new guesses of how to make themworking without code modification.



Thanks
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Re: Partitioning and postgres_fdw optimisations for multi-tenancy

Reply via email to