On 11/30/20 7:43 PM, Anastasia Lubennikova wrote:
This entry was inactive during this CF, so I've marked it as returned with feedback. Feel free to resubmit an updated version to a future commitfest.
I return the patch to commitfest. My current reason differs from reason of origin author. This patch can open a door for more complex optimizations in the partitionwise join push-down technique. I mean, we can push-down join not only of two partitioned tables with the same partition schema, but a partitioned (sharded) table with an arbitrary subplan that is provable independent of local resources.

Example:

CREATE TABLE p(a int) PARTITION BY HASH (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE p2 PARTITION OF p FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE p3 PARTITION OF p FOR VALUES WITH (MODULUS 3, REMAINDER 2);

SELECT * FROM p, (SELECT * FROM generate_series(1,2) AS a) AS s
WHERE p.a=s.a;

 Hash Join
   Hash Cond: (p.a = a.a)
   ->  Append
         ->  Seq Scan on p1 p_1
         ->  Seq Scan on p2 p_2
         ->  Seq Scan on p3 p_3
   ->  Hash
         ->  Function Scan on generate_series a

But with asymmetric join feature we have the plan:

 Append
   ->  Hash Join
         Hash Cond: (p_1.a = a.a)
         ->  Seq Scan on p1 p_1
         ->  Hash
               ->  Function Scan on generate_series a
   ->  Hash Join
         Hash Cond: (p_2.a = a.a)
         ->  Seq Scan on p2 p_2
         ->  Hash
               ->  Function Scan on generate_series a
   ->  Hash Join
         Hash Cond: (p_3.a = a.a)
         ->  Seq Scan on p3 p_3
         ->  Hash
               ->  Function Scan on generate_series a

In the case of FDW-sharding it means that if we can prove that the inner relation is independent from the execution server, we can push-down these joins and execute it in parallel.

--
regards,
Andrey Lepikhov
Postgres Professional


Reply via email to