Re: apply_scanjoin_target_to_paths and partitionwise join

Ashutosh Bapat Mon, 15 Apr 2024 00:00:15 -0700

Here's patch with

On Thu, Apr 11, 2024 at 12:24 PM Ashutosh Bapat <
ashutosh.bapat....@gmail.com> wrote:


>
>
> On Thu, Apr 11, 2024 at 12:07 PM Ashutosh Bapat <
> ashutosh.bapat....@gmail.com> wrote:
>
>> Hi All,
>> Per below code and comment in apply_scanjoin_target_to_paths(), the
>> function zaps all the paths of a partitioned relation.
>> /*
>> * If the rel is partitioned, we want to drop its existing paths and
>> * generate new ones.  This function would still be correct if we kept the
>> * existing paths: we'd modify them to generate the correct target above
>> * the partitioning Append, and then they'd compete on cost with paths
>> * generating the target below the Append
>> ... snip ...
>> */
>> if (rel_is_partitioned)
>> rel->pathlist = NIL;
>>
>> Later the function adjusts the targets of paths in child relations and
>> constructs Append paths from them. That works for simple partitioned
>> relations but not for join between partitioned relations. When
>> enable_partitionwise_join is true, the joinrel representing a join between
>> partitioned relations may have join paths joining append paths and Append
>> paths containing child join paths. Once we zap the pathlist, the only paths
>> that can be computed again are the Append paths. If the optimal path,
>> before applying the new target, was a join of append paths it will be lost
>> forever. This will result in choosing a suboptimal Append path.
>>
>> We have one such query in our regression set.
>>
>> SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN
>> plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
>> coalesce(t1.a, 0    ) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
>> t1.c, t1.a, t2.a, t3.a;
>>
>> For this query, the cheapest Append of Joins path has cost 24.97..25.57
>> and the cheapest Join of Appends path has cost 21.29..21.81. The latter
>> should be chosen even though enable_partitionwise_join is ON. But this
>> function chooses the first.
>>
>> The solution is to zap the pathlists only for simple partitioned
>> relations like the attached patch.
>>
>> With this patch above query does not choose non-partitionwise join path
>> and partition_join test fails. That's expected. But we need to replace that
>> query with some query which uses partitionwise join while maintaining the
>> conditions of the test as explained in the comment above that query. I have
>> tried a few variations but without success. Suggestions welcome.
>>
>
Found a replacement for that query by using a 2-way join instead of 3-way
join. The query still executes the referenced code in
process_outer_partition() as mentioned in the comment. I did think about
removing the original query. But it is the only example in our regression
tests where partitionwise join is more costly than non-partitionwise join.
So I have left it as is in the test. I am fine if others think that we
should remove it.

Adding to the next commitfest but better to consider this for the next set
of minor releases.

-- 
Best Wishes,
Ashutosh Bapat

From 7a09f8629fff51727f9f59ed7bb57fcc8a452bed Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.ba...@enterprisedb.com>
Date: Mon, 15 Apr 2024 12:03:01 +0530
Subject: [PATCH] Fix partitioned join case in apply_scanjoin_target_to_paths()

apply_scanjoin_target_to_paths() assumes that the cheapest paths for all
partitioned relations can be computed by Append'ing corresponding paths from
the partitions. This is not true for partitioned join relations. Partitionwise
join paths may not be optimal always. Hence do not wipe out all existing paths
only when the relation is simple before computing paths with new target in that
function.

Ashutosh Bapat, initial investigation by Jacob Wartak
---
 src/backend/optimizer/plan/planner.c         |  4 +-
 src/test/regress/expected/partition_join.out | 81 ++++++++++++++------
 src/test/regress/sql/partition_join.sql      | 14 +++-
 3 files changed, 69 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd4e4ceeca..5d17c61f3b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7099,7 +7099,7 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 	 * generate_useful_gather_paths to add path(s) to the main list, and
 	 * finally zap the partial pathlist.
 	 */
-	if (rel_is_partitioned)
+	if (rel_is_partitioned && IS_SIMPLE_REL(rel))
 		rel->pathlist = NIL;
 
 	/*
@@ -7125,7 +7125,7 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 	}
 
 	/* Finish dropping old paths for a partitioned rel, per comment above */
-	if (rel_is_partitioned)
+	if (rel_is_partitioned && IS_SIMPLE_REL(rel))
 		rel->partial_pathlist = NIL;
 
 	/* Extract SRF-free scan/join target. */
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 8b179fa60d..639ce38c80 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -4721,38 +4721,35 @@ CREATE TABLE plt3_adv_p1 PARTITION OF plt3_adv FOR VALUES IN ('0001');
 CREATE TABLE plt3_adv_p2 PARTITION OF plt3_adv FOR VALUES IN ('0003', '0004');
 INSERT INTO plt3_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4);
 ANALYZE plt3_adv;
--- This tests that when merging partitions from plt1_adv and plt2_adv in
--- merge_list_bounds(), process_outer_partition() returns an already-assigned
--- merged partition when re-called with plt1_adv_p1 for the second list value
--- '0001' of that partition
+-- Both the queries test that when merging partitions from plt1_adv and
+-- plt2_adv in merge_list_bounds(), process_outer_partition() returns an
+-- already-assigned merged partition when re-called with plt1_adv_p1 for the
+-- second list value '0001' of that partition. But the first query doesn't end
+-- up using partitionwise join since it's more costly whereas the second one
+-- use partitionwise join.
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
-                                          QUERY PLAN                                           
------------------------------------------------------------------------------------------------
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
  Sort
    Sort Key: t1.c, t1.a, t2.a, t3.a
-   ->  Append
-         ->  Hash Full Join
-               Hash Cond: (t1_1.c = t3_1.c)
-               Filter: (((COALESCE(t1_1.a, 0) % 5) <> 3) AND ((COALESCE(t1_1.a, 0) % 5) <> 4))
-               ->  Hash Left Join
-                     Hash Cond: (t1_1.c = t2_1.c)
+   ->  Hash Full Join
+         Hash Cond: (t1.c = t3.c)
+         Filter: (((COALESCE(t1.a, 0) % 5) <> 3) AND ((COALESCE(t1.a, 0) % 5) <> 4))
+         ->  Hash Left Join
+               Hash Cond: (t1.c = t2.c)
+               ->  Append
                      ->  Seq Scan on plt1_adv_p1 t1_1
-                     ->  Hash
-                           ->  Seq Scan on plt2_adv_p1 t2_1
-               ->  Hash
-                     ->  Seq Scan on plt3_adv_p1 t3_1
-         ->  Hash Full Join
-               Hash Cond: (t1_2.c = t3_2.c)
-               Filter: (((COALESCE(t1_2.a, 0) % 5) <> 3) AND ((COALESCE(t1_2.a, 0) % 5) <> 4))
-               ->  Hash Left Join
-                     Hash Cond: (t1_2.c = t2_2.c)
                      ->  Seq Scan on plt1_adv_p2 t1_2
-                     ->  Hash
-                           ->  Seq Scan on plt2_adv_p2 t2_2
                ->  Hash
+                     ->  Append
+                           ->  Seq Scan on plt2_adv_p1 t2_1
+                           ->  Seq Scan on plt2_adv_p2 t2_2
+         ->  Hash
+               ->  Append
+                     ->  Seq Scan on plt3_adv_p1 t3_1
                      ->  Seq Scan on plt3_adv_p2 t3_2
-(23 rows)
+(18 rows)
 
 SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
  a  |  c   | a  |  c   | a  |  c   
@@ -4814,6 +4811,42 @@ SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t
  22 | 0002 | 22 | 0002 |    | 
 (55 rows)
 
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.a, t2.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) WHERE coalesce(t1.a, t2.a) IN (3, 4) ORDER BY t1.c, t1.a, t2.a;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Sort
+   Sort Key: t1.c, t1.a, t2.a
+   ->  Append
+         ->  Hash Left Join
+               Hash Cond: (t1_1.c = t2_1.c)
+               Filter: (COALESCE(t1_1.a, t2_1.a) = ANY ('{3,4}'::integer[]))
+               ->  Seq Scan on plt1_adv_p1 t1_1
+               ->  Hash
+                     ->  Seq Scan on plt2_adv_p1 t2_1
+         ->  Hash Left Join
+               Hash Cond: (t1_2.c = t2_2.c)
+               Filter: (COALESCE(t1_2.a, t2_2.a) = ANY ('{3,4}'::integer[]))
+               ->  Seq Scan on plt1_adv_p2 t1_2
+               ->  Hash
+                     ->  Seq Scan on plt2_adv_p2 t2_2
+(15 rows)
+
+SELECT t1.a, t1.c, t2.a, t2.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) WHERE coalesce(t1.a, t2.a) IN (3, 4) ORDER BY t1.c, t1.a, t2.a;
+ a |  c   | a  |  c   
+---+------+----+------
+ 3 | 0003 |  3 | 0003
+ 3 | 0003 |  8 | 0003
+ 3 | 0003 | 13 | 0003
+ 3 | 0003 | 18 | 0003
+ 3 | 0003 | 23 | 0003
+ 4 | 0004 |  4 | 0004
+ 4 | 0004 |  9 | 0004
+ 4 | 0004 | 14 | 0004
+ 4 | 0004 | 19 | 0004
+ 4 | 0004 | 24 | 0004
+(10 rows)
+
 DROP TABLE plt1_adv;
 DROP TABLE plt2_adv;
 DROP TABLE plt3_adv;
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 04d7adb8e5..bf3a139ab2 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -1135,14 +1135,20 @@ CREATE TABLE plt3_adv_p2 PARTITION OF plt3_adv FOR VALUES IN ('0003', '0004');
 INSERT INTO plt3_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4);
 ANALYZE plt3_adv;
 
--- This tests that when merging partitions from plt1_adv and plt2_adv in
--- merge_list_bounds(), process_outer_partition() returns an already-assigned
--- merged partition when re-called with plt1_adv_p1 for the second list value
--- '0001' of that partition
+-- Both the queries test that when merging partitions from plt1_adv and
+-- plt2_adv in merge_list_bounds(), process_outer_partition() returns an
+-- already-assigned merged partition when re-called with plt1_adv_p1 for the
+-- second list value '0001' of that partition. But the first query doesn't end
+-- up using partitionwise join since it's more costly whereas the second one
+-- use partitionwise join.
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
 SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
 
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.a, t2.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) WHERE coalesce(t1.a, t2.a) IN (3, 4) ORDER BY t1.c, t1.a, t2.a;
+SELECT t1.a, t1.c, t2.a, t2.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) WHERE coalesce(t1.a, t2.a) IN (3, 4) ORDER BY t1.c, t1.a, t2.a;
+
 DROP TABLE plt1_adv;
 DROP TABLE plt2_adv;
 DROP TABLE plt3_adv;
-- 
2.34.1

Re: apply_scanjoin_target_to_paths and partitionwise join

Reply via email to