Re: [HACKERS] advanced partition matching algorithm for partition-wise join

Etsuro Fujita Wed, 16 Oct 2019 05:51:18 -0700

On Wed, Sep 25, 2019 at 12:59 AM Etsuro Fujita <etsuro.fuj...@gmail.com> wrote:
> I will continue to review the rest of the patch.


I've been reviewing the rest of the patch.  Here are my review comments:

* map_and_merge_partitions() checks whether the two partitions from
the outer and inner sides can be merged in two steps: 1) see if the
partitions are mapped to each other (ie, partmap1->from == index2 &&
partmap2->from == index1), and 2) see if the merged partition indexes
assigned are the same (ie, partmap1->to == partmap2->to) (or satisfy
some other conditions), but the first step seems redundant to me
because I think that if the merged partition indexes are the same,
then the partitions would be guaranteed to be mapped to each other.
Also, I noticed that that function can't handle some list-partitioning
cases properly.  Here is an example:

CREATE TABLE plt1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN ('0000', '0002');
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN ('0003', '0004');
INSERT INTO plt1 SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (0, 2, 3, 4);
ANALYZE plt1;
CREATE TABLE plt2 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0001', '0002');
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN ('0003', '0004');
INSERT INTO plt2 SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (1, 2, 3, 4);
ANALYZE plt2;
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 FULL JOIN plt2 t2 ON (t1.c
= t2.c) WHERE COALESCE(t1.a, 0) % 5 != 3 AND COALESCE(t1.a, 0) % 5 !=
4 ORDER BY t1.c, t1.a, t2.a;
                                     QUERY PLAN

-------------------------------------------------------------------------------
------
 Sort
   Sort Key: t1.c, t1.a, t2.a
   ->  Hash Full Join
         Hash Cond: (t1.c = t2.c)
         Filter: (((COALESCE(t1.a, 0) % 5) <> 3) AND ((COALESCE(t1.a,
0) % 5) <> 4))
         ->  Append
               ->  Seq Scan on plt1_p1 t1
               ->  Seq Scan on plt1_p2 t1_1
         ->  Hash
               ->  Append
                     ->  Seq Scan on plt2_p1 t2
                     ->  Seq Scan on plt2_p2 t2_1
(12 rows)

This should use partitionwise join by the new partition-matching
algorithm but doesn't.  The reason for that is because plt1_p1 and
plt2_p1 are mapped to different merged partitions and thus considered
not merge-able by map_and_merge_partitions() as-is.  I might be
missing something, but I don't think this is intentional, so I rewrote
that function completely in the attached, which is a WIP patch created
on top of the patch [1].

* In handle_missing_partition(), I noticed this:

+   else if (missing_side_inner)
+   {
+       /*
+        * If this partition has already been mapped (say because we
+        * found an overlapping range earlier), we know where does it
+        * fit in the join result. Nothing to do in that case. Else
+        * create a new merged partition.
+        */
+       PartitionMap *extra_map = &maps_with_extra[extra_part];
+       if (extra_map->to < 0)
+       {
+           extra_map->to = *next_index;
+           *next_index = *next_index + 1;
+           *merged_index = extra_map->to;
+       }
+   }

As commented, that function skips setting *merged_index when the
"extra_part" partition is already mapped to a merged partition.  This
would be correct for range partitioning, but not for list
partitioning, I think.  Here is an example:

CREATE TABLE plt1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN ('0000', '0001', '0002');
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN ('0003', '0004');
INSERT INTO plt1 SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i;
ANALYZE plt1;
CREATE TABLE plt2 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0002');
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN ('0003', '0004');
INSERT INTO plt2 SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (2, 3, 4);
ANALYZE plt2;
CREATE TABLE plt3 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt3_p1 PARTITION OF plt3 FOR VALUES IN ('0001');
CREATE TABLE plt3_p2 PARTITION OF plt3 FOR VALUES IN ('0003', '0004');
INSERT INTO plt3 SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4);
ANALYZE plt3;
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1 t1 LEFT JOIN plt2
t2 ON (t1.c = t2.c)) FULL JOIN plt3 t3 ON (t1.c = t3.c) WHERE
COALESCE(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
t1.c, t1.a, t2.a, t3.a;
                                     QUERY PLAN

-------------------------------------------------------------------------------
------
 Sort
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Hash Full Join
         Hash Cond: (t1.c = t3.c)
         Filter: (((COALESCE(t1.a, 0) % 5) <> 3) AND ((COALESCE(t1.a,
0) % 5) <> 4))
         ->  Hash Left Join
               Hash Cond: (t1.c = t2.c)
               ->  Append
                     ->  Seq Scan on plt1_p1 t1
                     ->  Seq Scan on plt1_p2 t1_1
               ->  Hash
                     ->  Append
                           ->  Seq Scan on plt2_p1 t2
                           ->  Seq Scan on plt2_p2 t2_1
         ->  Hash
               ->  Append
                     ->  Seq Scan on plt3_p1 t3
                     ->  Seq Scan on plt3_p2 t3_1
(18 rows)

I think this should use 3-way partitionwise join by the new algorithm
but doesn't.  The reason for that is because when considering
partitionwise join for plt1 and plt2, partition_list_bounds_merge()
would incorrectly produce {'0000', '0002'} as the merged values for
the join segment of plt1_p1 and plt2_p1, not {'0000', '0001', '0002'},
because that function would call handle_missing_partition() for the
values '0000' and '0001' of plt1_p1, and it would set *merged_index
correctly for '0000' but not for '0001' (keeping *merged_index=-1),
due to the behavior mentioned above, resulting in the merged values
for the join segment {'0000', '0002'}, not {'0000', '0001', '0002'}.
This would cause to fail to merge the values for the join segment
{'0000', '0002'} and the values for plt3_p1 {'0001'} when considering
partitionwise join for the 2-way join and plt3.  I fixed this as well
in the attached.  handle_missing_partition() can handle both cases
where a datum is missing from the inner side and where a datum is
missing from the outer side in a unified way, but I think that that
makes the code pretty hard to read.  Also, I think
handle_missing_partition() is not caller-friendly because the caller
needs to write something like this:

+               /* A range missing from the inner side. */
+               bool        missing_side_outer;
+               bool        missing_side_inner;

+               /*
+                * For a FULL join, inner relation acts as both OUTER and INNER
+                * relation.  For LEFT and ANTI join the inner relation acts as
+                * INNER relation. For INNER and SEMI join OUTER and INNER
+                * differentiation is immaterial.
+                */
+               missing_side_inner = (jointype == JOIN_FULL ||
+                                     jointype == JOIN_LEFT ||
+                                     jointype == JOIN_ANTI);
+               missing_side_outer = (jointype == JOIN_FULL);
+               if (!handle_missing_partition(inner_maps,
+                                             outer_maps,
+                                             inner_default,
+                                             outer_part,
+                                             missing_side_outer,
+                                             missing_side_inner,
+                                             &next_index,
+                                             &default_index,
+                                             &merged_index))
+                   return NULL;

So I'd like to propose to introduce separate functions like
process_outer_partition() and process_inner_partition() in the
attached, instead of handle_missing_partition().  (I added a fast path
to these functions that if both outer/inner sides have the default
partitions, give up on merging partitions.  Also, I modified the
caller sites of proposed functions so that they call these if
necessary.)

Other minor changes in the attached:

* I modified partition_range_bounds_merge() and
partition_list_bounds_merge() so that the order of arguments for them
matches partition_bounds_merge().  Also, I did some cleanup for these
functions, and removed this in these function because this merely
re-checks the while condition above.

+       /* If we exhausted both the sides, we won't enter the loop. */
+       Assert(!finished_inner || !finished_outer);

* I modified merge_null_partitions() and merge_default_partitions()
accordingly to the changes described above.  Also, I did some cleanup
and modified the caller sites of these functions so that they call
these functions if necessary.

* I also modified generate_matching_part_pairs() accordingly.

* I added a creanup to free memory allocated by init_partition_map().

* We have this in a few places such as merge_default_partitions():

+           extra_map->to = *next_index;
+           *next_index = *next_index + 1;
+           *merged_index = extra_map->to;

To reduce code duplication, I made this into a separate function.

* I think the return statement in the end of partition_range_merge()
is useless, so I removed it.

That is all for now.  Will continue to review.

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAPmGK14WHKckT1P6UJV2B63TZAxPyMn8iZJ99XF=azunhg6...@mail.gmail.com

modify-partbounds-changes-1.patch
Description: Binary data

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

Reply via email to