On Wed, Sep 25, 2019 at 12:59 AM Etsuro Fujita <etsuro.fuj...@gmail.com> wrote: > I will continue to review the rest of the patch.
I've been reviewing the rest of the patch. Here are my review comments: * map_and_merge_partitions() checks whether the two partitions from the outer and inner sides can be merged in two steps: 1) see if the partitions are mapped to each other (ie, partmap1->from == index2 && partmap2->from == index1), and 2) see if the merged partition indexes assigned are the same (ie, partmap1->to == partmap2->to) (or satisfy some other conditions), but the first step seems redundant to me because I think that if the merged partition indexes are the same, then the partitions would be guaranteed to be mapped to each other. Also, I noticed that that function can't handle some list-partitioning cases properly. Here is an example: CREATE TABLE plt1 (a int, b int, c text) PARTITION BY LIST(c); CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN ('0000', '0002'); CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN ('0003', '0004'); INSERT INTO plt1 SELECT i, i, to_char(i % 5, 'FM0000') FROM generate_series(0, 24) i WHERE i % 5 IN (0, 2, 3, 4); ANALYZE plt1; CREATE TABLE plt2 (a int, b int, c text) PARTITION BY LIST(c); CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0001', '0002'); CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN ('0003', '0004'); INSERT INTO plt2 SELECT i, i, to_char(i % 5, 'FM0000') FROM generate_series(0, 24) i WHERE i % 5 IN (1, 2, 3, 4); ANALYZE plt2; EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 FULL JOIN plt2 t2 ON (t1.c = t2.c) WHERE COALESCE(t1.a, 0) % 5 != 3 AND COALESCE(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a; QUERY PLAN ------------------------------------------------------------------------------- ------ Sort Sort Key: t1.c, t1.a, t2.a -> Hash Full Join Hash Cond: (t1.c = t2.c) Filter: (((COALESCE(t1.a, 0) % 5) <> 3) AND ((COALESCE(t1.a, 0) % 5) <> 4)) -> Append -> Seq Scan on plt1_p1 t1 -> Seq Scan on plt1_p2 t1_1 -> Hash -> Append -> Seq Scan on plt2_p1 t2 -> Seq Scan on plt2_p2 t2_1 (12 rows) This should use partitionwise join by the new partition-matching algorithm but doesn't. The reason for that is because plt1_p1 and plt2_p1 are mapped to different merged partitions and thus considered not merge-able by map_and_merge_partitions() as-is. I might be missing something, but I don't think this is intentional, so I rewrote that function completely in the attached, which is a WIP patch created on top of the patch [1]. * In handle_missing_partition(), I noticed this: + else if (missing_side_inner) + { + /* + * If this partition has already been mapped (say because we + * found an overlapping range earlier), we know where does it + * fit in the join result. Nothing to do in that case. Else + * create a new merged partition. + */ + PartitionMap *extra_map = &maps_with_extra[extra_part]; + if (extra_map->to < 0) + { + extra_map->to = *next_index; + *next_index = *next_index + 1; + *merged_index = extra_map->to; + } + } As commented, that function skips setting *merged_index when the "extra_part" partition is already mapped to a merged partition. This would be correct for range partitioning, but not for list partitioning, I think. Here is an example: CREATE TABLE plt1 (a int, b int, c text) PARTITION BY LIST(c); CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN ('0000', '0001', '0002'); CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN ('0003', '0004'); INSERT INTO plt1 SELECT i, i, to_char(i % 5, 'FM0000') FROM generate_series(0, 24) i; ANALYZE plt1; CREATE TABLE plt2 (a int, b int, c text) PARTITION BY LIST(c); CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0002'); CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN ('0003', '0004'); INSERT INTO plt2 SELECT i, i, to_char(i % 5, 'FM0000') FROM generate_series(0, 24) i WHERE i % 5 IN (2, 3, 4); ANALYZE plt2; CREATE TABLE plt3 (a int, b int, c text) PARTITION BY LIST(c); CREATE TABLE plt3_p1 PARTITION OF plt3 FOR VALUES IN ('0001'); CREATE TABLE plt3_p2 PARTITION OF plt3 FOR VALUES IN ('0003', '0004'); INSERT INTO plt3 SELECT i, i, to_char(i % 5, 'FM0000') FROM generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4); ANALYZE plt3; EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1 t1 LEFT JOIN plt2 t2 ON (t1.c = t2.c)) FULL JOIN plt3 t3 ON (t1.c = t3.c) WHERE COALESCE(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a; QUERY PLAN ------------------------------------------------------------------------------- ------ Sort Sort Key: t1.c, t1.a, t2.a, t3.a -> Hash Full Join Hash Cond: (t1.c = t3.c) Filter: (((COALESCE(t1.a, 0) % 5) <> 3) AND ((COALESCE(t1.a, 0) % 5) <> 4)) -> Hash Left Join Hash Cond: (t1.c = t2.c) -> Append -> Seq Scan on plt1_p1 t1 -> Seq Scan on plt1_p2 t1_1 -> Hash -> Append -> Seq Scan on plt2_p1 t2 -> Seq Scan on plt2_p2 t2_1 -> Hash -> Append -> Seq Scan on plt3_p1 t3 -> Seq Scan on plt3_p2 t3_1 (18 rows) I think this should use 3-way partitionwise join by the new algorithm but doesn't. The reason for that is because when considering partitionwise join for plt1 and plt2, partition_list_bounds_merge() would incorrectly produce {'0000', '0002'} as the merged values for the join segment of plt1_p1 and plt2_p1, not {'0000', '0001', '0002'}, because that function would call handle_missing_partition() for the values '0000' and '0001' of plt1_p1, and it would set *merged_index correctly for '0000' but not for '0001' (keeping *merged_index=-1), due to the behavior mentioned above, resulting in the merged values for the join segment {'0000', '0002'}, not {'0000', '0001', '0002'}. This would cause to fail to merge the values for the join segment {'0000', '0002'} and the values for plt3_p1 {'0001'} when considering partitionwise join for the 2-way join and plt3. I fixed this as well in the attached. handle_missing_partition() can handle both cases where a datum is missing from the inner side and where a datum is missing from the outer side in a unified way, but I think that that makes the code pretty hard to read. Also, I think handle_missing_partition() is not caller-friendly because the caller needs to write something like this: + /* A range missing from the inner side. */ + bool missing_side_outer; + bool missing_side_inner; + /* + * For a FULL join, inner relation acts as both OUTER and INNER + * relation. For LEFT and ANTI join the inner relation acts as + * INNER relation. For INNER and SEMI join OUTER and INNER + * differentiation is immaterial. + */ + missing_side_inner = (jointype == JOIN_FULL || + jointype == JOIN_LEFT || + jointype == JOIN_ANTI); + missing_side_outer = (jointype == JOIN_FULL); + if (!handle_missing_partition(inner_maps, + outer_maps, + inner_default, + outer_part, + missing_side_outer, + missing_side_inner, + &next_index, + &default_index, + &merged_index)) + return NULL; So I'd like to propose to introduce separate functions like process_outer_partition() and process_inner_partition() in the attached, instead of handle_missing_partition(). (I added a fast path to these functions that if both outer/inner sides have the default partitions, give up on merging partitions. Also, I modified the caller sites of proposed functions so that they call these if necessary.) Other minor changes in the attached: * I modified partition_range_bounds_merge() and partition_list_bounds_merge() so that the order of arguments for them matches partition_bounds_merge(). Also, I did some cleanup for these functions, and removed this in these function because this merely re-checks the while condition above. + /* If we exhausted both the sides, we won't enter the loop. */ + Assert(!finished_inner || !finished_outer); * I modified merge_null_partitions() and merge_default_partitions() accordingly to the changes described above. Also, I did some cleanup and modified the caller sites of these functions so that they call these functions if necessary. * I also modified generate_matching_part_pairs() accordingly. * I added a creanup to free memory allocated by init_partition_map(). * We have this in a few places such as merge_default_partitions(): + extra_map->to = *next_index; + *next_index = *next_index + 1; + *merged_index = extra_map->to; To reduce code duplication, I made this into a separate function. * I think the return statement in the end of partition_range_merge() is useless, so I removed it. That is all for now. Will continue to review. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/CAPmGK14WHKckT1P6UJV2B63TZAxPyMn8iZJ99XF=azunhg6...@mail.gmail.com
modify-partbounds-changes-1.patch
Description: Binary data