On Fri, Jul 19, 2019 at 10:44 PM Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Jul 18, 2019 at 2:55 AM Etsuro Fujita <etsuro.fuj...@gmail.com> wrote:
> > I.e., partition_bounds_merge() is performed for each pair of input
> > partitioned relations for a join relation in try_partitionwise_join().
> > Since partition_bounds_merge() would need a lot of CPU cycles, I don't
> > think this is acceptable; ISTM that some redesign is needed to avoid
> > this.  I'm wondering that once we successfully merged partition bounds
> > from a pair of input partitioned relations for the join relation, by
> > using the merged partition bounds, we could get the lists of matching
> > to-be-joined partitions for subsequent pairs of input partitioned
> > relations for the join relation in a more efficient way than by
> > performing partition_bounds_merge() as proposed in the patch.
>
> I don't know whether partition_bounds_merge() is well-implemented; I
> haven't looked.
My concern about that is list partitioning.  In that case the function
calls partition_list_bounds_merge(), which generates the partition bounds
for a join relation between two given input relations by performing a
merge join over the pair of datums arrays from the two input relations.
Since each datums array contains all non-null list values across all
partitions, if the number of list values (i.e., ndatums) is large, the
merge join requires a non-trivial number of CPU cycles, so performing it
for every such pair would be quite expensive when considering large N-way
partitionwise joins of list-partitioned tables with large ndatums.  To
see that, I did simple tests using a list-partitioned table pt created
with the attached script, which has 10 partitions, each with 1000 list
values, so ndatums is 10000.  (The tests below were performed with
enable_partitionwise_join=on.)

* 2-way self-join of pt:

explain analyze select * from pt t0, pt t1 where t0.a = t1.a;

- HEAD:
Planning Time: 1.731 ms
Execution Time: 15.159 ms
- Patched:
Planning Time: 1.884 ms
Execution Time: 15.127 ms

* 4-way self-join of pt:

explain analyze select * from pt t0, pt t1, pt t2, pt t3 where t0.a =
t1.a and t1.a = t2.a and t2.a = t3.a;

- HEAD:
Planning Time: 28.787 ms
Execution Time: 34.313 ms
- Patched:
Planning Time: 40.263 ms
Execution Time: 35.019 ms

* 8-way self-join of pt:

explain analyze select * from pt t0, pt t1, pt t2, pt t3, pt t4, pt t5,
pt t6, pt t7 where t0.a = t1.a and t1.a = t2.a and t2.a = t3.a and t3.a
= t4.a and t4.a = t5.a and t5.a = t6.a and t6.a = t7.a;

- HEAD:
Planning Time: 2279.653 ms
Execution Time: 63.303 ms
- Patched:
Planning Time: 3834.751 ms
Execution Time: 62.949 ms

Actually, these joins would not need the partition-matching algorithm the
patch adds; we could probably avoid this regression by modifying the
patch to plan these joins the same way as before.  Still, ISTM that these
results imply that the cost of performing the merge join for each such
pair would not be
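To make the per-pair cost concrete, here is a rough sketch (in Python,
for illustration only, and greatly simplified -- no outer joins, no NULL
or default partitions) of the kind of merge that
partition_list_bounds_merge() performs over the two sorted datums arrays.
The function and variable names here are hypothetical, not PostgreSQL's;
the point is just that each pair of input relations costs
O(ndatums1 + ndatums2) comparisons, which is what accumulates across the
many pairs considered for a large N-way join:

```python
def merge_list_bounds(datums1, datums2):
    """Merge two sorted arrays of (list_value, partition_index) pairs.

    Returns the list values of the merged (join) bounds, plus the set of
    (partition1, partition2) pairs that share at least one value and so
    must be joined to each other.  Mimics an inner-join merge only.
    """
    merged_values = []   # list values surviving into the join's bounds
    matched_pairs = set()
    i = j = 0
    # Classic merge-join walk: one linear pass over both arrays.
    while i < len(datums1) and j < len(datums2):
        v1, p1 = datums1[i]
        v2, p2 = datums2[j]
        if v1 == v2:
            merged_values.append(v1)
            matched_pairs.add((p1, p2))
            i += 1
            j += 1
        elif v1 < v2:
            i += 1           # value present only on side 1: no match
        else:
            j += 1           # value present only on side 2: no match
    return merged_values, sorted(matched_pairs)


# Tiny example: two relations, each with two partitions.
d1 = [(1, 0), (2, 0), (5, 1)]
d2 = [(2, 0), (3, 1), (5, 1)]
vals, pairs = merge_list_bounds(d1, d2)
# vals -> [2, 5]; pairs -> [(0, 0), (1, 1)]
```

With ndatums = 10000 on each side, as in the tests above, each such merge
walks on the order of 20000 entries, and the number of input-relation
pairs grows quickly with the number of joined tables.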
negligible when considering the large N-way partitionwise joins mentioned
above.  Maybe I'm missing something, though.

> But in general I don't see an alternative to doing
> some kind of merging on each pair of input relations.  That's just how
> planning works, and I don't see why it should need to be prohibitively
> expensive.  I might be missing something, though; do you have an idea?

Yes, I do; but I think I should think a little more about that.  Sorry
for the delay.

Best regards,
Etsuro Fujita
list_parted2.sql
Description: Binary data