Hello,

On Wed, Sep 21, 2022 at 6:43 PM Yuya Watari <watari.y...@gmail.com> wrote:
> 1.1. Revert one of our optimizations (v5)
>
> As I mentioned in the comment in
> v[5|6|7]-0002-Revert-one-of-the-optimizations.patch, I reverted one of
> our optimizations. This code tries to find EquivalenceMembers that do
> not satisfy the bms_overlap condition. We encounter such members early
> in the loop, so a linear search is sufficient and our optimization is
> excessive here. As a result of experiments, I found that this
> optimization was a bottleneck, so I reverted it.
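To make the reverted case concrete, the loop in question simply scans
ec_members and usually hits a non-overlapping member within the first few
cells, so a plain linear scan beats maintaining and probing an index. Below
is a minimal, simplified sketch against the planner data structures in
nodes/pathnodes.h; it is an illustration only, not the actual code touched
by the patch:

/*
 * Simplified sketch (not the actual patch code): find an EquivalenceMember
 * whose em_relids does not overlap the given relids.  Because such a member
 * is usually found within the first few list cells, a plain linear scan is
 * cheaper than maintaining and probing an index structure.
 */
static EquivalenceMember *
find_non_overlapping_member(EquivalenceClass *ec, Relids relids)
{
	ListCell   *lc;

	foreach(lc, ec->ec_members)
	{
		EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);

		if (!bms_overlap(em->em_relids, relids))
			return em;			/* typically reached very early */
	}
	return NULL;
}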
In the previous mail, I proposed reverting one excessive optimization. In
addition, I found a new bottleneck and have attached a new version of the
patch that addresses it. The new bottleneck is in the
select_outer_pathkeys_for_merge() function. At the end of this function, we
count the EquivalenceMembers that satisfy a specific condition. To count
them, we have been using Bitmapset operations. Through experiments, I
concluded that this optimization is effective for larger cases but leads to
some degradation when the number of partitions is small. The new patch
therefore switches between two algorithms depending on the problem size (a
rough sketch of this switching idea appears at the end of this mail).

1. Experimental result

1.1. Join Order Benchmark

As in the previous email, I used the Join Order Benchmark to evaluate the
patches' performance. The correspondence between each version and the patch
files is as follows.

v3: v8-0001-*.patch
v5 (v3 with revert): v8-0001-*.patch + v8-0002-*.patch
v8 (v5 with revert): v8-0001-*.patch + v8-0002-*.patch + v8-0003-*.patch

Table 1 shows the speed-up of each version compared with the master branch.
For the smallest case (n = 2), performance degradation is kept to 1.1% in
v8, while it is 4.2% and 1.8% in v3 and v5, respectively. This result
indicates that the newly introduced revert is effective.

Table 1: Speedup of Join Order Benchmark (higher is better)
(n = the number of partitions)
----------------------------------------------------------
   n |     v3 | v5 (v3 with revert) | v8 (v5 with revert)
----------------------------------------------------------
   2 |  95.8% |               98.2% |               98.9%
   4 |  97.2% |               99.7% |               99.3%
   8 | 101.4% |              102.5% |              103.4%
  16 | 108.7% |              111.4% |              110.2%
  32 | 127.1% |              127.6% |              128.8%
  64 | 169.5% |              172.1% |              172.4%
 128 | 330.1% |              335.2% |              332.3%
 256 | 815.1% |              826.4% |              821.8%
----------------------------------------------------------

1.2. pgbench

The following table shows the result of pgbench. v5 and v8 performed clearly
better than v3. The difference between v5 and v8 is not significant, but
v8's performance is closer to that of the master branch.

Table 2: The result of pgbench (tps)
-----------------------------------------------------------------
   n | Master |   v3 | v5 (v3 with revert) | v8 (v5 with revert)
-----------------------------------------------------------------
   1 |   7550 | 7422 |                7474 |                7521
   2 |   7594 | 7381 |                7536 |                7529
   4 |   7518 | 7362 |                7461 |                7524
   8 |   7459 | 7340 |                7424 |                7460
-----------------------------------------------------------------
 Avg |   7531 | 7377 |                7474 |                7509
-----------------------------------------------------------------

2. Conclusion and future works

The revert in v8-0003-*.patch is effective in preventing performance
degradation when the number of partitions is small. However, I don't think
what I have done in this patch is the best or ideal solution. As I mentioned
in the comments in the patch, switching between two algorithms may be ugly
because it introduces code duplication. We need a wiser solution to this
problem.

-- 
Best regards,
Yuya Watari
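P.S. To illustrate the kind of algorithm switching described above, here is
a rough sketch. The threshold name and the Bitmapset-index helper
(get_matching_member_indexes) are hypothetical stand-ins, not the actual
names or structure used in the v8 patch:

/*
 * Rough sketch of the algorithm switching (hypothetical names, not the
 * actual patch code): count the EquivalenceMembers overlapping 'relids'.
 */
static int
count_matching_members(PlannerInfo *root, EquivalenceClass *ec, Relids relids)
{
	int			count = 0;

	if (list_length(ec->ec_members) < EC_MEMBER_INDEX_THRESHOLD)	/* hypothetical cutoff */
	{
		/* Conventional algorithm: a plain linear scan is cheapest here. */
		ListCell   *lc;

		foreach(lc, ec->ec_members)
		{
			EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);

			if (bms_overlap(em->em_relids, relids))
				count++;
		}
	}
	else
	{
		/*
		 * Larger cases: look the members up through a Bitmapset-based index.
		 * get_matching_member_indexes() is a hypothetical helper standing in
		 * for the patch's index lookup.
		 */
		Bitmapset  *matching = get_matching_member_indexes(root, ec, relids);

		count = bms_num_members(matching);
	}

	return count;
}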
v8-0001-Apply-eclass_member_speedup_v3.patch.patch
Description: Binary data
v8-0002-Revert-one-of-the-optimizations.patch
Description: Binary data
v8-0003-Use-conventional-algorithm-for-smaller-size-cases.patch
Description: Binary data