Hello,

On Wed, Sep 21, 2022 at 6:43 PM Yuya Watari <watari.y...@gmail.com> wrote:
> 1.1. Revert one of our optimizations (v5)
>
> As I mentioned in the comment in
> v[5|6|7]-0002-Revert-one-of-the-optimizations.patch, I reverted one of
> our optimizations. This code tries to find EquivalenceMembers that do
> not satisfy the bms_overlap condition. We encounter such members early
> in the loop, so a linear search is sufficient and our optimization is
> excessive here. As a result of experiments, I found that this
> optimization was a bottleneck, so I reverted it.
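To make the reverted case concrete, the loop in question simply scans
ec_members and usually hits a non-overlapping member within the first few
cells, so a plain linear scan beats maintaining and probing an index. Below
is a minimal, simplified sketch against the planner data structures in
nodes/pathnodes.h; it is an illustration only, not the actual code touched
by the patch:

/*
 * Simplified sketch (not the actual patch code): find an EquivalenceMember
 * whose em_relids does not overlap the given relids.  Because such a member
 * is usually found within the first few list cells, a plain linear scan is
 * cheaper than maintaining and probing an index structure.
 */
static EquivalenceMember *
find_non_overlapping_member(EquivalenceClass *ec, Relids relids)
{
	ListCell   *lc;

	foreach(lc, ec->ec_members)
	{
		EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);

		if (!bms_overlap(em->em_relids, relids))
			return em;			/* typically reached very early */
	}
	return NULL;
}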
In the previous mail, I proposed reverting one excessive optimization. In
addition, I found a new bottleneck and have attached a new version of the
patch that addresses it. The new bottleneck is in the
select_outer_pathkeys_for_merge() function. At the end of this function, we
count the EquivalenceMembers that satisfy a specific condition. To count
them, we have been using Bitmapset operations. Through experiments, I
concluded that this optimization is effective for larger cases but leads to
some degradation when the number of partitions is small. The new patch
therefore switches between two algorithms depending on the problem size (a
rough sketch of this switching idea appears at the end of this mail).

1. Experimental result

1.1. Join Order Benchmark

As in the previous email, I used the Join Order Benchmark to evaluate the
patches' performance. The correspondence between each version and the patch
files is as follows.

v3: v8-0001-*.patch
v5 (v3 with revert): v8-0001-*.patch + v8-0002-*.patch
v8 (v5 with revert): v8-0001-*.patch + v8-0002-*.patch + v8-0003-*.patch

Table 1 shows the speed-up of each version compared with the master branch.
For the smallest case (n = 2), performance degradation is kept to 1.1% in
v8, while it is 4.2% and 1.8% in v3 and v5, respectively. This result
indicates that the newly introduced revert is effective.

Table 1: Speedup of Join Order Benchmark (higher is better)
(n = the number of partitions)
----------------------------------------------------------
   n |     v3 | v5 (v3 with revert) | v8 (v5 with revert)
----------------------------------------------------------
   2 |  95.8% |               98.2% |               98.9%
   4 |  97.2% |               99.7% |               99.3%
   8 | 101.4% |              102.5% |              103.4%
  16 | 108.7% |              111.4% |              110.2%
  32 | 127.1% |              127.6% |              128.8%
  64 | 169.5% |              172.1% |              172.4%
 128 | 330.1% |              335.2% |              332.3%
 256 | 815.1% |              826.4% |              821.8%
----------------------------------------------------------

1.2. pgbench

The following table shows the result of pgbench. v5 and v8 performed clearly
better than v3. The difference between v5 and v8 is not significant, but
v8's performance is closer to that of the master branch.

Table 2: The result of pgbench (tps)
-----------------------------------------------------------------
   n | Master |   v3 | v5 (v3 with revert) | v8 (v5 with revert)
-----------------------------------------------------------------
   1 |   7550 | 7422 |                7474 |                7521
   2 |   7594 | 7381 |                7536 |                7529
   4 |   7518 | 7362 |                7461 |                7524
   8 |   7459 | 7340 |                7424 |                7460
-----------------------------------------------------------------
 Avg |   7531 | 7377 |                7474 |                7509
-----------------------------------------------------------------

2. Conclusion and future works

The revert in v8-0003-*.patch is effective in preventing performance
degradation when the number of partitions is small. However, I don't think
what I have done in this patch is the best or ideal solution. As I mentioned
in the comments in the patch, switching between two algorithms may be ugly
because it introduces code duplication. We need a wiser solution to this
problem.

-- 
Best regards,
Yuya Watari
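P.S. To illustrate the kind of algorithm switching described above, here is
a rough sketch. The threshold name and the Bitmapset-index helper
(get_matching_member_indexes) are hypothetical stand-ins, not the actual
names or structure used in the v8 patch:

/*
 * Rough sketch of the algorithm switching (hypothetical names, not the
 * actual patch code): count the EquivalenceMembers overlapping 'relids'.
 */
static int
count_matching_members(PlannerInfo *root, EquivalenceClass *ec, Relids relids)
{
	int			count = 0;

	if (list_length(ec->ec_members) < EC_MEMBER_INDEX_THRESHOLD)	/* hypothetical cutoff */
	{
		/* Conventional algorithm: a plain linear scan is cheapest here. */
		ListCell   *lc;

		foreach(lc, ec->ec_members)
		{
			EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);

			if (bms_overlap(em->em_relids, relids))
				count++;
		}
	}
	else
	{
		/*
		 * Larger cases: look the members up through a Bitmapset-based index.
		 * get_matching_member_indexes() is a hypothetical helper standing in
		 * for the patch's index lookup.
		 */
		Bitmapset  *matching = get_matching_member_indexes(root, ec, relids);

		count = bms_num_members(matching);
	}

	return count;
}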
v8-0001-Apply-eclass_member_speedup_v3.patch.patch
Description: Binary data
v8-0002-Revert-one-of-the-optimizations.patch
Description: Binary data
v8-0003-Use-conventional-algorithm-for-smaller-size-cases.patch
Description: Binary data