On Thu, Feb 8, 2024 at 6:33 PM Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> Thanks for making the v3 patch set. I have also benchmarked the case [1].
> The results below are the average of 5 runs; they are almost the same
> even when the median is used for the comparison. On my environment, the
> regression cannot be seen.
>
> HEAD (1e285a5)    HEAD + v3 patches    difference
> 10910.722 ms      10714.540 ms         around 1.8%
>
Thank you for doing the performance test!

> Also, here are minor comments for the v3 set.
>
> 01.
> bh_nodeidx_entry and ReorderBufferMemTrackState are missing in typedefs.list.

Will add them.

> 02. ReorderBufferTXNSizeCompare
> Should we assert {ta, tb} are not NULL?

Not sure we really need it, as other binaryheap users don't have such
checks. (A sketch of what those checks would look like is at the end of
this mail.)

On Tue, Feb 6, 2024 at 2:45 PM Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> > I've run a benchmark test that I shared before[1]. Here are the results
> > of decoding a transaction that has 1M subtransactions, each of which
> > has 1 INSERT:
> >
> > HEAD:
> > 1810.192 ms
> >
> > HEAD w/ patch:
> > 2001.094 ms
> >
> > I set logical_decoding_work_mem to a value large enough not to evict
> > any transactions. I can see about a 10% performance regression in this
> > case.
>
> Thanks for running. I think this workload is a worst-case, extreme
> scenario that would not occur on a real system (such a system should be
> fixed), so we can say that the regression is up to around 10%. I felt it
> could be negligible, but what do others think?

I think this performance regression is not acceptable. In this workload,
one transaction has 10k subtransactions and the logical decoding becomes
quite slow if logical_decoding_work_mem is not big enough. Therefore, it's
a legitimate and common approach to increase logical_decoding_work_mem to
speed up the decoding. However, with this patch, the decoding becomes
slower than today. It's a bad idea in general to optimize an extreme case
while sacrificing the normal (or more common) cases.

Therefore, I've improved the algorithm so that we don't touch the max-heap
at all if the number of transactions is small enough (a rough sketch of
the idea is also at the end of this mail).

I've run benchmark tests with two workloads:

workload-1: decode a single transaction with 800k tuples (normal.sql):

* without spill
HEAD:     13235.136 ms
v3 patch: 14320.082 ms
v4 patch: 13300.665 ms

* with spill
HEAD:     22970.204 ms
v3 patch: 23625.649 ms
v4 patch: 23304.366 ms

workload-2: decode one transaction with 100k subtransactions (many-subtxn.sql):

* without spill
HEAD:     345.718 ms
v3 patch: 409.686 ms
v4 patch: 353.026 ms

* with spill
HEAD:     136718.313 ms
v3 patch: 2675.539 ms
v4 patch: 2734.981 ms

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
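For reference on point 02., here is a minimal sketch of a size comparator
with the NULL checks in question. This is an illustration only, not the
code in the attached patches: it assumes the standard binaryheap_comparator
signature (Datum, Datum, void *) and orders transactions by the existing
ReorderBufferTXN "size" field.

/* Illustrative sketch; not the comparator from the v3/v4 patches. */
static int
ReorderBufferTXNSizeCompare(Datum a, Datum b, void *arg)
{
    ReorderBufferTXN *ta = (ReorderBufferTXN *) DatumGetPointer(a);
    ReorderBufferTXN *tb = (ReorderBufferTXN *) DatumGetPointer(b);

    /* The extra checks in question; other binaryheap users omit them. */
    Assert(ta != NULL && tb != NULL);

    if (ta->size < tb->size)
        return -1;
    if (ta->size > tb->size)
        return 1;
    return 0;
}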
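And to make the algorithm change concrete, here is a rough, simplified
sketch of how eviction can avoid the max-heap while only a few transactions
are tracked. This is not the code in the v4 patches: the threshold value,
rb->memtrack_state, REORDER_BUFFER_MEM_TRACK_NO_MAXHEAP, rb->txn_heap, and
ReorderBufferBuildMaxHeap() are placeholder names, and the linear-scan path
is a simplified version of the existing ReorderBufferLargestTXN().

/* Illustration only; the names below are placeholders, not the v4 patch. */
#define MAX_HEAP_TXN_COUNT_THRESHOLD    1024

static ReorderBufferTXN *
ReorderBufferLargestTXN(ReorderBuffer *rb)
{
    /*
     * Few transactions: find the largest one with a plain scan over the
     * by-xid hash table, as HEAD does today, and leave the max-heap
     * untouched so the common case pays no maintenance cost.
     */
    if (hash_get_num_entries(rb->by_txn) < MAX_HEAP_TXN_COUNT_THRESHOLD)
    {
        HASH_SEQ_STATUS hash_seq;
        ReorderBufferTXNByIdEnt *ent;
        ReorderBufferTXN *largest = NULL;

        hash_seq_init(&hash_seq, rb->by_txn);
        while ((ent = hash_seq_search(&hash_seq)) != NULL)
        {
            ReorderBufferTXN *txn = ent->txn;

            if (largest == NULL || txn->size > largest->size)
                largest = txn;
        }
        return largest;
    }

    /*
     * Many transactions: build the max-heap once (if not built yet) and
     * return the largest transaction from its top.  Once built, the
     * per-transaction size updates have to keep the heap consistent,
     * which is not shown here.
     */
    if (rb->memtrack_state == REORDER_BUFFER_MEM_TRACK_NO_MAXHEAP)
        ReorderBufferBuildMaxHeap(rb);  /* also updates rb->memtrack_state */

    return (ReorderBufferTXN *) DatumGetPointer(binaryheap_first(rb->txn_heap));
}

The point is that the per-change bookkeeping does no heap operations until
the transaction count crosses the threshold, which matches the v4 numbers
above: workload-1 is back to roughly HEAD's speed, while workload-2 keeps
the large improvement from the max-heap.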
v4-0001-Make-binaryheap-enlareable.patch
Description: Binary data
v4-0002-Add-functions-to-binaryheap-to-efficiently-remove.patch
Description: Binary data
v4-0003-Use-max-heap-to-evict-largest-transactions-in-Reo.patch
Description: Binary data
normal.sql
Description: Binary data
many-subtxn.sql
Description: Binary data