On Thu, Sep 19, 2024 at 10:44 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, Sep 19, 2024 at 10:33 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Wed, Sep 18, 2024 at 8:55 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Thu, Sep 19, 2024 at 6:46 AM David Rowley <dgrowle...@gmail.com> wrote:
> > > >
> > > > On Thu, 19 Sept 2024 at 11:54, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > > > > I've done some benchmark tests for three different code bases with
> > > > > different test cases. In short, reducing the generation memory context
> > > > > block size to 8kB seems promising; it mitigates the problem
> > > > > while keeping similar performance.
> > > >
> > > > Did you try any sizes between 8KB and 8MB? A 1000x reduction seems
> > > > quite a large jump. There is additional overhead from having more
> > > > blocks: more malloc() work and more free() work when deleting
> > > > a context. It would be nice to see some numbers for all powers of 2
> > > > between 8KB and 8MB. I imagine the returns diminish as the
> > > > block size is reduced further.
> > > >
> > >
> > > Good idea.
> >
> > Agreed.
> >
> > I've done further benchmarking while varying the memory block
> > size from 8kB to 8MB. I measured the execution time of logically
> > decoding one transaction that inserted 10M rows, with
> > logical_decoding_work_mem set large enough to avoid spilling. In
> > this scenario we allocate many memory chunks while decoding the
> > transaction, resulting in more malloc() calls at smaller memory
> > block sizes.
> > Here are the results (an average of 3 executions):
> >
> > 8kB:   19747.870 ms
> > 16kB:  19780.025 ms
> > 32kB:  19760.575 ms
> > 64kB:  19772.387 ms
> > 128kB: 19825.385 ms
> > 256kB: 19781.118 ms
> > 512kB: 19808.138 ms
> > 1MB:   19757.640 ms
> > 2MB:   19801.429 ms
> > 4MB:   19673.996 ms
> > 8MB:   19643.547 ms
> >
> > Interestingly, there were no noticeable differences in the execution
> > times. I checked the number of allocated memory blocks in each case,
> > and more blocks are allocated with smaller block sizes. For
> > example, when the logical decoding used the maximum memory (about
> > 1.5GB), we allocated about 80k blocks in the 8kB case
> > and 80 blocks in the 8MB case.
>
> What exactly do these test results mean? Do you want to prove that
> there is no regression by using smaller block sizes?
Yes, there was no noticeable performance regression, at least in this
test scenario.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com