On Sun, Mar 9, 2025 at 11:28 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Mar 7, 2025 at 11:06 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > Discussing with Amit offlist, I've run another benchmark test where no
> > data is loaded on the shared buffer. In the previous test, I loaded
> > all table blocks before running vacuum, so it was the best case. The
> > attached test results showed the worst case.
> >
> > Overall, while the numbers seem not stable, the phase I got sped up a
> > bit, but not as scalable as expected, which is not surprising.
> >
>
> Sorry, but it is difficult for me to understand this data because it
> doesn't contain the schema or details like what exactly is a fraction.
> It is also not clear how the workers are divided among heap and
> indexes, like do we use parallelism for both phases of heap or only
> first phase and do we reuse those workers for index vacuuming. These
> tests were probably discussed earlier, but it would be better to
> either add a summary of the required information to understand the
> results or at least a link to a previous email that has such details.

The test configuration is:

max_wal_size = 50GB
shared_buffers = 25GB
max_parallel_maintenance_workers = 10
max_parallel_workers = 20
max_worker_processes = 30

The test script is as follows ($m and $p are the fraction and the parallel
degree, respectively):

create unlogged table test_vacuum (a bigint) with (autovacuum_enabled=off);
insert into test_vacuum select i from generate_series(1,200000000) s(i);
create index idx_0 on test_vacuum (a);
create index idx_1 on test_vacuum (a);
create index idx_2 on test_vacuum (a);
create index idx_3 on test_vacuum (a);
create index idx_4 on test_vacuum (a);
delete from test_vacuum where mod(a, $m) = 0;
vacuum (verbose, parallel $p) test_vacuum; -- measured the execution time

> > Please
> > note that the test results shows that the phase III also got sped up
> > but this is because in parallel vacuum we use more ring buffers than
> > the single process vacuum. So we need to compare the only phase I time
> > in terms of the benefit of the parallelism.
> >
>
> Does phase 3 also use parallelism? If so, can we try to divide the
> ring buffers among workers or at least try vacuum with an increased
> number of ring buffers. This would be good to do for both the phases,
> if they both use parallelism.

No, only phase 1 was parallelized in this test. Since a parallel vacuum
uses (ring_buffer_size * parallel_degree) memory for its ring buffers,
more of the pages loaded during phase 1 remain in shared buffers,
increasing cache hits during phase 3.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
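
P.S. For concreteness, here is what a single run looks like with example
values plugged in, say $m = 2 and $p = 4 (these values are illustrative,
not taken from the runs above): mod(a, 2) = 0 matches every second row,
so half of the 200 million rows are deleted before the vacuum.

create unlogged table test_vacuum (a bigint) with (autovacuum_enabled=off);
insert into test_vacuum select i from generate_series(1,200000000) s(i);
create index idx_0 on test_vacuum (a);
create index idx_1 on test_vacuum (a);
create index idx_2 on test_vacuum (a);
create index idx_3 on test_vacuum (a);
create index idx_4 on test_vacuum (a);
delete from test_vacuum where mod(a, 2) = 0;  -- deletes 100,000,000 rows (the 1/$m fraction)
vacuum (verbose, parallel 4) test_vacuum;     -- execution time measured for this statement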