Hi All, I did some performance testing with the help of Dilip to test normal vacuum and parallel vacuum. Below is the test summary-
*Configuration settings:* autovacuum = off shared_buffers = 2GB max_parallel_maintenance_workers = 6 *Test 1: (*table has 4 index on all tuples and deleting alternative tuples) *create table test(a int, b int, c int, d int, e int, f int, g int, h int); create index i1 on test (a); create index i2 on test (b); create index i3 on test (c); create index i4 on test (d); insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i; delete from test where a %2=0;* *case 1: (run normal vacuum)* *vacuum test;* *1019.453 ms* *Case 2: (run vacuum with 1 parallel degree)* *vacuum (parallel 1) test;* *765.366 ms* *Case 3:(run vacuum with 3 parallel degree)* *vacuum (parallel 3) test;* *555.227 ms* *From above results, we can concluded that with the help of parallel vacuum, performance is increased for large indexes.* *Test 2:(table has 16 indexes and indexes are small , deleting alternative tuples)* *create table test(a int, b int, c int, d int, e int, f int, g int, h int);create index i1 on test (a) where a < 100000;create index i2 on test (a) where a > 100000 and a < 200000;create index i3 on test (a) where a > 200000 and a < 300000;create index i4 on test (a) where a > 300000 and a < 400000;create index i5 on test (a) where a > 400000 and a < 500000;create index i6 on test (a) where a > 500000 and a < 600000;create index i7 on test (b) where a < 100000;create index i8 on test (c) where a < 100000;create index i9 on test (d) where a < 100000;create index i10 on test (d) where a < 100000;create index i11 on test (d) where a < 100000;create index i12 on test (d) where a < 100000;create index i13 on test (d) where a < 100000;create index i14 on test (d) where a < 100000;create index i15 on test (d) where a < 100000;create index i16 on test (d) where a < 100000;insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;delete from test where a %2=0;* *case 1: (run normal vacuum)* *vacuum test;* *649.187 ms* *Case 2: (run vacuum with 1 parallel degree)* *vacuum (parallel 1) test;* *492.075 ms* *Case 3:(run vacuum with 3 parallel degree)* *vacuum (parallel 3) test;* *435.581 ms* *For small indexes also, we gained some performance by parallel vacuum.* *I will continue my testing for stats collection.* *Please let me know, if anybody has any suggestion for other testing(What should be tested).* *Thanks and Regards* *Mahendra Thalor* *EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com>* On Tue, 29 Oct 2019 at 12:37, Masahiko Sawada <sawada.m...@gmail.com> wrote: > On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > > > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbal...@gmail.com> > wrote: > > > > > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapil...@gmail.com> > wrote: > > > > > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbal...@gmail.com> > wrote: > > > > > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar < > dilipbal...@gmail.com> wrote: > > > > > > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila < > amit.kapil...@gmail.com> wrote: > > > > > > > > > > > > > > I am thinking if we can write the patch for both the > approaches (a. > > > > > > > compute shared costs and try to delay based on that, b. try to > divide > > > > > > > the I/O cost among workers as described in the email above[1]) > and do > > > > > > > some tests to see the behavior of throttling, that might help > us in > > > > > > > deciding what is the best strategy to solve this problem, if > any. > > > > > > > What do you think? > > > > > > > > > > > > I agree with this idea. I can come up with a POC patch for > approach > > > > > > (b). Meanwhile, if someone is interested to quickly hack with > the > > > > > > approach (a) then we can do some testing and compare. > Sawada-san, > > > > > > by any chance will you be interested to write POC with approach > (a)? > > > > > > Otherwise, I will try to write it after finishing the first one > > > > > > (approach b). > > > > > > > > > > > I have come up with the POC for approach (a). > > > > > > Can we compute the overall throttling (sleep time) in the operation > > > > separately for heap and index, then divide the index's sleep_time > with > > > > a number of workers and add it to heap's sleep time? Then, it will > be > > > > a bit easier to compare the data between parallel and non-parallel > > > > case. > > I have come up with a patch to compute the total delay during the > > vacuum. So the idea of computing the total cost delay is > > > > Total cost delay = Total dealy of heap scan + Total dealy of > > index/worker; Patch is attached for the same. > > > > I have prepared this patch on the latest patch of the parallel > > vacuum[1]. I have also rebased the patch for the approach [b] for > > dividing the vacuum cost limit and done some testing for computing the > > I/O throttling. Attached patches 0001-POC-compute-total-cost-delay > > and 0002-POC-divide-vacuum-cost-limit can be applied on top of > > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch. I haven't > > rebased on top of v31-0006, because v31-0006 is implementing the I/O > > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is > > doing the same with another approach. But, > > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as > > well (just 1-2 lines conflict). > > > > Testing: I have performed 2 tests, one with the same size indexes and > > second with the different size indexes and measured total I/O delay > > with the attached patch. > > > > Setup: > > VacuumCostDelay=10ms > > VacuumCostLimit=2000 > > > > Test1 (Same size index): > > create table test(a int, b varchar, c varchar); > > create index idx1 on test(a); > > create index idx2 on test(b); > > create index idx3 on test(c); > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > > generate_series(1,500000) as i; > > delete from test where a < 200000; > > > > Vacuum (Head) Parallel Vacuum > > Vacuum Cost Divide Patch > > Total Delay 1784 (ms) 1398(ms) > > 1938(ms) > > > > > > Test2 (Variable size dead tuple in index) > > create table test(a int, b varchar, c varchar); > > create index idx1 on test(a); > > create index idx2 on test(b) where a > 100000; > > create index idx3 on test(c) where a > 150000; > > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > > generate_series(1,500000) as i; > > delete from test where a < 200000; > > > > Vacuum (Head) Parallel Vacuum > > Vacuum Cost Divide Patch > > Total Delay 1438 (ms) 1029(ms) > > 1529(ms) > > > > > > Conclusion: > > 1. The tests prove that the total I/O delay is significantly less with > > the parallel vacuum. > > 2. With the vacuum cost divide the problem is solved but the delay bit > > more compared to the non-parallel version. The reason could be the > > problem discussed at[2], but it needs further investigation. > > > > Next, I will test with the v31-0006 (shared vacuum cost) patch. I > > will also try to test different types of indexes. > > > > Thank you for testing! > > I realized that v31-0006 patch doesn't work fine so I've attached the > updated version patch that also incorporated some comments I got so > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > test the total delay time. > > Regards, > > -- > Masahiko Sawada >