On Thu, 21 Nov 2019, 13:52 Masahiko Sawada, <masahiko.saw...@2ndquadrant.com> wrote:
> On Thu, 21 Nov 2019 at 14:16, Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > <masahiko.saw...@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > > <masahiko.saw...@2ndquadrant.com> wrote:
> > > >
> > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > > <masahiko.saw...@2ndquadrant.com> wrote:
> > > > > >
> > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > > > > >
> > > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > > this information for index AMs. Basically, an index AM will expose a
> > > > > > > variable amparallelvacuumoptions which can have the options below:
> > > > > > >
> > > > > > > VACUUM_OPTION_NO_PARALLEL  1 << 0  # vacuum (neither bulkdelete nor
> > > > > > > vacuumcleanup) can't be performed in parallel
> > > > > >
> > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs that don't
> > > > > > want to support parallel vacuum don't have to set anything.
> > > > >
> > > > > Makes sense.
> > > > >
> > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL  1 << 1  # bulkdelete can be done in
> > > > > > > parallel (indexes nbtree, hash, gin, gist, spgist, bloom will set
> > > > > > > this flag)
> > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2  # vacuumcleanup can be
> > > > > > > done in parallel if bulkdelete is not performed (indexes nbtree, brin,
> > > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3  # vacuumcleanup can be done in
> > > > > > > parallel even if bulkdelete is already performed (indexes gin, brin,
> > > > > > > and bloom will set this flag)
> > > > > >
> > > > > > I think gin and bloom don't need to set both; they should set only
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > > >
> > > > > > And I'm going to disallow index AMs from setting both
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > > by assertions. Is that okay?
> > > > >
> > > > > Sounds reasonable to me.
> > > > >
> > > > > Are you planning to include the changes related to I/O throttling
> > > > > based on the discussion in the nearby thread [1]? I think you can do
> > > > > that if you agree with the conclusion in the last email [1]; otherwise,
> > > > > we can explore it separately.
> > > >
> > > > Yes, agreed. I'm going to include those changes in the next version of
> > > > the patches, and I think we will be able to have more discussion based
> > > > on the patch.
> > >
> > > I've attached the latest version of the patch set. It includes all the
> > > discussed points regarding index AM options as well as the shared cost
> > > balance. I also added some test cases that use all types of index AM.
> > >
> > > During development I had one concern about the number of parallel
> > > workers to launch. In the current design each index AM can choose
> > > whether it participates in parallel bulk-deletion and parallel cleanup.
> > > That also means the number of parallel workers to launch might be
> > > different for parallel bulk-deletion and for parallel cleanup. In the
> > > current patch the leader always launches as many workers as there are
> > > indexes supporting either one, but that is not efficient in some cases.
> > > For example, if we have 3 indexes supporting only parallel bulk-deletion
> > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > 5 workers for each execution, but some workers would do nothing at all.
> > > To deal with this problem, I wonder if we can improve the parallel query
> > > infrastructure so that the leader process creates a parallel context
> > > sized for the maximum number of indexes and can launch only a subset of
> > > the workers instead of all of them.
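
Just to make sure I'm reading the flag proposal the same way, here is a rough
sketch of how I picture the amparallelvacuumoptions bits and the per-phase
worker counts. The macro names and bit values are taken from the mails above
(with NO_PARALLEL as 0, as agreed); the helper function, its arguments, and
the idea of sizing the parallel context with the larger of the two counts are
only my illustration, not code from the patch:

    #include <stdint.h>

    /* Proposed option bits for an index AM's amparallelvacuumoptions. */
    #define VACUUM_OPTION_NO_PARALLEL           0        /* AM opts out entirely */
    #define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1) /* bulkdelete can run in a worker */
    #define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 2) /* cleanup can run in a worker,
                                                          * but only when no bulkdelete
                                                          * was performed */
    #define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 3) /* cleanup can run in a worker
                                                          * even after a bulkdelete */

    /*
     * Hypothetical helper: count how many workers each phase could use.  The
     * leader could size the parallel context with the larger of the two
     * counts and launch only the subset needed for the current phase.  The
     * cleanup count is an upper bound, since COND_CLEANUP indexes take part
     * only when no bulkdelete was performed; an AM is expected to set at
     * most one of the two cleanup bits (to be enforced by an assertion).
     */
    static void
    count_parallel_vacuum_workers(const uint8_t *amoptions, int nindexes,
                                  int *nworkers_bulkdel, int *nworkers_cleanup)
    {
        *nworkers_bulkdel = 0;
        *nworkers_cleanup = 0;

        for (int i = 0; i < nindexes; i++)
        {
            if (amoptions[i] & VACUUM_OPTION_PARALLEL_BULKDEL)
                (*nworkers_bulkdel)++;
            if (amoptions[i] & (VACUUM_OPTION_PARALLEL_COND_CLEANUP |
                                VACUUM_OPTION_PARALLEL_CLEANUP))
                (*nworkers_cleanup)++;
        }
    }

If I understand the idea correctly, with your example (3 bulkdelete-only
indexes and 2 cleanup-only indexes) that would give a context sized for 3
workers, launching 3 for the bulk-deletion pass and 2 for the cleanup pass,
rather than 5 for each.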
> >
> > +
> > +       /* compute new balance by adding the local value */
> > +       shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> > +       new_balance = shared_balance + VacuumCostBalance;
> > +
> > +       /* also compute the total local balance */
> > +       local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> > +
> > +       if ((new_balance >= VacuumCostLimit) &&
> > +           (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> > +       {
> > +           /* compute sleep time based on the local cost balance */
> > +           msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > +           new_balance = shared_balance - VacuumCostBalanceLocal;
> > +           VacuumCostBalanceLocal = 0;
> > +       }
> > +
> > +       if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> > +                                          &shared_balance,
> > +                                          new_balance))
> > +       {
> > +           /* Updated successfully, break */
> > +           break;
> > +       }
> >
> > While looking at the shared costing delay part, I have noticed that
> > while checking the delay condition we consider local_balance, which is
> > VacuumCostBalanceLocal + VacuumCostBalance, but while computing the new
> > balance we only reduce the shared balance by VacuumCostBalanceLocal.
> > I think it should be reduced by local_balance?
>
> Right.
>
> > I see that later we are adding VacuumCostBalance to
> > VacuumCostBalanceLocal, so we are not losing accounting for this
> > balance. But I feel it is not right that we compare based on one
> > value and operate based on another. I think we can immediately set
> > VacuumCostBalanceLocal += VacuumCostBalance before checking the
> > condition.
>
> I think we should not do VacuumCostBalanceLocal += VacuumCostBalance
> inside the while loop, because it's repeatedly executed until the CAS
> operation succeeds. Instead we can move it before the loop and remove
> local_balance?

Right, I meant before the loop.

> The code would be like the following:
>
>     if (VacuumSharedCostBalance != NULL)
>     {
>         :
>         VacuumCostBalanceLocal += VacuumCostBalance;
>         :
>         /* Update the shared cost balance value atomically */
>         while (true)
>         {
>             uint32 shared_balance;
>             uint32 new_balance;
>
>             msec = 0;
>
>             /* compute new balance by adding the local value */
>             shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
>             new_balance = shared_balance + VacuumCostBalance;
>
>             if ((new_balance >= VacuumCostLimit) &&
>                 (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
>             {
>                 /* compute sleep time based on the local cost balance */
>                 msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
>                 new_balance = shared_balance - VacuumCostBalanceLocal;
>                 VacuumCostBalanceLocal = 0;
>             }
>
>             if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
>                                                &shared_balance,
>                                                new_balance))
>             {
>                 /* Updated successfully, break */
>                 break;
>             }
>         }
>
>         :
>         VacuumCostBalance = 0;
>     }
>
> Thoughts?

Looks fine to me.

> Regards,
>
> --
> Masahiko Sawada   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
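
PS: to double-check my understanding of the retry-loop point, I put together
a small standalone sketch below (plain C11 atomics and made-up names, not the
pg_atomic_* code from the patch). The per-call cost is folded into the
worker-local tally exactly once before the loop; in this sketch I also clear
the local tally only after the compare-and-exchange succeeds, since the loop
body can execute more than once before it does:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for VacuumSharedCostBalance, VacuumCostBalanceLocal, etc. */
    static _Atomic uint32_t shared_cost_balance;
    static uint32_t local_cost_balance;
    static const uint32_t cost_limit = 200;     /* cf. VacuumCostLimit */
    static const double cost_delay = 2.0;       /* cf. VacuumCostDelay, in ms */

    /* Returns the delay in ms this worker should sleep for, or 0. */
    static double
    account_vacuum_cost(uint32_t cost_this_round, int nworkers)
    {
        double      msec = 0;

        /* Fold this round's cost into the local tally exactly once, before
         * the retry loop: the loop body may run several times before the
         * compare-and-exchange wins. */
        local_cost_balance += cost_this_round;

        for (;;)
        {
            uint32_t    shared = atomic_load(&shared_cost_balance);
            uint32_t    new_balance = shared + cost_this_round;
            bool        give_back = false;

            msec = 0;

            if (new_balance >= cost_limit &&
                local_cost_balance > 0.5 * (cost_limit / nworkers))
            {
                /* This worker has used up its share: compute its sleep time
                 * and return its locally accounted cost to the shared
                 * balance. */
                msec = cost_delay * local_cost_balance / cost_limit;
                new_balance = shared - local_cost_balance;
                give_back = true;
            }

            if (atomic_compare_exchange_strong(&shared_cost_balance,
                                               &shared, new_balance))
            {
                /* Clear the local tally only once the shared value is
                 * actually updated. */
                if (give_back)
                    local_cost_balance = 0;
                break;
            }
            /* Another worker changed the shared balance meanwhile; retry. */
        }

        return msec;
    }

Whether the local tally should be cleared inside the if block (as in the
quoted snippet) or only after a successful exchange is a separate detail; the
sketch is just meant to illustrate where the accumulation goes.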