On Thu, 21 Nov 2019, 13:52 Masahiko Sawada, <masahiko.saw...@2ndquadrant.com> wrote:
> On Thu, 21 Nov 2019 at 14:16, Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > <masahiko.saw...@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > > <masahiko.saw...@2ndquadrant.com> wrote:
> > > >
> > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > > <masahiko.saw...@2ndquadrant.com> wrote:
> > > > > >
> > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > > > > >
> > > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > > this information for index AMs. Basically, an index AM will expose a
> > > > > > > variable amparallelvacuumoptions which can have the options below:
> > > > > > >
> > > > > > > VACUUM_OPTION_NO_PARALLEL  1 << 0  # vacuum (neither bulkdelete nor
> > > > > > > vacuumcleanup) can't be performed in parallel
> > > > > >
> > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs that don't
> > > > > > want to support parallel vacuum don't have to set anything.
> > > > >
> > > > > Makes sense.
> > > > >
> > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL  1 << 1  # bulkdelete can be done in
> > > > > > > parallel (indexes nbtree, hash, gin, gist, spgist, bloom will set
> > > > > > > this flag)
> > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2  # vacuumcleanup can be
> > > > > > > done in parallel if bulkdelete is not performed (indexes nbtree, brin,
> > > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3  # vacuumcleanup can be done in
> > > > > > > parallel even if bulkdelete is already performed (indexes gin, brin,
> > > > > > > and bloom will set this flag)
> > > > > >
> > > > > > I think gin and bloom don't need to set both; they should set only
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > > >
> > > > > > And I'm going to disallow index AMs from setting both
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > > by assertions. Is that okay?
> > > > >
> > > > > Sounds reasonable to me.
> > > > >
> > > > > Are you planning to include the changes related to I/O throttling
> > > > > based on the discussion in the nearby thread [1]? I think you can do
> > > > > that if you agree with the conclusion in the last email [1]; otherwise,
> > > > > we can explore it separately.
> > > >
> > > > Yes, agreed. I'm going to include those changes in the next version of
> > > > the patches, and I think we will be able to have more discussion based
> > > > on the patch.
> > >
> > > I've attached the latest version of the patch set. It includes all the
> > > discussed points regarding index AM options as well as the shared cost
> > > balance. I also added some test cases that use all types of index AM.
> > >
> > > During development I had one concern about the number of parallel
> > > workers to launch. In the current design each index AM can choose
> > > whether it participates in parallel bulk-deletion and parallel cleanup.
> > > That also means the number of parallel workers to launch might be
> > > different for parallel bulk-deletion and for parallel cleanup. In the
> > > current patch the leader always launches as many workers as there are
> > > indexes supporting either one, but that is not efficient in some cases.
> > > For example, if we have 3 indexes supporting only parallel bulk-deletion
> > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > 5 workers for each execution, but some workers would do nothing at all.
> > > To deal with this problem, I wonder if we can improve the parallel query
> > > infrastructure so that the leader process creates a parallel context
> > > sized for the maximum number of indexes and can launch only a subset of
> > > the workers instead of all of them.
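
Just to make sure I'm reading the flag proposal the same way, here is a rough
sketch of how I picture the amparallelvacuumoptions bits and the per-phase
worker counts. The macro names and bit values are taken from the mails above
(with NO_PARALLEL as 0, as agreed); the helper function, its arguments, and
the idea of sizing the parallel context with the larger of the two counts are
only my illustration, not code from the patch:

    #include <stdint.h>

    /* Proposed option bits for an index AM's amparallelvacuumoptions. */
    #define VACUUM_OPTION_NO_PARALLEL           0        /* AM opts out entirely */
    #define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1) /* bulkdelete can run in a worker */
    #define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 2) /* cleanup can run in a worker,
                                                          * but only when no bulkdelete
                                                          * was performed */
    #define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 3) /* cleanup can run in a worker
                                                          * even after a bulkdelete */

    /*
     * Hypothetical helper: count how many workers each phase could use.  The
     * leader could size the parallel context with the larger of the two
     * counts and launch only the subset needed for the current phase.  The
     * cleanup count is an upper bound, since COND_CLEANUP indexes take part
     * only when no bulkdelete was performed; an AM is expected to set at
     * most one of the two cleanup bits (to be enforced by an assertion).
     */
    static void
    count_parallel_vacuum_workers(const uint8_t *amoptions, int nindexes,
                                  int *nworkers_bulkdel, int *nworkers_cleanup)
    {
        *nworkers_bulkdel = 0;
        *nworkers_cleanup = 0;

        for (int i = 0; i < nindexes; i++)
        {
            if (amoptions[i] & VACUUM_OPTION_PARALLEL_BULKDEL)
                (*nworkers_bulkdel)++;
            if (amoptions[i] & (VACUUM_OPTION_PARALLEL_COND_CLEANUP |
                                VACUUM_OPTION_PARALLEL_CLEANUP))
                (*nworkers_cleanup)++;
        }
    }

If I understand the idea correctly, with your example (3 bulkdelete-only
indexes and 2 cleanup-only indexes) that would give a context sized for 3
workers, launching 3 for the bulk-deletion pass and 2 for the cleanup pass,
rather than 5 for each.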
> >
> > +
> > +       /* compute new balance by adding the local value */
> > +       shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> > +       new_balance = shared_balance + VacuumCostBalance;
> > +
> > +       /* also compute the total local balance */
> > +       local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> > +
> > +       if ((new_balance >= VacuumCostLimit) &&
> > +           (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> > +       {
> > +           /* compute sleep time based on the local cost balance */
> > +           msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > +           new_balance = shared_balance - VacuumCostBalanceLocal;
> > +           VacuumCostBalanceLocal = 0;
> > +       }
> > +
> > +       if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> > +                                          &shared_balance,
> > +                                          new_balance))
> > +       {
> > +           /* Updated successfully, break */
> > +           break;
> > +       }
> >
> > While looking at the shared costing delay part, I have noticed that
> > while checking the delay condition we consider local_balance, which is
> > VacuumCostBalanceLocal + VacuumCostBalance, but while computing the new
> > balance we only reduce the shared balance by VacuumCostBalanceLocal.
> > I think it should be reduced by local_balance?
>
> Right.
>
> > I see that later we are adding VacuumCostBalance to
> > VacuumCostBalanceLocal, so we are not losing accounting for this
> > balance. But I feel it is not right that we compare based on one
> > value and operate based on another. I think we can immediately set
> > VacuumCostBalanceLocal += VacuumCostBalance before checking the
> > condition.
>
> I think we should not do VacuumCostBalanceLocal += VacuumCostBalance
> inside the while loop, because it's repeatedly executed until the CAS
> operation succeeds. Instead we can move it before the loop and remove
> local_balance?

Right, I meant before the loop.

> The code would be like the following:
>
>     if (VacuumSharedCostBalance != NULL)
>     {
>         :
>         VacuumCostBalanceLocal += VacuumCostBalance;
>         :
>         /* Update the shared cost balance value atomically */
>         while (true)
>         {
>             uint32 shared_balance;
>             uint32 new_balance;
>
>             msec = 0;
>
>             /* compute new balance by adding the local value */
>             shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
>             new_balance = shared_balance + VacuumCostBalance;
>
>             if ((new_balance >= VacuumCostLimit) &&
>                 (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
>             {
>                 /* compute sleep time based on the local cost balance */
>                 msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
>                 new_balance = shared_balance - VacuumCostBalanceLocal;
>                 VacuumCostBalanceLocal = 0;
>             }
>
>             if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
>                                                &shared_balance,
>                                                new_balance))
>             {
>                 /* Updated successfully, break */
>                 break;
>             }
>         }
>
>         :
>         VacuumCostBalance = 0;
>     }
>
> Thoughts?

Looks fine to me.

> Regards,
>
> --
> Masahiko Sawada   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
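
PS: to double-check my understanding of the retry-loop point, I put together
a small standalone sketch below (plain C11 atomics and made-up names, not the
pg_atomic_* code from the patch). The per-call cost is folded into the
worker-local tally exactly once before the loop; in this sketch I also clear
the local tally only after the compare-and-exchange succeeds, since the loop
body can execute more than once before it does:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for VacuumSharedCostBalance, VacuumCostBalanceLocal, etc. */
    static _Atomic uint32_t shared_cost_balance;
    static uint32_t local_cost_balance;
    static const uint32_t cost_limit = 200;     /* cf. VacuumCostLimit */
    static const double cost_delay = 2.0;       /* cf. VacuumCostDelay, in ms */

    /* Returns the delay in ms this worker should sleep for, or 0. */
    static double
    account_vacuum_cost(uint32_t cost_this_round, int nworkers)
    {
        double      msec = 0;

        /* Fold this round's cost into the local tally exactly once, before
         * the retry loop: the loop body may run several times before the
         * compare-and-exchange wins. */
        local_cost_balance += cost_this_round;

        for (;;)
        {
            uint32_t    shared = atomic_load(&shared_cost_balance);
            uint32_t    new_balance = shared + cost_this_round;
            bool        give_back = false;

            msec = 0;

            if (new_balance >= cost_limit &&
                local_cost_balance > 0.5 * (cost_limit / nworkers))
            {
                /* This worker has used up its share: compute its sleep time
                 * and return its locally accounted cost to the shared
                 * balance. */
                msec = cost_delay * local_cost_balance / cost_limit;
                new_balance = shared - local_cost_balance;
                give_back = true;
            }

            if (atomic_compare_exchange_strong(&shared_cost_balance,
                                               &shared, new_balance))
            {
                /* Clear the local tally only once the shared value is
                 * actually updated. */
                if (give_back)
                    local_cost_balance = 0;
                break;
            }
            /* Another worker changed the shared balance meanwhile; retry. */
        }

        return msec;
    }

Whether the local tally should be cleared inside the if block (as in the
quoted snippet) or only after a successful exchange is a separate detail; the
sketch is just meant to illustrate where the accumulation goes.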