Re: Parallel heap vacuum

Masahiko Sawada Mon, 11 Nov 2024 10:16:09 -0800

On Mon, Nov 11, 2024 at 5:08 AM Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> >
> > I've attached new version patches that fixes failures reported by
> > cfbot. I hope these changes make cfbot happy.
>
> Thanks for updating the patch, and sorry for the delayed reply. I confirmed
> that cfbot reported OK for Linux/Windows.
> I'm still learning the feature, so I can post only one comment :-(.
>
> I wanted to know whether TidStoreBeginIterateShared() is needed. IIUC, the
> pre-existing API, TidStoreBeginIterate(), already accepts a shared TidStore.
> The only difference is whether elog(ERROR) exists, but I wonder if it
> benefits others. Is there another reason that lazy_vacuum_heap_rel() uses
> TidStoreBeginIterateShared()?


TidStoreBeginIterateShared() is designed for multiple parallel workers
to iterate over a shared TidStore together. During an iteration, the
parallel workers share the iteration state and iterate over the
underlying radix tree while taking appropriate locks. Therefore, it's
available only for a shared TidStore. This is required to implement
parallel heap vacuum, where multiple parallel workers iterate over the
shared TidStore.
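
To illustrate the difference, here is a toy model in plain C. It is not
the patch code: the struct and function names are hypothetical, a flat
array index stands in for the radix tree, and an atomic counter stands
in for the locking. It only shows that workers sharing one iteration
state split the entries among themselves, while workers with private
state each visit every entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NENTRIES 8

/* Hypothetical iteration state that would live in shared memory. */
typedef struct SharedIterState
{
    atomic_size_t next;      /* index of the next unclaimed entry */
    size_t        nentries;  /* total number of entries */
} SharedIterState;

/* Hypothetical per-process iteration state, with no coordination. */
typedef struct PrivateIterState
{
    size_t next;
    size_t nentries;
} PrivateIterState;

/* Claim the next entry; returns 1 and sets *idx, or 0 when exhausted. */
static int
shared_iter_next(SharedIterState *st, size_t *idx)
{
    size_t i;

    assert(st != NULL && idx != NULL);
    /* The atomic increment stands in for "taking appropriate locks". */
    i = atomic_fetch_add(&st->next, 1);
    if (i >= st->nentries)
        return 0;
    *idx = i;
    return 1;
}

/* Same contract as above, but the cursor is private to one caller. */
static int
private_iter_next(PrivateIterState *st, size_t *idx)
{
    if (st->next >= st->nentries)
        return 0;
    *idx = st->next++;
    return 1;
}

/* Workers take turns claiming entries from ONE shared state. */
void
simulate_shared(int nworkers, int visits[NENTRIES])
{
    SharedIterState st;
    size_t idx;
    int progressed = 1;

    for (int i = 0; i < NENTRIES; i++)
        visits[i] = 0;
    atomic_init(&st.next, 0);
    st.nentries = NENTRIES;

    while (progressed)
    {
        progressed = 0;
        for (int w = 0; w < nworkers; w++)
        {
            if (shared_iter_next(&st, &idx))
            {
                visits[idx]++;
                progressed = 1;
            }
        }
    }
}

/* Each worker walks the whole store with its OWN state. */
void
simulate_private(int nworkers, int visits[NENTRIES])
{
    size_t idx;

    for (int i = 0; i < NENTRIES; i++)
        visits[i] = 0;

    for (int w = 0; w < nworkers; w++)
    {
        PrivateIterState st = {0, NENTRIES};

        while (private_iter_next(&st, &idx))
            visits[idx]++;
    }
}
```

With three simulated workers, simulate_shared() visits every entry
exactly once in total, whereas simulate_private() visits every entry
three times. The first is what parallel heap vacuum needs.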

On the other hand, TidStoreBeginIterate() is designed for a single
process to iterate over a TidStore. It accepts even a shared TidStore,
as you mentioned, but during an iteration there is no inter-process
coordination such as locking. For parallel vacuum, supporting
TidStoreBeginIterate() on a shared TidStore is necessary to cover the
case where we use only parallel index vacuum but not parallel heap
scan/vacuum. In this case, we need to store dead tuple TIDs in the
shared TidStore during heap scan so that parallel workers can use it
during index vacuum. But it's not necessary to use
TidStoreBeginIterateShared() because only one (leader) process does
heap vacuum.
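
The cases above can be summarized as a small decision function. This is
only a sketch with hypothetical names, not code from the patch. The
non-parallel branch (a local TidStore) is an assumption added for
completeness and is not discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the kind of TidStore and iterator in use. */
typedef enum { STORE_LOCAL, STORE_SHARED } StoreKind;
typedef enum { ITER_SINGLE, ITER_SHARED } IterKind;

typedef struct VacuumPlan
{
    StoreKind store;  /* where dead tuple TIDs are kept */
    IterKind  iter;   /* how heap vacuum iterates over them */
} VacuumPlan;

VacuumPlan
choose_plan(bool parallel_index_vacuum, bool parallel_heap_vacuum)
{
    VacuumPlan plan;

    if (parallel_heap_vacuum)
    {
        /* Several workers do heap vacuum: shared store, shared iteration
         * (the TidStoreBeginIterateShared() case). */
        plan.store = STORE_SHARED;
        plan.iter = ITER_SHARED;
    }
    else if (parallel_index_vacuum)
    {
        /* Workers read the TIDs during index vacuum, but only the leader
         * does heap vacuum (the TidStoreBeginIterate() case). */
        plan.store = STORE_SHARED;
        plan.iter = ITER_SINGLE;
    }
    else
    {
        /* Assumption: with no parallelism, a local store is enough. */
        plan.store = STORE_LOCAL;
        plan.iter = ITER_SINGLE;
    }

    /* Shared iteration is available only for a shared store. */
    assert(plan.iter != ITER_SHARED || plan.store == STORE_SHARED);
    return plan;
}
```

For example, choose_plan(true, false) gives a shared store with
single-process iteration, which is the case that needs
TidStoreBeginIterate() to accept a shared TidStore.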

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
