On Mon, Dec 18, 2023 at 3:41 PM John Naylor <johncnaylo...@gmail.com> wrote:
>
> On Fri, Dec 15, 2023 at 3:15 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Fri, Dec 15, 2023 at 10:30 AM John Naylor <johncnaylo...@gmail.com>
> > wrote:
> >
> > >  parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
> > > -                     int nrequested_workers, int max_items,
> > > -                     int elevel, BufferAccessStrategy bstrategy)
> > > +                     int nrequested_workers, int vac_work_mem,
> > > +                     int max_offset, int elevel,
> > > +                     BufferAccessStrategy bstrategy)
> > >
> > > It seems very strange to me that this function has to pass the
> > > max_offset. In general, it's been simpler to assume we have a constant
> > > max_offset, but in this case that fact is not helping. Something to
> > > think about for later.
> >
> > max_offset was previously used in old TID encoding in tidstore. Since
> > tidstore has entries for each block, I think we no longer need it.
>
> It's needed now to properly size the allocation of TidStoreIter which
> contains...
>
> +/* Result struct for TidStoreIterateNext */
> +typedef struct TidStoreIterResult
> +{
> +    BlockNumber  blkno;
> +    int          num_offsets;
> +    OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
> +} TidStoreIterResult;
>
> Maybe we can palloc the offset array to "almost always" big enough,
> with logic to resize if needed? If not too hard, seems worth it to
> avoid churn in the parameter list.
Yes, I was thinking of that.

> > > v45-0010:
> > >
> > > Thinking about this some more, I'm not sure we need to do anything
> > > different for the *starting* segment size. (Controlling *max* size
> > > does seem important, however.) For the corner case of m_w_m = 1MB,
> > > it's fine if vacuum quits pruning immediately after (in effect) it
> > > finds the DSA has gone to 2MB. It's not worth bothering with, IMO. If
> > > the memory accounting starts >1MB because we're adding the trivial
> > > size of some struct, let's just stop doing that. The segment
> > > allocations are what we care about.
> >
> > IIUC it's for work_mem, whose minimum value is 64kB.
> >
> > > v45-0011:
> > >
> > > + /*
> > > +  * max_bytes is forced to be at least 64kB, the current minimum valid
> > > +  * value for the work_mem GUC.
> > > +  */
> > > + max_bytes = Max(64 * 1024L, max_bytes);
> > >
> > > Why?
> >
> > This is to avoid creating a radix tree within very small memory. The
> > minimum work_mem value is a reasonable lower bound that PostgreSQL
> > uses internally. It's actually copied from tuplesort.c.
>
> There is no explanation for why it should be done like tuplesort.c. Also...
>
> -    tree->leaf_ctx = SlabContextCreate(ctx,
> -                                       "radix tree leaves",
> -                                       RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
> -                                       sizeof(RT_VALUE_TYPE));
> +    tree->leaf_ctx = SlabContextCreate(ctx,
> +                                       "radix tree leaves",
> +                                       Min(RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
> +                                           work_mem),
> +                                       sizeof(RT_VALUE_TYPE));
>
> At first, my eyes skipped over this apparent re-indent, but hidden
> inside here is another (undocumented) attempt to clamp the size of
> something. There are too many of these sprinkled in various places,
> and they're already a maintenance hazard -- a different one was left
> behind in v45-0011:
>
> @@ -201,6 +183,7 @@ TidStoreCreate(size_t max_bytes, int max_off,
> dsa_area *area)
>      ts->control->max_bytes = max_bytes - (70 * 1024);
>      }
>
> Let's do it in just one place.
> In TidStoreCreate(), do
>
> /* clamp max_bytes to at least the size of the empty tree with
> allocated blocks, so it doesn't immediately appear full */
> ts->control->max_bytes = Max(max_bytes, {rt, shared_rt}_memory_usage);
>
> Then we can get rid of all the worry about 1MB/2MB, 64kB, 70kB -- all that.

But doesn't it mean that even if we create a shared tidstore with small
memory, say 64kB, it actually uses 1MB?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com