On Mon, Dec 18, 2023 at 3:41 PM John Naylor <johncnaylo...@gmail.com> wrote:
>
> On Fri, Dec 15, 2023 at 3:15 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Fri, Dec 15, 2023 at 10:30 AM John Naylor <johncnaylo...@gmail.com>
> > wrote:
> >
> > >  parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
> > > -                     int nrequested_workers, int max_items,
> > > -                     int elevel, BufferAccessStrategy bstrategy)
> > > +                     int nrequested_workers, int vac_work_mem,
> > > +                     int max_offset, int elevel,
> > > +                     BufferAccessStrategy bstrategy)
> > >
> > > It seems very strange to me that this function has to pass the
> > > max_offset. In general, it's been simpler to assume we have a constant
> > > max_offset, but in this case that fact is not helping. Something to
> > > think about for later.
> >
> > max_offset was previously used in old TID encoding in tidstore. Since
> > tidstore has entries for each block, I think we no longer need it.
>
> It's needed now to properly size the allocation of TidStoreIter which
> contains...
>
> +/* Result struct for TidStoreIterateNext */
> +typedef struct TidStoreIterResult
> +{
> +    BlockNumber  blkno;
> +    int          num_offsets;
> +    OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
> +} TidStoreIterResult;
>
> Maybe we can palloc the offset array to "almost always" big enough,
> with logic to resize if needed? If not too hard, seems worth it to
> avoid churn in the parameter list.
Yes, I was thinking of that.

> > > v45-0010:
> > >
> > > Thinking about this some more, I'm not sure we need to do anything
> > > different for the *starting* segment size. (Controlling *max* size
> > > does seem important, however.) For the corner case of m_w_m = 1MB,
> > > it's fine if vacuum quits pruning immediately after (in effect) it
> > > finds the DSA has gone to 2MB. It's not worth bothering with, IMO. If
> > > the memory accounting starts >1MB because we're adding the trivial
> > > size of some struct, let's just stop doing that. The segment
> > > allocations are what we care about.
> >
> > IIUC it's for work_mem, whose minimum value is 64kB.
> >
> > > v45-0011:
> > >
> > > + /*
> > > +  * max_bytes is forced to be at least 64kB, the current minimum valid
> > > +  * value for the work_mem GUC.
> > > +  */
> > > + max_bytes = Max(64 * 1024L, max_bytes);
> > >
> > > Why?
> >
> > This is to avoid creating a radix tree within very small memory. The
> > minimum work_mem value is a reasonable lower bound that PostgreSQL
> > uses internally. It's actually copied from tuplesort.c.
>
> There is no explanation for why it should be done like tuplesort.c. Also...
>
> -    tree->leaf_ctx = SlabContextCreate(ctx,
> -                                       "radix tree leaves",
> -                                       RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
> -                                       sizeof(RT_VALUE_TYPE));
> +    tree->leaf_ctx = SlabContextCreate(ctx,
> +                                       "radix tree leaves",
> +                                       Min(RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
> +                                           work_mem),
> +                                       sizeof(RT_VALUE_TYPE));
>
> At first, my eyes skipped over this apparent re-indent, but hidden
> inside here is another (undocumented) attempt to clamp the size of
> something. There are too many of these sprinkled in various places,
> and they're already a maintenance hazard -- a different one was left
> behind in v45-0011:
>
> @@ -201,6 +183,7 @@ TidStoreCreate(size_t max_bytes, int max_off,
> dsa_area *area)
>      ts->control->max_bytes = max_bytes - (70 * 1024);
>      }
>
> Let's do it in just one place.
> In TidStoreCreate(), do
>
> /* clamp max_bytes to at least the size of the empty tree with
> allocated blocks, so it doesn't immediately appear full */
> ts->control->max_bytes = Max(max_bytes, {rt, shared_rt}_memory_usage);
>
> Then we can get rid of all the worry about 1MB/2MB, 64kB, 70kB -- all that.

But doesn't it mean that even if we create a shared tidstore with small
memory, say 64kB, it actually uses 1MB?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com