On Mon, Jan 25, 2021 at 5:19 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Thu, Jan 21, 2021 at 11:24 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Wed, Jan 20, 2021 at 7:58 AM Peter Geoghegan <p...@bowt.ie> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 9:18 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > > > After more thought, I think that ambulkdelete needs to be able to
> > > > refer to the answer to amvacuumstrategy. That way, the index can
> > > > skip bulk-deletion when lazy vacuum doesn't vacuum the heap and it
> > > > also doesn't want to do that.
> > >
> > > Makes sense.
> > >
> > > BTW, your patch has bitrot already. Peter E's recent pageinspect
> > > commit happens to conflict with this patch. It might make sense to
> > > produce a new version that just fixes the bitrot, so that other people
> > > don't have to deal with it each time.
> > >
> > > > I've attached the updated version patch that includes the following
> > > > changes:
> > >
> > > Looks good. I'll give this version a review now. I will do a lot more
> > > soon. I need to come up with a good benchmark for this, one that I can
> > > return to again and again during review as needed.
> >
> > Thank you for reviewing the patches.
> >
> > > Some feedback on the first patch:
> > >
> > > * Just so you know: I agree with you about handling
> > > VACOPT_TERNARY_DISABLED in the index AM's amvacuumstrategy routine. I
> > > think that it's better to do that there, even though this choice may
> > > have some downsides.
> > >
> > > * Can you add some "stub" sgml doc changes for this? Doesn't have to
> > > be complete in any way. Just a placeholder for later, that has the
> > > correct general "shape" to orient the reader of the patch. It can
> > > just be a FIXME comment, plus basic mechanical stuff -- details of the
> > > new amvacuumstrategy_function routine and its signature.
> >
> > The 0002 patch had the doc update (I mistakenly included it in the
> > 0002 patch). Is that update what you meant?
> >
> > > Some feedback on the second patch:
> > >
> > > * Why do you move around IndexVacuumStrategy in the second patch?
> > > Looks like a rebasing oversight.
> >
> > Check.
> >
> > > * Actually, do we really need the first and second patches to be
> > > separate patches? I agree that the nbtree patch should be a separate
> > > patch, but dividing the first two sets of changes doesn't seem like it
> > > adds much. Did I miss something?
> >
> > I separated the patches since I used to have separate patches when
> > adding other index AM options required by parallel vacuum. But I
> > agree to merging the first two patches.
> >
> > > * Is the "MAXALIGN(sizeof(ItemIdData)))" change in the definition of
> > > MaxHeapTuplesPerPage appropriate? Here is the relevant section from
> > > the patch:
> > >
> > > diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
> > > index 7c62852e7f..038e7cd580 100644
> > > --- a/src/include/access/htup_details.h
> > > +++ b/src/include/access/htup_details.h
> > > @@ -563,17 +563,18 @@ do { \
> > >  /*
> > >   * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
> > >   * fit on one heap page.  (Note that indexes could have more, because they
> > > - * use a smaller tuple header.)  We arrive at the divisor because each tuple
> > > - * must be maxaligned, and it must have an associated line pointer.
> > > + * use a smaller tuple header.)  We arrive at the divisor because each line
> > > + * pointer must be maxaligned.
> > > *** SNIP ***
> > >  #define MaxHeapTuplesPerPage \
> > > -	((int) ((BLCKSZ - SizeOfPageHeaderData) / \
> > > -			(MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))
> > > +	((int) ((BLCKSZ - SizeOfPageHeaderData) / (MAXALIGN(sizeof(ItemIdData)))))
> > >
> > > It's true that ItemIdData structs (line pointers) are aligned, but
> > > they're not MAXALIGN()'d. If they were then the on-disk size of line
> > > pointers would generally be 8 bytes, not 4 bytes.
> >
> > You're right. Will fix.
> >
> > > * Maybe it would be better if you just changed the definition such
> > > that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
> > > no other changes? (Some variant of this suggestion might be better,
> > > not sure.)
> > >
> > > For some reason that feels a bit safer: we still have an "imaginary
> > > tuple header", but it's just 1 MAXALIGN() quantum now. This is still
> > > much less than the current 3 MAXALIGN() quantums (i.e. what
> > > MaxHeapTuplesPerPage treats as the tuple header size). Do you think
> > > that this alternative approach will be noticeably less effective
> > > within vacuumlazy.c?
> > >
> > > Note that you probably understand the issue with MaxHeapTuplesPerPage
> > > for vacuumlazy.c better than I do currently. I'm still trying to
> > > understand your choices, and to understand what is really important
> > > here.
> >
> > Yeah, using MAXIMUM_ALIGNOF seems better for safety. I shared my
> > thoughts on the issue with MaxHeapTuplesPerPage yesterday. I think we
> > need to discuss how to deal with that.
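> >
> > To be sure I understand the suggestion: with "MAXIMUM_ALIGNOF" swapped
> > in and no other changes, the definition would become something like
> > this (just a sketch, not tested):
> >
> >     #define MaxHeapTuplesPerPage \
> >         ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
> >                 (MAXIMUM_ALIGNOF + sizeof(ItemIdData))))
> >
> > Assuming the default 8kB BLCKSZ and 8-byte maxalignment, the bound
> > becomes (8192 - 24) / (8 + 4), i.e. 680 line pointers per page, up
> > from 291 today, while each tuple is still charged one alignment
> > quantum of "imaginary tuple header" plus its line pointer.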
> >
> > > * Maybe add a #define for the value 0.7? (I refer to the value used in
> > > choose_vacuum_strategy() to calculate a "this is the number of LP_DEAD
> > > line pointers that we consider too many" cut off point, which is to be
> > > applied throughout lazy_scan_heap() processing.)
> >
> > Agreed.
> >
> > > * I notice that your new lazy_vacuum_table_and_indexes() function is
> > > the only place that calls lazy_vacuum_table_and_indexes(). I think
> > > that you should merge them together -- replace the only remaining call
> > > to lazy_vacuum_table_and_indexes() with the body of the function
> > > itself. Having a separate lazy_vacuum_table_and_indexes() function
> > > doesn't seem useful to me -- it doesn't actually hide complexity, and
> > > might even be harder to maintain.
> >
> > lazy_vacuum_table_and_indexes() is called from two places: after
> > maintenance_work_mem runs out (around L1097) and at the end of
> > lazy_scan_heap() (around L1726). I defined this function to pack
> > together the operations of choosing a strategy, vacuuming indexes and
> > vacuuming the heap. Without this function, wouldn't we end up writing
> > the same code twice there?
> >
> > > * I suggest thinking about what the last item will mean for the
> > > reporting that currently takes place in
> > > lazy_vacuum_table_and_indexes(), but will now go in an expanded
> > > lazy_vacuum_table_and_indexes() -- how do we count the total number of
> > > index scans now?
> > >
> > > I don't actually believe that the logic needs to change, but some kind
> > > of consolidation and streamlining seems like it might be helpful.
> > > Maybe just a comment that says "note that all index scans might just
> > > be no-ops because..." -- stuff like that.
> >
> > What do you mean by the last item and what report? I think
> > lazy_vacuum_table_and_indexes() itself doesn't report anything, and
> > vacrelstats->num_index_scans counts the total number of index scans.
> >
> > > * Any idea about how hard it will be to teach is_wraparound VACUUMs to
> > > skip index cleanup automatically, based on some practical/sensible
> > > criteria?
> >
> > One simple idea would be to have a to-prevent-wraparound autovacuum
> > worker disable index cleanup (i.e., set index_cleanup to
> > VACOPT_TERNARY_DISABLED). But a downside (though not a common case) is
> > that since a to-prevent-wraparound vacuum is not necessarily an
> > aggressive vacuum, it could skip index cleanup even though it cannot
> > move relfrozenxid forward.
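> >
> > As a rough sketch of that idea (certainly not the final criteria, and
> > ignoring the aggressive-vacuum caveat above), the change could be as
> > small as this, in autovacuum.c where at_params is filled in:
> >
> >     tab->at_params.is_wraparound = wraparound;
> >     /* PoC: skip index cleanup during anti-wraparound autovacuum */
> >     if (wraparound)
> >         tab->at_params.index_cleanup = VACOPT_TERNARY_DISABLED;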
> >
> > > It would be nice to have a basic PoC for that, even if it remains a
> > > PoC for the foreseeable future (i.e. even if it cannot be committed to
> > > Postgres 14). This feature should definitely be something that your
> > > patch series *enables*. I'd feel good about having covered that
> > > question as part of this basic design work if there was a PoC. That
> > > alone should make it 100% clear that it's easy to do (or no harder
> > > than it should be -- it should ideally be compatible with your basic
> > > design). The exact criteria that we use for deciding whether or not to
> > > skip index cleanup (which probably should not just be "this VACUUM is
> > > is_wraparound, good enough" in the final version) may need to be
> > > debated at length on pgsql-hackers. Even still, it is "just a detail"
> > > in the code. Whereas being *able* to do that with your design (now or
> > > in the future) seems essential now.
> >
> > Agreed. I'll write a PoC patch for that.
> >
> > > > * Store the answers to amvacuumstrategy in either local memory or
> > > > DSM (in the parallel vacuum case) so that ambulkdelete can refer to
> > > > the answer to amvacuumstrategy.
> > > > * Fix regression failures.
> > > > * Update the documentation and comments.
> > > >
> > > > Note that the 0003 patch is still PoC quality, lacking the btree
> > > > meta page version upgrade.
> > >
> > > This patch is not the hard part, of course -- there really isn't that
> > > much needed here compared to vacuumlazy.c. So this patch seems like
> > > the simplest of the three (at least to me).
> > >
> > > Some feedback on the third patch:
> > >
> > > * The new btm_last_deletion_nblocks metapage field should use P_NONE
> > > (which is 0) to indicate never having been vacuumed -- not
> > > InvalidBlockNumber (which is 0xFFFFFFFF).
> > >
> > > This is more idiomatic in nbtree, which is nice, but it has a very
> > > significant practical advantage: it ensures that every heapkeyspace
> > > nbtree index (i.e. those on recent nbtree versions) can be treated as
> > > if it has had the new btm_last_deletion_nblocks field all along, even
> > > when it was actually built on Postgres 12 or 13. This trick will let
> > > you avoid dealing with the headache of bumping BTREE_VERSION, which is
> > > a huge advantage.
> > >
> > > Note that this is the same trick I used to avoid bumping BTREE_VERSION
> > > when the btm_allequalimage field needed to be added (for the nbtree
> > > deduplication feature added to Postgres 13).
> >
> > That's a nice approach with a great advantage. I'll use P_NONE.
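> > IIUC the trick works because the unused space in the metapage is
> > zeroed, so the field simply reads as 0 (P_NONE) on a heapkeyspace
> > index built on Postgres 12 or 13 -- which is exactly the "never
> > vacuumed" encoding. The skip test could then look something like this
> > (a sketch only):
> >
> >     /*
> >      * P_NONE (0) means no bulkdelete has run since the index was
> >      * built -- including on indexes built before this field existed.
> >      */
> >     if (metad->btm_last_deletion_nblocks != P_NONE &&
> >         RelationGetNumberOfBlocks(rel) <= metad->btm_last_deletion_nblocks)
> >         return false;   /* index hasn't grown; skip bulkdelete */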
> >
> > > * Forgot to do this in the third patch (I think I made this same
> > > mistake once myself):
> > >
> > > diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
> > > index 8bb180bbbe..88dfea9af3 100644
> > > --- a/contrib/pageinspect/btreefuncs.c
> > > +++ b/contrib/pageinspect/btreefuncs.c
> > > @@ -653,7 +653,7 @@ bt_metap(PG_FUNCTION_ARGS)
> > >  	BTMetaPageData *metad;
> > >  	TupleDesc	tupleDesc;
> > >  	int			j;
> > > -	char	   *values[9];
> > > +	char	   *values[10];
> > >  	Buffer		buffer;
> > >  	Page		page;
> > >  	HeapTuple	tuple;
> > > @@ -734,6 +734,11 @@ bt_metap(PG_FUNCTION_ARGS)
> >
> > Check.
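> >
> > (For the second hunk, I expect bt_metap() just needs to print the new
> > field alongside the other version-dependent metapage fields --
> > presumably something like the following, plus a matching default in
> > the else branch so the column count stays at 10:
> >
> >     if (metad->btm_version >= BTREE_NOVAC_VERSION)
> >         values[j++] = psprintf("%u", metad->btm_last_deletion_nblocks);
> >
> > mirroring how btm_allequalimage is emitted.)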
> >
> > I'm updating and testing the patch. I'll submit the updated version
> > patches tomorrow.
>
> Sorry for the delay.
>
> I've attached the updated version patch that incorporates the comments
> I've received so far.
>
> I merged the previous 0001 and 0002 patches. The 0003 patch is now
> another PoC patch that disables index cleanup automatically when an
> autovacuum is both to prevent xid wraparound and an aggressive vacuum.

Since I found some bugs in the v3 patch, I've attached updated version
patches.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/
v4-0003-PoC-disable-index-cleanup-when-an-anti-wraparound.patch
Description: Binary data
v4-0002-Skip-btree-bulkdelete-if-the-index-doesn-t-grow.patch
Description: Binary data
v4-0001-Choose-vacuum-strategy-before-heap-and-index-vacu.patch
Description: Binary data