On Tue, 18 Aug 2020 at 13:06, Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Mon, Aug 17, 2020 at 11:38 AM Masahiko Sawada > <masahiko.saw...@2ndquadrant.com> wrote: > > > > On Sat, 15 Aug 2020 at 12:19, Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > > The reason why I have not included heap_page_prune related change in > > > the patch is that I don't want to sprinkle this in every possible > > > function (code path) called via vacuum especially if the probability > > > of an error in that code path is low. But, I am fine if you and or > > > others think that it is a good idea to update offset in > > > heap_page_prune as well. > > > > I agree to not try sprinkling it many places than necessity. > > > > Regarding heap_page_prune(), I'm concerned a bit that > > heap_page_prune() is typically the first function to check the tuple > > visibility within the vacuum code. I've sometimes observed an error > > with the message like "DETAIL: could not open file “pg_xact/00AB”: No > > such file or directory" due to a tuple header corruption. I suspect > > this message was emitted while checking tuple visibility in > > heap_page_prune(). So I guess the likelihood of an error in that code > > is not so low. > > > > Fair point. > > > On the other hand, if we want to update the offset number in > > heap_page_prune() we will need to expose some vacuum structs defined > > as a static including LVRelStats. > > > > I don't think we need to expose LVRelStats. We can just pass the > address of vacrelstats->offset_num to achieve what we want. I have > tried that and it works, see the > v6-0002-additinal-error-context-information-in-heap_page_ patch > attached. Do you see any problem with it?
Yes, you're right. I'm concerned a bit the number of arguments passing to heap_page_prune() might get higher when we need other values to update for errcontext, but I'm okay with the current patch. Currently, we're in SCAN_HEAP phase in heap_page_prune() but should it be VACUUM_HEAP instead? Also, I've tested the patch with log_min_messages = 'info' and get the following sever logs: 2020-08-19 14:28:09.917 JST [72912] INFO: launched 1 parallel vacuum worker for index vacuuming (planned: 1) 2020-08-19 14:28:09.917 JST [72912] CONTEXT: while scanning block 973 of relation "public.tbl" 2020-08-19 14:28:09.959 JST [72912] INFO: scanned index "i1" to remove 109872 row versions 2020-08-19 14:28:09.959 JST [72912] DETAIL: CPU: user: 0.04 s, system: 0.00 s, elapsed: 0.04 s 2020-08-19 14:28:09.959 JST [72912] CONTEXT: while vacuuming index "i1" of relation "public.tbl" 2020-08-19 14:28:09.967 JST [72936] INFO: scanned index "i2" to remove 109872 row versions by parallel vacuum worker 2020-08-19 14:28:09.967 JST [72936] DETAIL: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.04 s 2020-08-19 14:28:09.967 JST [72936] CONTEXT: while vacuuming index "i2" of relation "public.tbl" 2020-08-19 14:28:09.967 JST [72912] INFO: scanned index "i2" to remove 109872 row versions by parallel vacuum worker 2020-08-19 14:28:09.967 JST [72912] DETAIL: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.04 s 2020-08-19 14:28:09.967 JST [72912] CONTEXT: while vacuuming index "i2" of relation "public.tbl" parallel worker while scanning block 973 of relation "public.tbl" 2020-08-19 14:28:09.968 JST [72912] INFO: "tbl": removed 109872 row versions in 487 pages 2020-08-19 14:28:09.968 JST [72912] DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2020-08-19 14:28:09.968 JST [72912] CONTEXT: while vacuuming block 973 of relation "public.tbl" 2020-08-19 14:28:09.968 JST [72912] INFO: index "i1" now contains 110000 row versions in 578 pages 2020-08-19 14:28:09.968 JST [72912] DETAIL: 109872 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. 2020-08-19 14:28:09.968 JST [72912] CONTEXT: while scanning block 973 of relation "public.tbl" 2020-08-19 14:28:09.968 JST [72912] INFO: index "i2" now contains 110000 row versions in 578 pages 2020-08-19 14:28:09.968 JST [72912] DETAIL: 109872 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. 2020-08-19 14:28:09.968 JST [72912] CONTEXT: while scanning block 973 of relation "public.tbl" 2020-08-19 14:28:09.969 JST [72912] INFO: "tbl": found 110000 removable, 110000 nonremovable row versions in 974 out of 974 pages 2020-08-19 14:28:09.969 JST [72912] DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 519 There were 372 unused item identifiers. Skipped 0 pages due to buffer pins, 0 frozen pages. 0 pages are entirely empty. CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.06 s. 2020-08-19 14:28:09.969 JST [72912] CONTEXT: while scanning block 973 of relation "public.tbl" This is not directly related to the patch but it looks like we can improve the current errcontext settings. For instance, the message from lazy_vacuum_index(): there are two messages reporting the phases. I've attached the patch that improves the current errcontext setting, which can be applied before the patch adding offset number. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 03c8e1ff7e..d4b05810d3 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1669,6 +1669,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, /* report that everything is scanned and vacuumed */ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno); + /* Clear the block number information */ + vacrelstats->blkno = InvalidBlockNumber; + pfree(frozen); /* save stats for use later */ @@ -1886,6 +1889,9 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats) npages++; } + /* Clear the block number information */ + vacrelstats->blkno = InvalidBlockNumber; + if (BufferIsValid(vmbuffer)) { ReleaseBuffer(vmbuffer); @@ -2503,30 +2509,30 @@ lazy_cleanup_index(Relation indrel, *stats = index_vacuum_cleanup(&ivinfo, *stats); + if (*stats) + { + if (IsParallelWorker()) + msg = gettext_noop("index \"%s\" now contains %.0f row versions in %u pages as reported by parallel vacuum worker"); + else + msg = gettext_noop("index \"%s\" now contains %.0f row versions in %u pages"); + + ereport(elevel, + (errmsg(msg, + RelationGetRelationName(indrel), + (*stats)->num_index_tuples, + (*stats)->num_pages), + errdetail("%.0f index row versions were removed.\n" + "%u index pages have been deleted, %u are currently reusable.\n" + "%s.", + (*stats)->tuples_removed, + (*stats)->pages_deleted, (*stats)->pages_free, + pg_rusage_show(&ru0)))); + } + /* Revert back to the old phase information for error traceback */ restore_vacuum_error_info(vacrelstats, &saved_err_info); pfree(vacrelstats->indname); vacrelstats->indname = NULL; - - if (!(*stats)) - return; - - if (IsParallelWorker()) - msg = gettext_noop("index \"%s\" now contains %.0f row versions in %u pages as reported by parallel vacuum worker"); - else - msg = gettext_noop("index \"%s\" now contains %.0f row versions in %u pages"); - - ereport(elevel, - (errmsg(msg, - RelationGetRelationName(indrel), - (*stats)->num_index_tuples, - (*stats)->num_pages), - errdetail("%.0f index row versions were removed.\n" - "%u index pages have been deleted, %u are currently reusable.\n" - "%s.", - (*stats)->tuples_removed, - (*stats)->pages_deleted, (*stats)->pages_free, - pg_rusage_show(&ru0)))); } /*