On Mon, 17 Feb 2020 at 12:57, Justin Pryzby <pry...@telsasoft.com> wrote: > > On Mon, Feb 17, 2020 at 10:47:47AM +0900, Masahiko Sawada wrote: > > Thank you for updating the patch! > >
Thank you for updating the patch. > > 1. > > The above lines need a new line. > > Done, thanks. > > > 2. > > In lazy_vacuum_heap, we set the error context and then call > > pg_rusage_init whereas lazy_vacuum_index and lazy_cleanup_index does > > the opposite. And lazy_scan_heap also call pg_rusage first. I think > > lazy_vacuum_heap should follow them for consistency. That is, we can > > set error context after pages = 0. > > Right. Maybe I did it the other way because the two uses of > PROGRESS_VACUUM_PHASE_VACUUM_HEAP were right next to each other. > > > 3. > > We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and > > PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the > > error context in lazy_truncate_heap as well. What do you think? > > > > I'm not sure it's worth to set the error context for FINAL_CLENAUP but > > we should add the case of FINAL_CLENAUP to vacuum_error_callback as > > no-op or explain it as a comment even if we don't. > > I don't have strong feelings either way. > > I looked a bit at the truncation phase. It also truncates the FSM and VM > forks, which could be misleading if the error was in one of those files and > not > the main filenode. > > I'd have to find a way to test it... > ...and was pleasantly surprised to see that earlier phases don't choke if I > (re)create a fake 4GB table like: > > postgres=# CREATE TABLE trunc(i int); > CREATE TABLE > postgres=# \x\t > Expanded display is on. > Tuples only is on. > postgres=# SELECT relfilenode FROM pg_class WHERE oid='trunc'::regclass; > relfilenode | 59068 > > postgres=# \! bash -xc 'truncate -s 1G > ./pgsql13.dat111/base/12689/59068{,.{1..3}}' > + truncate -s 1G ./pgsql13.dat111/base/12689/59074 > ./pgsql13.dat111/base/12689/59074.1 ./pgsql13.dat111/base/12689/59074.2 > ./pgsql13.dat111/base/12689/59074.3 > > postgres=# \timing > Timing is on. > postgres=# SET client_min_messages=debug; SET statement_timeout='13s'; VACUUM > VERBOSE trunc; > INFO: vacuuming "public.trunc" > INFO: "trunc": found 0 removable, 0 nonremovable row versions in 524288 out > of 524288 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2098 > There were 0 unused item identifiers. > Skipped 0 pages due to buffer pins, 0 frozen pages. > 524288 pages are entirely empty. > CPU: user: 5.00 s, system: 1.50 s, elapsed: 6.52 s. > ERROR: canceling statement due to statement timeout > CONTEXT: while truncating relation "public.trunc" to 0 blocks > Yeah lazy_scan_heap deals with all dummy files as new empty pages. > The callback surrounding RelationTruncate() seems hard to hit unless you add > CHECK_FOR_INTERRUPTS(); the same was true for index cleanup. > > The truncation uses a prefetch, which is more likely to hit any lowlevel > error, > so I added callback there, too. > > BTW, for the index cases, I didn't like repeating the namespace here, but > WDYT ? > |CONTEXT: while vacuuming index "public.t_i_idx3" of relation "t" The current message looks good to me because we cannot have a table and its index in the different schema. 1. pg_rusage_init(&ru0); npages = 0; /* * Setup error traceback support for ereport() */ init_vacuum_error_callback(&errcallback, &errcbarg, onerel, PROGRESS_VACUUM_PHASE_VACUUM_HEAP); error_context_stack = &errcallback; tupindex = 0; Oops it seems to me that it's better to set error context after tupindex = 0. Sorry for my bad. And the above comment can be written in a single line as other same comments. 2. @@ -2568,6 +2643,12 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats) BlockNumber blkno; BlockNumber prefetchedUntil; instr_time starttime; + ErrorContextCallback errcallback; + vacuum_error_callback_arg errcbarg; + + /* Setup error traceback support for ereport() */ + init_vacuum_error_callback(&errcallback, &errcbarg, onerel, + PROGRESS_VACUUM_PHASE_TRUNCATE); Hmm I don't think it's a good idea to have count_nondeletable_pages set the error context of PHASE_TRUNCATE. Because the patch sets the error context during RelationTruncate that actually truncates the heap but count_nondeletable_pages doesn't truncate it. I think setting the error context only during RelationTruncate would be a good start. We can hear other opinions from other hackers. Some hackers may want to set the error context for whole lazy_truncate_heap. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services