Hello, I have some other comments. At Mon, 4 Mar 2019 23:27:10 +0000, "Bossart, Nathan" <bossa...@amazon.com> wrote in <48410154-e6c5-4c07-8122-8d04e3bcd...@amazon.com> > On 2/28/19, 12:13 AM, "Masahiko Sawada" <sawada.m...@gmail.com> wrote: > > Attached the updated version patch. I've incorporated all review > > comments I got and have changed the number of tuples being reported as > > 'removed tuples'. With this option, tuples completely being removed is > > only tuples marked as unused during HOT-pruning, other dead tuples are > > left. So we count those tuples during HOT-pruning and reports it as > > removed tuples. > > Thanks for the new patches. Beyond the reloptions discussion, most of > my feedback is wording suggestions. > > + <command>VACUUM</command> removes dead tuples and prunes HOT-updated > + tuples chain for live tuples on table. If the table has any dead tuple > + it removes them from both table and indexes for re-use. With this > + option <command>VACUUM</command> doesn't completely remove dead tuples > + and disables removing dead tuples from indexes. This is suitable for > + avoiding transaction ID wraparound (see > + <xref linkend="vacuum-for-wraparound"/>) but not sufficient for > avoiding > + index bloat. This option is ignored if the table doesn't have index. > + This cannot be used in conjunction with <literal>FULL</literal> > + option. > > There are a couple of small changes I would make. How does something > like this sound? > > VACUUM removes dead tuples and prunes HOT-updated tuple chains for > live tuples on the table. If the table has any dead tuples, it > removes them from both the table and its indexes and marks the > corresponding line pointers as available for re-use. With this > option, VACUUM still removes dead tuples from the table, but it > does not process any indexes, and the line pointers are marked as > dead instead of available for re-use. This is suitable for > avoiding transaction ID wraparound (see Section 24.1.5) but not > sufficient for avoiding index bloat. This option is ignored if > the table does not have any indexes. This cannot be used in > conjunction with the FULL option. > > - * Returns the number of tuples deleted from the page and sets > - * latestRemovedXid. > + * Returns the number of tuples deleted from the page and set latestRemoveXid > + * and increment nunused. > > I would say something like: "Returns the number of tuples deleted from > the page, sets latestRemovedXid, and updates nunused." > > + /* > + * hasindex = true means two-pass strategy; false means one-pass. But we > + * always use the one-pass strategy when index vacuum is disabled. > + */ > > I think the added sentence should make it more clear that hasindex > will still be true when DISABLE_INDEX_CLEANUP is used. Maybe > something like: > > /* > * hasindex = true means two-pass strategy; false means one-pass > * > * If DISABLE_INDEX_CLEANUP is used, hasindex may still be true, > * but we'll always use the one-pass strategy. > */ > > tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, > false, > - > &vacrelstats->latestRemovedXid); > + > &vacrelstats->latestRemovedXid, > + > &tups_pruned); > > Why do we need a separate tups_pruned argument in heap_page_prune()? > Could we add the result of heap_page_prune() to tups_pruned instead, > then report the total number of removed tuples as tups_vacuumed + > tups_pruned elsewhere? > > + * If there are no indexes or we skip index vacuum then we can > vacuum > + * the page right now instead of doing a second scan. > > How about: > > If there are no indexes or index cleanup is disabled, we can > vacuum the page right now instead of doing a second scan. > > + /* > + * Here, we have indexes but index vacuum is > disabled. We don't > + * vacuum dead tuples on heap but forget them > as we skip index > + * vacuum. The vacrelstats->dead_tuples could > have tuples which > + * became dead after checked at HOT-pruning > time but aren't marked > + * as dead yet. We don't process them because > it's a very rare > + * condition and the next vacuum will process > them. > + */ > > I would suggest a few small changes: > > /* > * Here, we have indexes but index vacuum is disabled. Instead of > * vacuuming the dead tuples on the heap, we just forget them. > * > * Note that vacrelstats->dead_tuples could include tuples which > * became dead after HOT-pruning but are not marked dead yet. We > * do not process them because this is a very rare condition, and > * the next vacuum will process them anyway. > */ > > - /* If no indexes, make log report that lazy_vacuum_heap would've made > */ > + /* > + * If no index or disables index vacuum, make log report that > lazy_vacuum_heap > + * would've made. If index vacuum is disabled, we didn't remove all > dead > + * tuples but did for tuples removed by HOT-pruning. > + */ > if (vacuumed_pages) > ereport(elevel, > (errmsg("\"%s\": removed %.0f row versions in > %u pages", > > RelationGetRelationName(onerel), > - tups_vacuumed, > vacuumed_pages))); > + skip_index_vacuum ? > tups_pruned : tups_vacuumed, > + vacuumed_pages))); > > How about: > > /* > * If no index or index vacuum is disabled, make log report that > * lazy_vacuum_heap would've made. If index vacuum is disabled, > * we don't include the tuples that we marked dead, but we do > * include tuples removed by HOT-pruning. > */ > > Another interesting thing I noticed is that this "removed X row > versions" message is only emitted if vacuumed_pages is greater than 0. > However, if we only did HOT pruning, tups_vacuumed will be greater > than 0 while vacuumed_pages will still be 0, so some information will > be left out. I think this is already the case, though, so this could > probably be handled in a separate thread.
+ nleft; /* item pointers we left */ The name seems to be something other, and the comment doesn't makes sense at least.. for me.. Looking below, + "%.0f tuples are left as dead.\n", + nleft), + nleft); How about "nleft_dead; /* iterm pointers left as dead */"? In this block: - if (nindexes == 0 && + if ((nindexes == 0 || skip_index_vacuum) && vacrelstats->num_dead_tuples > 0) { Is it right that vacuumed_pages is incremented and FSM is updated while the page is not actually vacuumed? tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false, - &vacrelstats->latestRemovedXid); + &vacrelstats->latestRemovedXid, + &tups_pruned); tups_pruned looks as "HOT pruned tuples". It is named "unused" in the function's parameters. (But I think it is useless. Please see the details below.) I tested it with a simple data set. (autovacuum = off) drop table if exists t; create table t with (fillfactor=50) as select a, a % 3 as b from generate_series(1, 9) a; create index on t(a); update t set a = -a where b = 0; update t set b = b + 1 where b = 1; We now have 9 tuples, 15 versions and 3 out of 6 "old" tuples are to be "left dead" by DISABLE_INDEX_CLEANUP vacuum. It means, three tuples ends with "left dead", three tuples are removed and 12 tuples will survive the vacuum below. vacuum (verbose, freeze ,disable_index_cleanup, disable_page_skipping) t; > INFO: "t": removed 0 row versions in 1 pages > INFO: "t": found 0 removable, 9 nonremovable row versions in 1 out of 1 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 925 > There were 0 unused item pointers. > Skipped 0 pages due to buffer pins, 0 frozen pages. > 0 pages are entirely empty. > 3 tuples are left as dead. Three tuple versions have vanished. Actually they were removed but not shown in the message. heap_prune_chain() doesn't count a live root entry of a chain as "unused (line pointer)" since it is marked as "redierected". As the result the vanished tuples are counted in tups_vacuumed, not in tups_pruned. Maybe the name tups_vacuumed is confusing. After removing tups_pruned code it works correctly. > INFO: "t": removed 6 row versions in 1 pages > INFO: "t": found 6 removable, 9 nonremovable row versions in 1 out of 1 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 932 > There were 0 unused item pointers. > Skipped 0 pages due to buffer pins, 0 frozen pages. > 0 pages are entirely empty. > 3 tuples are left as dead. I see two choices of the second line above. 1> "t": found 6 removable, 9 nonremovable row versions in 1 out of 1 pages "removable" includes "left dead" tuples. 2> "t": found 3 removable, 12 nonremovable row versions in 1 out of 1 pages "removable" excludes "left dead" tuples. If you prefer the latter, removable and nonremoveable need to be corrected using nleft. > INFO: "t": found 3 removable, 12 nonremovable row versions in 1 out of 1 > pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 942 > There were 0 unused item pointers. > Skipped 0 pages due to buffer pins, 0 frozen pages. > 0 pages are entirely empty. > 3 tuples are left as dead. > CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. regards. -- Kyotaro Horiguchi NTT Open Source Software Center