Sorry, I'd like to make a trivial but critical fix.

At Mon, 19 Mar 2018 14:45:05 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote in <20180319.144505.166111203.horiguchi.kyot...@lab.ntt.co.jp>
> At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada <sawada.m...@gmail.com> wrote in <CAD21AoAB8tQg9xwojupUJjKD=fmhtx6thdependdhftvlwc...@mail.gmail.com>
> > On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
> > <a.korot...@postgrespro.ru> wrote:
> > > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada <sawada.m...@gmail.com>
> > > wrote:
> > >>
> > >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
> > >> <a.korot...@postgrespro.ru> wrote:
> > >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada <sawada.m...@gmail.com>
> > >> > wrote:
> > >> >>
> > >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
> > >> >> <a.korot...@postgrespro.ru> wrote:
> > >> >> > 2) These parameters are reset during btbulkdelete() and set during
> > >> >> > btvacuumcleanup().
> > >> >>
> > >> >> Can't we set these parameters even during btbulkdelete()? By keeping
> > >> >> them up to date, we would be able to avoid unnecessary cleanup vacuums
> > >> >> even after an index bulk-delete.
> > >> >
> > >> > We certainly can update cleanup-related parameters during
> > >> > btbulkdelete(). However, in this case we would update the B-tree
> > >> > meta-page during each VACUUM cycle. That may cause some overhead for
> > >> > non-append-only workloads. I don't think this overhead would be
> > >> > noticeable, because in non-append-only scenarios VACUUM typically
> > >> > writes much more information. But I would like this patch, which is
> > >> > oriented to append-only workloads, to be as harmless as possible for
> > >> > other workloads.
> > >>
> > >> What overhead are you referring to here? I guess the overhead is only
> > >> calculating the oldest btpo.xact, and I think it would be harmless.
> > >
> > > I meant the overhead of setting last_cleanup_num_heap_tuples after every
> > > btbulkdelete with WAL-logging of the meta-page. I bet it also would be
> > > harmless, but I think that needs some testing.
> >
> > Agreed.
> >
> > After more thought, it might be too late, but we can consider the
> > possibility of another idea proposed by Peter. The attached patch
> > addresses the original issue of index cleanups by storing the epoch
> > number of the page-deletion XID into PageHeader->pd_prune_xid, which is
> > a 4-byte field.
>
> Mmm. It seems to me that the story is returning to the
> beginning. Could I try retelling the story?
>
> I understand that the initial problem was that vacuum runs apparently
> unnecessary full scans on indexes many times. The reason is that a
> cleanup scan may leave some (or, under certain conditions, many) dead
> pages not recycled, and we don't know whether a further cleanup is
> needed or not. They will be left there forever unless we run
> additional cleanup scans at the appropriate timing.
>
> (If I understand it correctly,) Sawada-san's latest proposal is
> (fundamentally the same as the first one,) just skipping the
> cleanup scan if the vacuum scan just before found that the number
> of *live* tuples has increased. If there were many deletions and
> insertions but no increase in the total number of tuples, we don't
> get a cleanup. Consequently it had a wraparound problem, and that
> is addressed in this version.
>
> (ditto.) Alexander proposed to record the oldest xid of
> recyclable pages in the metapage (and the number of tuples at the
> last cleanup). This prevents needless cleanup scans and surely
> runs cleanups to remove all recyclable pages.
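To make that decision concrete, here is a rough sketch (mine, not taken from the patch) of how btvacuumcleanup() could consult such a metapage record before deciding to scan. The metapage fields btm_oldest_btpo_xact and btm_last_cleanup_num_heap_tuples, the helper name, and the 10% growth threshold are all assumptions for illustration, not committed code:

#include "postgres.h"

#include "access/genam.h"		/* IndexVacuumInfo */
#include "access/nbtree.h"
#include "access/transam.h"
#include "storage/bufmgr.h"
#include "utils/snapmgr.h"		/* RecentGlobalXmin */

/*
 * Decide whether btvacuumcleanup() should do a full index scan, based on
 * what the metapage remembers from the previous vacuum.  The metapage
 * fields are assumed to exist as in the proposal; the 10% growth
 * threshold stands in for whatever GUC/reloption the patch ends up with.
 */
static bool
_bt_cleanup_scan_needed(Relation rel, IndexVacuumInfo *info)
{
	Buffer		metabuf;
	Page		metapg;
	BTMetaPageData *metad;
	bool		needed;

	metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
	metapg = BufferGetPage(metabuf);
	metad = BTPageGetMeta(metapg);

	if (TransactionIdIsValid(metad->btm_oldest_btpo_xact) &&
		TransactionIdPrecedes(metad->btm_oldest_btpo_xact,
							  RecentGlobalXmin))
	{
		/* some deleted pages have become actually removable: scan */
		needed = true;
	}
	else
	{
		/*
		 * Nothing is recyclable yet; scan only if the heap grew
		 * noticeably since the last cleanup (the append-only case).
		 */
		double		prev_tuples = metad->btm_last_cleanup_num_heap_tuples;

		needed = (prev_tuples < 0 ||
				  info->num_heap_tuples > prev_tuples * 1.1);
	}

	_bt_relbuf(rel, metabuf);
	return needed;
}

If this returned false, btvacuumcleanup() would simply skip the full scan, which is exactly the saving the append-only case is after.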
>
> I think that we can accept Sawada-san's proposal if we accept the
> fact that indexes can retain recyclable pages for a long
> time. (Honestly, I don't think so.)
>
> If (as I might have mentioned upthread for Yura's
> patch,) we accept holding the information on the index meta page,
> Alexander's way would be preferable. The difference between Yura's
> and Alexander's is that the former runs a cleanup scan if a recyclable
> page is present, but the latter avoids that before any recyclable
- pages are known to be removed.
+ pages are known to be actually removable
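To spell out what "actually removable" means here: it is the same test _bt_page_recyclable() already applies to a deleted page. A sketch (illustrative only; the helper name and the caller-supplied oldest_btpo_xact variable are made up) of how a vacuum pass could track the oldest not-yet-removable xact, which is the value Alexander would store in the metapage:

#include "postgres.h"

#include "access/nbtree.h"
#include "access/transam.h"
#include "storage/bufpage.h"
#include "utils/snapmgr.h"

/*
 * For each deleted page seen during the vacuum scan, either it is already
 * recyclable (its btpo.xact precedes RecentGlobalXmin, the same condition
 * _bt_page_recyclable() checks), or we remember the oldest btpo.xact that
 * is still blocking recycling.  That oldest xid is what the proposal would
 * keep in the metapage so the next cleanup scan can be deferred until the
 * pages really are removable.
 */
static void
remember_oldest_deleted_xact(Page page, TransactionId *oldest_btpo_xact)
{
	BTPageOpaque opaque;

	if (PageIsNew(page))
		return;

	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
	if (!P_ISDELETED(opaque))
		return;

	if (TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
		return;					/* already removable right now */

	if (!TransactionIdIsValid(*oldest_btpo_xact) ||
		TransactionIdPrecedes(opaque->btpo.xact, *oldest_btpo_xact))
		*oldest_btpo_xact = opaque->btpo.xact;
}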
> > Comparing to the current proposed patch, this patch
> > needs neither the page upgrade code nor extra WAL-logging. If
>
> # By the way, my proposal was storing the information, as Yura
> # proposed, in the stats collector. The information may become
> # available a bit late, but that does no harm. This needs neither
> # extra WAL logging nor the upgrade code :p
>
> > we also want to address cases other than the append-only case, we will
>
> I'm afraid that "the problem for the other cases" is a new one
> that this patch introduces, not an existing one.
>
> > require the bulk-delete method of scanning the whole index and of
> > logging WAL. But that leads to some extra overhead. With this patch we
> > no longer need to depend on the full scan of the b-tree index. This
> > might be useful for a future where we make the bulk-delete of the
> > b-tree index not scan the whole index.
>
> Perhaps I'm taking something incorrectly, but is it just the
> result of skipping 'maybe needed' scans without considering the
> actual necessity?
>
> I also don't like extra WAL logging, but it happens once (or
> twice?) per vacuum cycle (for every index). On the other hand I
> want to put the on-the-fly upgrade path out of the ordinary
> path. (Reviving pg_upgrade's custom module?)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center