On Fri, Jan 12, 2018 at 1:05 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote: > On Wed, Jan 10, 2018 at 11:28 AM, Masahiko Sawada <sawada.m...@gmail.com> > wrote: >> On Sun, Jan 7, 2018 at 8:40 AM, Peter Geoghegan <p...@bowt.ie> wrote: >>> On Sat, Jan 6, 2018 at 2:20 PM, Stephen Frost <sfr...@snowman.net> wrote: >>>>> > IIRC the patches that makes the cleanup scan skip has a problem >>>>> > pointed by Peter[1], that is that we stash an XID when a btree page is >>>>> > deleted, which is used to determine when it's finally safe to recycle >>>>> > the page. Yura's patch doesn't have that problem? >>>>> > >>>>> > [1] >>>>> > https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com >>> >>>> Masahiko Sawada, if this patch isn't viable or requires serious rework >>>> to be acceptable, then perhaps we should change its status to 'returned >>>> with feedback' and you can post a new patch for the next commitfest..? >>> >>> I believe that the problem that I pointed out with freezing/wraparound >>> is a solvable problem. If we think about it carefully, we will come up >>> with a good solution. I have tried to get the ball rolling with my >>> pd_prune_xid suggestion. I think it's important to not lose sight of >>> the fact that the page deletion/recycling XID thing is just one detail >>> that we need to think about some more. >>> >>> I cannot fault Sawada-san for waiting to hear other people's views >>> before proceeding. It really needs to be properly discussed. >>> >> >> Thank you for commenting. >> >> IIUC we have two approaches: one idea is based on Peter's suggestion. >> We can use pd_prune_xid to store epoch of xid of which the page is >> deleted. That way, we can correctly mark deleted pages as recyclable >> without breaking on-disk format. >> >> Another idea is suggested by Sokolov Yura. His original patch makes >> btree have a flag in btpo_flags that implies the btree has deleted but >> not recyclable page or not. I'd rather want to store it as bool in >> BTMetaPageData. Currently btm_version is defined as uint32 but I think >> we won't use all of them. If we store it in part of btm_version we >> don't break on-disk format. However, we're now assuming that the >> vacuum on btree index always scans whole btree rather than a part, and >> this approach will nurture it more. It might be possible that it will >> become a restriction in the future. >> > > On third thought, it's similar to what we are doing for heaps but we > can store the oldest btpo.xact of a btree index to somewhere(metapage > or pg_class.relfrozenxid?) after vacuumed. We can skip cleanup > vacuuming as long as the xid is not more than a certain threshold old > (say vacuum_index_cleanup_age). Combining with new parameter I > proposed before, the condition of doing cleanup index vacuum will > become as follows. > > if (there is garbage on heap) > Vacuum index, and update oldest btpo.xact > else /* there is no garbage on heap */ > { > if (# of scanned pages > (nblocks * > vacuum_cleanup_index_scale_factor) OR oldest btpo.xact is more than > vacuum_index_cleanup_age old?)) > Cleanup vacuum index, and update oldest btpo.xact > } > > In current index vacuuming, it scans whole index pages if there is > even one garbage on heap(I'd like to improve it though someday), which > it also lead to update the oldest btpo.xact at the time. If > vacuum_cleanup_index_slace_factor is enough high, we can skip the > scanning whole index as long as either the table isn't modified for 2 > billion transactions or the oldest btpo.xact isn't more than > vacuum_index_cleanup_age transactions old. But if there is a opened > transaction for a very long time, we might not be able to mark deleted > page as recyclable. Some tricks might be required. Feedback is > welcome. >
Since this patch still needs to be discussed before proceeding, I've marked it as "Needs Review". Feedback is very welcome. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center