On Fri, Sep 22, 2017 at 4:46 AM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> I apologize in advance for possible silliness.
>
> At Thu, 21 Sep 2017 13:54:01 -0300, Claudio Freire <klaussfre...@gmail.com>
> wrote in <CAGTBQpYvgdqxVaiyui=bkrzw7zzftqi9kecul4-lkc2thqx...@mail.gmail.com>
>> On Tue, Sep 19, 2017 at 8:55 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>> > On Tue, Sep 19, 2017 at 4:47 PM, Claudio Freire <klaussfre...@gmail.com>
>> > wrote:
>> >> Maybe this is looking at the problem from the wrong direction.
>> >>
>> >> Why can't the page be added to the FSM immediately and the check be
>> >> done at runtime when looking for a reusable page?
>> >>
>> >> Index FSMs currently store only 0 or 255; couldn't they store 128 for
>> >> half-recyclable pages and make the caller re-check reusability before
>> >> using one?
>> >
>> > No, because it's impossible for them to know whether the page that
>> > their index scan just landed on was recycled just a second ago, or
>> > has been like this since before their xact began/snapshot was acquired.
>> >
>> > For your reference, this RecentGlobalXmin interlock stuff is what
>> > Lanin & Shasha call "The Drain Technique" within "2.5 Freeing Empty
>> > Nodes". Seems pretty hard to do it any other way.
>>
>> I don't see the difference between a vacuum run and distributed
>> maintenance at _bt_getbuf time. In fact, the code seems to be in
>> place already.
>
> The pages that RecentGlobalXmin prohibits from being registered as
> "free" cannot be grabbed by _bt_getbuf, since such a page is linked
> from nowhere and the FSM doesn't offer it as "free".
Yes, but suppose vacuum did add them to the FSM in the first round, but
with a special marker that differentiates them from immediately
recyclable ones.

>> _bt_page_recyclable seems to prevent old transactions from treating
>> those pages as recyclable already, and the description of the
>> technique in 2.5 doesn't seem to preclude doing the drain while doing
>> other operations. In fact, Lehman even considers the possibility of
>> multiple concurrent garbage collectors.
>
> _bt_page_recyclable prevents a vacuum scan from discarding pages
> that might still be visible to some active transaction, and the
> "drain" itself is a technique to prevent freeing still-active pages,
> so a scan using the "drain" technique can run freely alongside other
> transactions. The paper might allow concurrent GCs (or vacuums), but
> our nbtree code says that no concurrent vacuum is assumed. Er... here
> it is.
>
> nbtpage.c:1589: _bt_unlink_halfdead_page
> | * right. This search could fail if either the sibling or the target page
> | * was deleted by someone else meanwhile; if so, give up. (Right now,
> | * that should never happen, since page deletion is only done in VACUUM
> | * and there shouldn't be multiple VACUUMs concurrently on the same
> | * table.)

Ok, yes, but we're not talking about half-dead pages, but about deleted
pages that haven't been recycled yet.

>> It's only a matter of making the page visible in the FSM in a way that
>> can be efficiently skipped if we want to go directly to a page that
>> actually CAN be recycled, to avoid looping forever looking for a
>> recyclable page in _bt_getbuf. In fact, that's pretty much Lehman's
>
> Mmm. What _bt_getbuf does is recheck the page given from the FSM as a
> "free page". If the FSM gives no more pages, it just tries to extend
> the index relation. Or am I reading you wrongly?

On non-index FSMs, you can request a page that has at least N free
bytes.

Index FSMs always mark pages as fully empty or fully full, no
in-betweens, but suppose we used that capability of the data structure
to mark "maybe recyclable" pages with 50% free space and "surely
recyclable" pages with 100% free space.

Then _bt_getbuf could request a 50%-free page a few times, check
whether each one is recyclable (i.e. check _bt_page_recyclable) and
essentially do a microvacuum on that page, and if it cannot find a
recyclable page that way, try again asking for 100%-free ones.

The code is almost there; the only thing missing is the distinction
between "maybe recyclable" and "surely recyclable" pages in the index
FSM.

Take this with a grain of salt, I'm not an expert on that code. But it
seems feasible to me.
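
To illustrate (and only to illustrate), the free-page search in
_bt_getbuf might end up looking something like the sketch below.
GetFreeIndexPageCategory() is made up here; it stands in for whatever
FSM API we'd add to ask for pages recorded at or above a given
category. The rest just mirrors the shape of the existing loop in
nbtpage.c. Completely untested, of course.

#include "postgres.h"

#include <limits.h>

#include "access/nbtree.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * HYPOTHETICAL: a category-aware variant of GetFreeIndexPage() that only
 * returns pages the FSM has recorded at or above the given category
 * (e.g. 128 = "maybe recyclable", 255 = "surely recyclable").  No such
 * function exists today; this is the missing piece.
 */
extern BlockNumber GetFreeIndexPageCategory(Relation rel, uint8 min_category);

/*
 * Sketch of the free-page search _bt_getbuf() could do.  Returns a locked,
 * reinitialized buffer, or InvalidBuffer if the caller should extend the
 * relation instead (same contract as the existing loop).
 */
static Buffer
bt_get_recyclable_page_sketch(Relation rel)
{
	static const uint8 categories[] = {128, 255};
	int			pass;

	for (pass = 0; pass < 2; pass++)
	{
		/* don't loop forever over "maybe recyclable" pages */
		int			tries = (categories[pass] == 128) ? 4 : INT_MAX;

		while (tries-- > 0)
		{
			BlockNumber blkno;
			Buffer		buf;
			Page		page;

			blkno = GetFreeIndexPageCategory(rel, categories[pass]);
			if (blkno == InvalidBlockNumber)
				break;			/* nothing left in this category */

			buf = ReadBuffer(rel, blkno);
			if (!ConditionalLockBuffer(buf))
			{
				/* someone else is using it; just drop the pin */
				ReleaseBuffer(buf);
				continue;
			}

			page = BufferGetPage(buf);
			if (_bt_page_recyclable(page))
			{
				/* safe to reuse: reinitialize and hand it back */
				_bt_pageinit(page, BufferGetPageSize(buf));
				return buf;
			}

			/*
			 * Still not recyclable (RecentGlobalXmin hasn't moved past the
			 * deleting transaction yet); leave it marked "maybe recyclable"
			 * in the FSM and keep looking.
			 */
			_bt_relbuf(rel, buf);
		}
	}

	return InvalidBuffer;
}

The retry cap on the "maybe recyclable" pass is what keeps _bt_getbuf
from looping forever when none of those pages have become safe yet;
once it gives up on them it falls back to the "surely recyclable" ones
and, failing that, to extending the relation as it does today.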