On Tue, Aug 21, 2018 at 4:10 PM Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> After reading [1] and [2] I understand that there are at least 3
> different issues with heap truncation:
> 1) Data corruption on file truncation error (explained in [1]).
> 2) Expensive scanning of the whole shared buffers before file truncation.
> 3) Cancellation of read-only queries on standby even if
> hot_standby_feedback is on, caused by replication of
> AccessExclusiveLock.
>
> It seems that fixing any of those issues requires a redesign of heap
> truncation. So, ideally, the redesign of heap truncation should fix all
> of the issues above, or at least it should be clear how the rest of
> them can be fixed later on top of the new design.
>
> I would like to share some of my sketchy thoughts about a new heap
> truncation design. Let's imagine we introduced a dirty_barrier buffer
> flag, which prevents a dirty buffer from being written (and,
> consequently, evicted). Then the truncation algorithm could look like
> this:
>
> 1) Acquire ExclusiveLock on the relation.
> 2) Calculate the truncation point using count_nondeletable_pages(),
> while simultaneously placing the dirty_barrier flag on dirty buffers
> and saving their numbers to an array. Assuming no writes are
> performed concurrently, no to-be-truncated-away pages should be
> written from this point on.
> 3) Truncate the data files.
> 4) Iterate over the buffers past the truncation point and clear their
> dirty and dirty_barrier flags (using the numbers we saved to the
> array in step #2).
> 5) Release the relation lock.
> *) If an exception happens after step #2, iterate over the buffers
> past the truncation point and clear their dirty_barrier flags (using
> the numbers we saved to the array in step #2).
>
> After heap truncation using this algorithm, shared buffers may contain
> past-EOF buffers. But those buffers are empty (no used items) and
> clean, so read-only queries shouldn't dirty them by setting hint bits,
> because there are no used items.
> Normally, these buffers will simply be evicted from the shared buffer
> arena. If the relation is extended shortly after heap truncation,
> some of those buffers could be found again after the extension. I
> think this situation can be handled. For instance, we can teach
> vacuum to claim a page as new once all of its tuples are gone.
>
> We're taking only ExclusiveLock here, not AccessExclusiveLock. So,
> assuming we teach our scans to treat a past-EOF page as
> no-visible-tuples-found, read-only queries can run concurrently with
> heap truncation. Also, we don't have to scan the whole of shared
> buffers: only buffers past the truncation point are scanned in step
> #2, and later the flags are cleared only on the dirty buffers past the
> truncation point. Data corruption on truncation error shouldn't happen
> either, because we don't drop any dirty buffers until we're sure that
> the data files were successfully truncated.
>
> The problem I see with this approach so far is that placing too many
> dirty_barrier flags can affect concurrent activity. To cope with that,
> we may, for instance, truncate the relation in multiple iterations
> when we find too many dirty buffers past the truncation point.
>
> Any thoughts?
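The algorithm above is easier to reason about with a toy model. This is only a sketch under stated assumptions, not bufmgr code: the ToyRel struct, the FLAG_* bits and all toy_* functions are hypothetical, relation locking is not modeled, and "truncating the file" just shrinks a block count. It shows steps 2-5, the (*) error path, scans treating past-EOF pages as empty, and truncation in multiple rounds when too many dirty buffers would need a barrier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this toy model; not PostgreSQL's BM_* values. */
#define FLAG_VALID         0x1u
#define FLAG_DIRTY         0x2u
#define FLAG_DIRTY_BARRIER 0x4u   /* dirty, but must not be written or evicted */
#define MAX_BLOCKS         64

/* Hypothetical model of one relation: its pages plus the buffers caching them. */
typedef struct
{
    int      nblocks;             /* current file length in blocks */
    int      ntuples[MAX_BLOCKS]; /* used items on each page */
    unsigned flags[MAX_BLOCKS];   /* buffer flags per block (0 = not cached) */
    int      written[MAX_BLOCKS]; /* number of write-backs per block */
} ToyRel;

/* Any writer (checkpointer, bgwriter, eviction) must skip barrier buffers. */
static bool
toy_write_buffer(ToyRel *rel, int blk)
{
    if (!(rel->flags[blk] & FLAG_DIRTY) || (rel->flags[blk] & FLAG_DIRTY_BARRIER))
        return false;
    rel->written[blk]++;
    rel->flags[blk] &= ~FLAG_DIRTY;
    return true;
}

/* Scans treat a page past EOF as "no visible tuples found". */
static int
toy_visible_items(const ToyRel *rel, int blk)
{
    return blk >= rel->nblocks ? 0 : rel->ntuples[blk];
}

static void
toy_clear_flags(ToyRel *rel, const int *saved, int nsaved, unsigned mask)
{
    for (int i = 0; i < nsaved; i++)
        rel->flags[saved[i]] &= ~mask;
}

/*
 * Step 2: walk backwards over empty pages (in the spirit of
 * count_nondeletable_pages()), putting a barrier on each dirty buffer and
 * saving its block number; stop once max_saved barriers are placed.
 */
static int
toy_find_truncation_point(ToyRel *rel, int *saved, int max_saved, int *nsaved)
{
    int blk = rel->nblocks;

    *nsaved = 0;
    while (blk > 0)
    {
        int prev = blk - 1;

        if (rel->ntuples[prev] > 0)
            break;                  /* first nondeletable page */
        if (rel->flags[prev] & FLAG_DIRTY)
        {
            if (*nsaved == max_saved)
                break;              /* too many barriers for this round */
            rel->flags[prev] |= FLAG_DIRTY_BARRIER;
            saved[(*nsaved)++] = prev;
        }
        blk = prev;
    }
    return blk;                     /* new length in blocks */
}

/* One truncation round; returns the new length, or -1 if truncation failed. */
static int
toy_truncate_relation(ToyRel *rel, int max_barriers, bool simulate_failure)
{
    int saved[MAX_BLOCKS];
    int nsaved;
    int new_nblocks;

    /* Step 1: take ExclusiveLock on the relation (not modeled). */
    new_nblocks = toy_find_truncation_point(rel, saved, max_barriers, &nsaved);
    if (new_nblocks == rel->nblocks)
        return rel->nblocks;        /* nothing to truncate, no barriers placed */
    if (simulate_failure)
    {
        /* (*) Error after step 2: drop barriers only; data stays dirty. */
        toy_clear_flags(rel, saved, nsaved, FLAG_DIRTY_BARRIER);
        return -1;
    }
    rel->nblocks = new_nblocks;     /* Step 3: truncate the data file. */
    /* Step 4: past-EOF buffers become clean and are simply evicted later. */
    toy_clear_flags(rel, saved, nsaved, FLAG_DIRTY | FLAG_DIRTY_BARRIER);
    /* Step 5: release the relation lock (not modeled). */
    return new_nblocks;
}

/* Repeat rounds while progress is made, capping barriers per round. */
static int
toy_truncate_in_batches(ToyRel *rel, int max_barriers)
{
    int prev;

    do
    {
        prev = rel->nblocks;
        if (toy_truncate_relation(rel, max_barriers, false) < 0)
            return -1;
    } while (rel->nblocks < prev);
    return rel->nblocks;
}
```

For example, with six blocks where blocks 2..5 are empty and blocks 3 and 5 are dirty, a cap of one barrier per round shrinks the relation from 6 to 4 blocks and then to 2, without ever writing the dropped dirty buffers; a failure after step 2 instead leaves both buffers dirty and writable.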
Given that I've had no feedback on this idea yet, I'll try to implement a PoC patch for it. It doesn't look difficult, and we'll see how it works.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company