Re: Sketch of a fix for that truncation data corruption issue

Robert Haas Mon, 10 Dec 2018 21:29:20 -0800

On Tue, Dec 11, 2018 at 5:39 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> We got another report today [1] that seems to be due to the problem
> we've seen before with failed vacuum truncations leaving corrupt state
> on-disk [2].  Reflecting on that some more, it seems to me that we're
> never going to get to a solution that everybody finds acceptable without
> some rather significant restructuring at the buffer-access level.
> Since looking for a back-patchable solution has yielded no progress in
> eight years, what if we just accept that we will only fix this in HEAD,
> and think outside the box about how we could fix it if we're willing
> to change internal APIs as much as necessary?


+1.

> 9. If actual truncation boundary was different from plan, issue another
> WAL record saying "oh, we only managed to truncate to here, not there".

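I don't entirely understand how this fix addresses the problems in
this area, but this step sounds particularly scary.  Nothing
guarantees that the second WAL record ever gets replayed: if we crash
after the first record is flushed but before the second one is,
recovery sees only the first.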
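To make that window concrete, here is a toy Python model. It assumes a WAL in which a record becomes durable only once a flush covers it. The names (`ToyWal`, `replay`, `scenario`, and the record kinds `"truncate"` and `"only_truncated_to"`) are hypothetical and are not PostgreSQL's xlog API; the block counts are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWal:
    """Hypothetical WAL: a record is durable only once flush() covers it."""
    records: list = field(default_factory=list)
    flushed_upto: int = 0  # how many leading records are safely on disk

    def insert(self, rec):
        self.records.append(rec)

    def flush(self):
        self.flushed_upto = len(self.records)

    def crash(self):
        # Records inserted but never flushed are lost.
        del self.records[self.flushed_upto:]


def replay(wal, nblocks):
    """Apply the surviving records to a relation of nblocks blocks."""
    for kind, arg in wal.records:
        if kind == "truncate":
            nblocks = min(nblocks, arg)
        elif kind == "only_truncated_to":
            nblocks = max(nblocks, arg)  # extend with empty pages
    return nblocks


def scenario(crash_before_second_flush, start=100, target=60, actual=80):
    """Return (replayed length, length the master actually kept)."""
    wal = ToyWal()
    wal.insert(("truncate", target))
    wal.flush()  # the planned truncation is WAL-logged first
    # The physical truncation stops early, at `actual` instead of `target`.
    wal.insert(("only_truncated_to", actual))
    if crash_before_second_flush:
        wal.crash()
    else:
        wal.flush()
    return replay(wal, start), actual


print(scenario(False))  # (80, 80): both records replayed, lengths agree
print(scenario(True))   # (60, 80): replay ends up shorter than the master
```

The model ignores standbys and checkpoints; it only shows that, under these assumptions, a correct end state hinges on a record that a crash can discard.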

> * "Only managed to truncate to here" record: write out empty heap
> pages to fill the space from original truncation target to actual.
> This restores the on-disk situation to be equivalent to what it
> was in master, assuming all the dirty pages eventually got written.

This is equivalent only in a fairly loose sense, right?
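One possible reading of "equivalent only in a fairly loose sense": padding with freshly initialized empty pages can match the master's relation length and tuple contents while per-page state still differs. The sketch below uses a hypothetical `ToyPage` whose `lsn` and `all_visible` fields merely stand in for such state; it is not PostgreSQL's page layout, and `pad_with_empty_pages` and `logically_equal` are illustrative helpers, not existing functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPage:
    """Hypothetical page model, not PostgreSQL's PageHeaderData."""
    tuples: tuple = ()
    lsn: int = 0            # last WAL position that touched the page
    all_visible: bool = False


def pad_with_empty_pages(pages, upto):
    """Replay of the hypothetical 'only managed to truncate to here' record."""
    return pages + [ToyPage() for _ in range(upto - len(pages))]


def logically_equal(a, b):
    """Same length and same tuples on every page; other page state ignored."""
    return len(a) == len(b) and all(x.tuples == y.tuples for x, y in zip(a, b))


target, actual = 3, 5
# Master: pages [target, actual) survived the truncation attempt. They hold
# no tuples, but carry their own LSN and flag state.
master = (
    [ToyPage(("t",), lsn=10) for _ in range(target)]
    + [ToyPage((), lsn=42, all_visible=True) for _ in range(actual - target)]
)
# Replay: truncate to target, then pad back out to actual.
standby = pad_with_empty_pages(master[:target], actual)

print(logically_equal(master, standby))  # True
print(master == standby)                 # False
```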

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
