On Tue, Dec 11, 2018 at 5:39 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> We got another report today [1] that seems to be due to the problem
> we've seen before with failed vacuum truncations leaving corrupt state
> on-disk [2]. Reflecting on that some more, it seems to me that we're
> never going to get to a solution that everybody finds acceptable without
> some rather significant restructuring at the buffer-access level.
> Since looking for a back-patchable solution has yielded no progress in
> eight years, what if we just accept that we will only fix this in HEAD,
> and think outside the box about how we could fix it if we're willing
> to change internal APIs as much as necessary?
+1.

> 9. If actual truncation boundary was different from plan, issue another
> WAL record saying "oh, we only managed to truncate to here, not there".

I don't entirely understand how this fix addresses the problems in
this area, but this step sounds particularly scary. Nothing
guarantees that the second WAL record ever gets replayed.

> * "Only managed to truncate to here" record: write out empty heap
> pages to fill the space from original truncation target to actual.
> This restores the on-disk situation to be equivalent to what it
> was in master, assuming all the dirty pages eventually got written.

This is equivalent only in a fairly loose sense, right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company