On Wed, Apr 18, 2018 at 11:49 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
> > Relation truncation throws away the page image in memory without ever
> > writing it to disk.  Then, if the subsequent file truncate step fails,
> > we have a problem, because anyone who goes looking for that page will
> > fetch it afresh from disk and see the tuples as live.
>
> > There are WAL entries recording the row deletions, but that doesn't
> > help unless we crash and replay the WAL.
>
> > It's hard to see a way around this that isn't fairly catastrophic for
> > performance :-(.
>
> Just to throw out a possibly-crazy idea: maybe we could fix this by
> PANIC'ing if truncation fails, so that we replay the row deletions from
> WAL.  Obviously this would be intolerable if the case were frequent,
> but we've had only two such complaints in the last nine years, so maybe
> it's tolerable.  It seems more attractive than taking a large performance
> hit on truncation speed in normal cases, anyway.
We have only two complaints of data corruption in nine years.  But I
suspect that in the vast majority of cases the truncation error either
didn't cause corruption or the corruption wasn't noticed.  So, once we
introduce a PANIC here, we would get many more complaints.

> A gotcha to be concerned about is what happens if we replay from WAL,
> come to the XLOG_SMGR_TRUNCATE WAL record, and get the same truncation
> failure again, which is surely not unlikely.  PANIC'ing again will not
> do.  I think we could probably handle that by having the replay code
> path zero out all the pages it was unable to delete; as long as that
> succeeds, we can call it good and move on.
>
> Or maybe just do that in the mainline case too?  That is, if ftruncate
> fails, handle it by zeroing the undeletable pages and pressing on?

I've just started really digging into this set of problems, but this
idea already looks good to me.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company