On Wed, Dec 12, 2018 at 6:08 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Well, if *you're* willing to entertain that possibility, I'm on board.
> That would certainly lead to a much simpler, and probably back-patchable,
> fix.

I think we should, then.  Simple is good.

Just thinking about this a bit, the problem with truncating first and
then writing the WAL record is that if the WAL record never makes it to
disk, any physical standbys will end up out of sync with the master,
leading to disaster.  But the problem with writing the WAL record first
is that the actual operation might fail, and then standbys will end up
out of sync with the master, leading to disaster.  The obvious way to
finesse that latter problem is just to PANIC if ftruncate() fails --
then we'll crash, restart, and retry, and if we still can't do it, well,
the DBA will have to fix that before the system can come online.  I'm
not sure that's really all that bad -- if we can't truncate, we're kinda
hosed.  How, other than a permissions problem, does that even happen?

Your sketch upthread tries to fix it another way -- write a second
record that says essentially "never mind".  But that leads to the master
and the standby not really being in quite equivalent states.  I'm not
sure whether that's really OK.  If any future operation on the master
depends on some aspect of the page state that wasn't recreated exactly
on the standby, then replay will run into trouble.

I wonder if we could get away with defining a truncation event as
setting all pages beyond the truncation point to all-zeroes, with the
number of those pages that actually exist at the filesystem level as an
accidental detail.  So if the master can't ftruncate(), it's also OK if
it just zeroes all the buffers beyond that point.  But once it emits the
WAL record, it must do one or the other, or else PANIC.  The standby has
the same options.

> > Truncating relations isn't that common of an
> > operation, and also, we could mitigate the impacts by having the scan
> > that identifies the truncation point also write any dirty buffers
> > after that point.
> > We'd have to recheck after upgrading our relation
> > lock, but odds are good that in the normal case we wouldn't add much
> > to the time when we hold the stronger lock.
>
> Hm, not quite following this?  We have to lock out writers before we
> try to identify the truncation point.

I thought we made a tentative identification of the truncation point,
upgraded the lock, and then rechecked.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
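
For illustration only, here is a toy C model of the "truncate or zero,
else PANIC" rule discussed above.  None of this is PostgreSQL code:
ToyRel, toy_init, apply_truncation, logically_equal, and the 64-byte
page size are all made up for the sketch, abort() merely stands in for a
PANIC, and WAL emission is not modeled at all.  The only point is that
two nodes can end up with different physical file lengths and still be
considered equivalent if missing pages are treated as all-zeroes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE  64   /* hypothetical; real pages are much larger */
#define TOY_MAX_BLOCKS 16

/* Hypothetical in-memory stand-in for one relation on one node. */
typedef struct ToyRel
{
    unsigned char pages[TOY_MAX_BLOCKS][TOY_PAGE_SIZE];
    size_t      nblocks;        /* blocks that physically exist */
    bool        truncate_works; /* simulates ftruncate() succeeding or not */
    bool        zeroing_works;  /* simulates being able to zero the tail */
} ToyRel;

/* Fill nblocks pages with a nonzero pattern so zeroing is observable. */
void
toy_init(ToyRel *rel, size_t nblocks, bool truncate_works)
{
    assert(nblocks <= TOY_MAX_BLOCKS);
    memset(rel, 0, sizeof(*rel));
    for (size_t i = 0; i < nblocks; i++)
        memset(rel->pages[i], (int) (i + 1), TOY_PAGE_SIZE);
    rel->nblocks = nblocks;
    rel->truncate_works = truncate_works;
    rel->zeroing_works = true;
}

/* A page past the physical end of the file counts as all-zeroes. */
bool
page_is_zero(const ToyRel *rel, size_t blkno)
{
    if (blkno >= rel->nblocks)
        return true;
    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
        if (rel->pages[blkno][i] != 0)
            return false;
    return true;
}

/*
 * Apply a truncation whose WAL record is assumed to be already emitted.
 * Either shorten the file, or zero everything past the truncation point;
 * if neither is possible, give up loudly (stand-in for PANIC).
 */
void
apply_truncation(ToyRel *rel, size_t new_nblocks)
{
    if (new_nblocks >= rel->nblocks)
        return;                 /* nothing to do */

    if (rel->truncate_works)
    {
        rel->nblocks = new_nblocks;
        return;
    }

    if (rel->zeroing_works)
    {
        for (size_t i = new_nblocks; i < rel->nblocks; i++)
            memset(rel->pages[i], 0, TOY_PAGE_SIZE);
        return;
    }

    fprintf(stderr, "PANIC: could neither truncate nor zero relation\n");
    abort();
}

/* Compare two nodes, ignoring how many trailing zero pages physically exist. */
bool
logically_equal(const ToyRel *a, const ToyRel *b)
{
    size_t      n = a->nblocks > b->nblocks ? a->nblocks : b->nblocks;

    for (size_t i = 0; i < n; i++)
    {
        if (i < a->nblocks && i < b->nblocks)
        {
            if (memcmp(a->pages[i], b->pages[i], TOY_PAGE_SIZE) != 0)
                return false;
        }
        else if (!page_is_zero(a, i) || !page_is_zero(b, i))
            return false;
    }
    return true;
}
```

In this model, a master initialized with truncate_works = false and a
standby with truncate_works = true, both truncated from 8 blocks to 3,
end up with different nblocks but still compare as logically equal.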