On Fri, Sep 27, 2019 at 3:14 PM Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Thu, Sep 26, 2019 at 01:13:56AM +0900, Fujii Masao wrote:
> > On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <mich...@paquier.xyz> wrote:
> >> This also points out that there are other things to worry about than
> >> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
> >> to an ERROR, and this happens before the physical truncation is done
> >> but after the WAL record is replayed on the standby, so any failures
> >> happening at the truncation phase before the work is done would be a
> >> problem. However we are talking about failures which should not
> >> happen and these are elog() calls. It would be tempting to add a
> >> critical section here, but we could still have problems if we have a
> >> failure after the WAL record has been flushed, which means that it
> >> would be replayed on the standby, and the surrounding comments are
> >> clear about that.
> >
> > Could you elaborate what problem adding a critical section there occurs?
>
> Wrapping the call of smgrtruncate() within RelationTruncate() to use a
> critical section would make things worse from the user perspective on
> the primary, no? If the physical truncation fails, we would still
> fail WAL replay on the standby, but instead of generating an ERROR in
> the session of the user attempting the TRUNCATE, the whole primary
> would be taken down.

Thanks for elaborating on that! Understood.

But this can cause subsequent recovery to always fail with an
invalid-pages error, leaving the server unable to start up. This is bad.
So, to alleviate the situation, I'm thinking it would be worth adding
something like an ignore_invalid_pages developer parameter. When this
parameter is set to true, the startup process always ignores
invalid-pages errors. Thoughts?

Regards,

--
Fujii Masao