On Wed, Apr 4, 2018 at 01:54:50PM +1200, Thomas Munro wrote:
> On Wed, Apr 4, 2018 at 12:56 PM, Bruce Momjian <br...@momjian.us> wrote:
> > There has been a lot of focus in this thread on the workflow:
> >
> >     write() -> blocks remain in kernel memory -> fsync() -> panic?
> >
> > But what happens in this workflow:
> >
> >     write() -> kernel syncs blocks to storage -> fsync()
> >
> > Is fsync() going to see a "kernel syncs blocks to storage" failure?
> >
> > There was already discussion that if the fsync() causes the "syncs
> > blocks to storage", fsync() will only report the failure once, but will
> > it see any failure in the second workflow?  There is indication that a
> > failed write to storage reports back an error once and clears the dirty
> > flag, but do we know it keeps things around long enough to report an
> > error to a future fsync()?
> >
> > You would think it does, but I have to ask since our fsync() assumptions
> > have been wrong for so long.
>
> I believe there were some problems of that nature (with various
> twists, based on other concurrent activity and possibly different
> fds), and those problems were fixed by the errseq_t system developed
> by Jeff Layton in Linux 4.13.  Call that "bug #1".
So all our non-cutting-edge Linux systems are vulnerable and there is
no workaround Postgres can implement?  Wow.

> The second issue is that the pages are marked clean after the error
> is reported, so further attempts to fsync() the data (in our case for
> a new attempt to checkpoint) will be futile but appear successful.
> Call that "bug #2", with the proviso that some people apparently think
> it's reasonable behaviour and not a bug.  At least there is a
> plausible workaround for that: namely the nuclear option proposed by
> Craig.

Yes, that one I understood.

-- 
Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +