On Thu, Mar 29, 2018 at 2:07 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> I found your discussion with kernel hacker Jeff Layton at
> https://lwn.net/Articles/718734/ in which he said: "The stackoverflow
> writeup seems to want a scheme where pages stay dirty after a
> writeback failure so that we can try to fsync them again. Note that
> that has never been the case in Linux after hard writeback failures,
> AFAIK, so programs should definitely not assume that behavior."

And a bit below in the same comments, to this question about PG: "So,
what are the options at this point? The assumption was that we can
repeat the fsync (which as you point out is not the case), or shut down
the database and perform recovery from WAL", the same Jeff Layton seems
to agree PANIC is the appropriate response:

"Replaying the WAL synchronously sounds like the simplest approach when
you get an error on fsync. These are uncommon occurrences for the most
part, so having to fall back to slow, synchronous error recovery modes
when this occurs is probably what you want to do."

And right after, he confirms the errseq_t patches are about always
detecting this, not more:

"The main thing I working on is to better guarantee is that you
actually get an error when this occurs rather than silently corrupting
your data. The circumstances where that can occur require some
corner-cases, but I think we need to make sure that it doesn't occur."

Jeff's comments in the pull request that merged errseq_t are worth
reading as well:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750

> The article above that says the same thing a couple of different ways,
> ie that writeback failure leaves you with pages that are neither
> written to disk successfully nor marked dirty.
>
> If I'm reading various articles correctly, the situation was even
> worse before his errseq_t stuff landed. That fixed cases of
> completely unreported writeback failures due to sharing of PG_error
> for both writeback and read errors with certain filesystems, but it
> doesn't address the clean pages problem.

Indeed, that's exactly how I read it as well (opinion formed
independently before reading your sentence above). The errseq_t patches
landed in v4.13 by the way, so very recently.

> Yeah, I see why you want to PANIC.

Indeed.
Even doing that leaves question marks about all the kernel versions before v4.13, which at this point is pretty much everything out there: those kernels don't even detect this reliably. This is messy.
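
To make the "don't retry, PANIC" point concrete, here is a minimal sketch in Python rather than PostgreSQL's C, purely for illustration. The names `FsyncPanic` and `fsync_or_panic` are made up for this example and are not anything in the PostgreSQL tree; the exception merely stands in for PANIC followed by WAL replay. The assumption it encodes is the one discussed above: after a failed fsync the kernel may already have marked the affected pages clean, so a second fsync that returns success proves nothing.

```python
import os


class FsyncPanic(RuntimeError):
    """Hypothetical stand-in for PANIC: crash and recover from WAL, never retry."""


def fsync_or_panic(fd, path="<unknown>"):
    """Call fsync exactly once and treat any failure as fatal."""
    try:
        os.fsync(fd)
    except OSError as exc:
        # Deliberately no retry loop here. After a writeback failure the
        # dirty pages may have been dropped or marked clean, so a later
        # fsync could report success without the data being on disk.
        raise FsyncPanic(
            f"could not fsync file {path}: {os.strerror(exc.errno)}"
        ) from exc
```

A caller would let `FsyncPanic` propagate all the way up and terminate the process, leaving recovery to rewrite the data from the log. Note that this only helps when the kernel actually reports the error, which is exactly the part that is in doubt before v4.13.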