Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Craig Ringer Mon, 09 Apr 2018 05:18:13 -0700

On 9 April 2018 at 18:50, Anthony Iliopoulos <ail...@altatus.com> wrote:


>
> There is a clear responsibility of the application to keep
> its buffers around until a successful fsync(). The kernels
> do report the error (albeit with all the complexities of
> dealing with the interface), at which point the application
> may not assume that the write()s were ever even buffered
> in the kernel page cache in the first place.
>

> What you seem to be asking for is the capability of dropping
> buffers over the (kernel) fence and indemnifying the application
> from any further responsibility, i.e. a hard assurance
> that either the kernel will persist the pages or it will
> keep them around till the application recovers them
> asynchronously, the filesystem is unmounted, or the system
> is rebooted.
>

That's what Pg appears to assume now, yes.

Whether that's reasonable is a whole different topic.

I'd like a middle ground where the kernel lets us register our interest and
tells us if it lost something, without us having to keep eight million FDs
open for some long period. "Tell us about anything that happens under
pgdata/" or an inotify-style per-directory-registration option. I'd even
say that's ideal.

In the meantime, I propose that we fsync() before close() when we age FDs
out of the LRU in backends. Yes, that will hurt throughput and cause
stalls, but we don't seem to have many better options. At least it'll only
flush what we actually wrote to the OS buffers not what we may have in
shared_buffers. If the bgwriter does the same thing, we should be 100% safe
from this problem on Linux 4.13+, and it'd be trivial to make it a GUC much like
the fsync or full_page_writes options that people can turn off if they know
the risks / know their storage is safe / don't care.
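
To make the proposal concrete, here is a minimal sketch in C of flushing a
descriptor before it is closed. It is not the actual patch: the function
close_with_fsync() and the flag fsync_on_close are hypothetical names, and the
flag only stands in for the GUC suggested above.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a GUC; not an existing PostgreSQL setting. */
static bool fsync_on_close = true;

/*
 * Close fd, flushing it to stable storage first when fsync_on_close is set.
 *
 * Returns 0 on success, or -1 with errno set by the first call that failed.
 * close() is attempted even if fsync() failed, so the caller must treat a
 * failure as possible data loss rather than something to retry on this fd.
 */
static int
close_with_fsync(int fd)
{
    int saved_errno = 0;

    assert(fd >= 0);

    /* Flush only what was written through this descriptor's file. */
    if (fsync_on_close && fsync(fd) != 0)
        saved_errno = errno;

    /* Keep the fsync() error if there was one; otherwise report close()'s. */
    if (close(fd) != 0 && saved_errno == 0)
        saved_errno = errno;

    if (saved_errno != 0)
    {
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

The point of the sketch is only the ordering: the error, if any, is collected
while we still hold the descriptor, instead of being left for a later fsync()
on a freshly opened FD that may never see it.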

Someone keen could later optimise this by adding an fsync worker thread
pool in backends, so we don't block the main thread. Frankly
that might be a nice thing to have in the checkpointer anyway. But it's out
of scope for fixing this in durability terms.

I'm partway through a patch that makes us PANIC on fsync() errors now. Once
that's done, the next step will be to force fsync on close() in md and see
how we go with that.
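
For illustration, the "panic on fsync error" behaviour amounts to something
like the sketch below. It is standalone C, not the patch itself: the helper
fsync_or_panic() is a hypothetical name, and inside PostgreSQL the report would
go through ereport(PANIC, ...) rather than fprintf() and abort().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Flush fd, and terminate the process if the flush fails.
 *
 * The failure is not retried: once fsync() has reported an error, a later
 * fsync() on the same file may succeed without the lost writes ever having
 * reached storage, so the only safe reaction is to stop and recover.
 * "path" is used only for the error message.
 */
static void
fsync_or_panic(int fd, const char *path)
{
    assert(fd >= 0);

    if (fsync(fd) != 0)
    {
        fprintf(stderr, "PANIC: could not fsync file \"%s\": %s\n",
                path, strerror(errno));
        abort();
    }
}
```

Crashing here hands the problem to crash recovery, which replays from the WAL
instead of trusting whatever the kernel still has cached for the file.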

Thoughts?

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
