On 9 April 2018 at 18:50, Anthony Iliopoulos <ail...@altatus.com> wrote:
> There is a clear responsibility of the application to keep
> its buffers around until a successful fsync(). The kernels
> do report the error (albeit with all the complexities of
> dealing with the interface), at which point the application
> may not assume that the write()s were ever even buffered
> in the kernel page cache in the first place.
>
> What you seem to be asking for is the capability of dropping
> buffers over the (kernel) fence and indemnifying the application
> from any further responsibility, i.e. a hard assurance
> that either the kernel will persist the pages or it will
> keep them around till the application recovers them
> asynchronously, the filesystem is unmounted, or the system
> is rebooted.

That's what Pg appears to assume now, yes. Whether that's reasonable is a whole different topic.

I'd like a middle ground where the kernel lets us register our interest and tells us if it lost something, without us having to keep eight million FDs open for some long period. "Tell us about anything that happens under pgdata/" or an inotify-style per-directory-registration option. I'd even say that's ideal.

In the meantime, I propose that we fsync() on close() before we age FDs out of the LRU on backends. Yes, that will hurt throughput and cause stalls, but we don't seem to have many better options. At least it'll only flush what we actually wrote to the OS buffers, not what we may have in shared_buffers. If the bgwriter does the same thing, we should be 100% safe from this problem on Linux 4.13+, and it'd be trivial to make it a GUC, much like the fsync or full_page_writes options, that people can turn off if they know the risks / know their storage is safe / don't care.

Some keen person who wants to could later optimise it by adding an fsync worker thread pool in backends, so we don't block the main thread. Frankly that might be a nice thing to have in the checkpointer anyway. But it's out of scope for fixing this in durability terms.

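To make the fsync-before-close idea concrete, here is a rough sketch in plain C. It is not PostgreSQL code: close_with_fsync() is a hypothetical helper name, and the real change would live in the FD cache rather than in a standalone function. The point is only that the flush happens while we still hold the descriptor, so the error has somewhere to be reported.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not an existing
 * PostgreSQL function): flush a descriptor's dirty data before closing
 * it, so a writeback error is seen while the fd is still open.
 *
 * Returns 0 on success, -1 with errno set on failure. The descriptor is
 * closed in both cases.
 */
int
close_with_fsync(int fd)
{
    int saved_errno = 0;

    /* Flush first: after close() there is no fd left to report errors on. */
    if (fsync(fd) != 0)
        saved_errno = errno;

    /* Always close, but keep the fsync error if there was one. */
    if (close(fd) != 0 && saved_errno == 0)
        saved_errno = errno;

    if (saved_errno != 0)
    {
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

A caller that gets -1 back should treat the written data as possibly lost rather than retrying the fsync(), since a later fsync() on a fresh descriptor may well report success without the data ever having reached disk.
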
I'm partway through a patch that makes fsync panic on errors now. Once that's done, the next step will be to force fsync on close() in md and see how we go with that.

Thoughts?

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services