On 8 April 2018 at 04:27, Craig Ringer <cr...@2ndquadrant.com> wrote:

> On 8 April 2018 at 10:16, Thomas Munro <thomas.mu...@enterprisedb.com>
> wrote:
>
> If the kernel does writeback in the middle, how on earth is it supposed to
> know we expect to reopen the file and check back later?
>
> Should it just remember "this file had an error" forever, and tell every
> caller? In that case how could we recover? We'd need some new API to say
> "yeah, ok already, I'm redoing all my work since the last good fsync() so
> you can clear the error flag now". Otherwise it'd keep reporting an error
> after we did redo to recover, too.

There is no spoon^H^H^H^H^Herror flag. We don't need fsync to keep track of
any errors. We just need fsync to accurately report whether all the buffers
in the file have been written out. When you call fsync again the kernel
needs to initiate i/o on all the dirty buffers and block until they complete
successfully. If they complete successfully then nobody cares whether they
had some failure when i/o was initiated at some point in the past.

The problem is not that errors aren't being tracked correctly. The problem
is that dirty buffers are being marked clean when they haven't been written
out. They consider dirty filesystem buffers, when there's a hardware failure
preventing them from being written, "a memory leak".

As long as any error means the kernel has discarded writes then there's no
real hope of any reliable operation through that interface.

Going to DIRECTIO is basically recognizing this: that the kernel filesystem
buffer provides no reliable interface, so we need to reimplement it
ourselves in user space.

It's rather disheartening. Aside from having to do all that work we have the
added barrier that we don't have as much information about the hardware as
the kernel has. We don't know where raid stripes begin and end, how big the
memory controller buffers are, or how to tell when they're full or empty or
how to flush them, etc. We also don't know what else is going on on the
machine.

--
greg
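
To make the contract argued for above concrete, here is a minimal Python sketch of what a caller could do if fsync behaved that way: keep calling fsync until it succeeds, and trust that success means every buffer reached storage. The helper name write_and_sync and its max_fsync_attempts parameter are hypothetical, invented for illustration. The loop is only safe under the semantics described in this message; if the kernel marks buffers clean after a failed writeback, a later fsync can return success without the data being on disk, and retrying like this would hide data loss.

```python
import os
import tempfile


def write_and_sync(path, data, max_fsync_attempts=3):
    """Write data to path; return how many fsync() calls were needed.

    Hypothetical helper. It assumes a failed fsync() leaves the
    buffers dirty so that a later fsync() retries the i/o. That is
    the behaviour argued for in this thread, not a guarantee.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # os.write() may write fewer bytes than requested, so loop.
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]

        last_error = None
        for attempt in range(1, max_fsync_attempts + 1):
            try:
                os.fsync(fd)  # block until the kernel reports the result
                return attempt
            except OSError as exc:
                # Under the assumed semantics the buffers are still
                # dirty here, so calling fsync() again is meaningful.
                last_error = exc
        raise last_error
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "example.dat")
        attempts = write_and_sync(target, b"some durable bytes")
        print("fsync attempts:", attempts)
```

Without that guarantee the only safe reaction to a failed fsync is to treat everything written since the last successful fsync as lost and redo it from a trusted source.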