On Wed, 2016-04-20 at 06:40 -0400, Ric Wheeler wrote:
> On 04/20/2016 05:24 AM, Kevin Wolf wrote:
> >
> > Am 20.04.2016 um 03:56 hat Ric Wheeler geschrieben:
> > >
> > > On 04/19/2016 10:09 AM, Jeff Cody wrote:
> > > >
> > > > On Tue, Apr 19, 2016 at 08:18:39AM -0400, Ric Wheeler wrote:
> > > > >
> > > > > On 04/19/2016 08:07 AM, Jeff Cody wrote:
> > > > > >
> > > > > > Bug fixes for gluster; third patch is to prevent
> > > > > > a potential data loss when trying to recover from
> > > > > > a recoverable error (such as ENOSPC).
> > > > > Hi Jeff,
> > > > >
> > > > > Just a note, I have been talking to some of the disk drive people
> > > > > here at LSF (the kernel summit for file and storage people) and got
> > > > > a non-public confirmation that individual storage devices (s-ata
> > > > > drives or scsi) can also dump cache state when a synchronize cache
> > > > > command fails. Also followed up with Rik van Riel - in the page
> > > > > cache in general, when we fail to write back dirty pages, they are
> > > > > simply marked "clean" (which means effectively that they get
> > > > > dropped).
> > > > >
> > > > > Long winded way of saying that I think that this scenario is not
> > > > > unique to gluster - any failed fsync() to a file (or block device)
> > > > > might be an indication of permanent data loss.
> > > > >
> > > > Ric,
> > > >
> > > > Thanks.
> > > >
> > > > I think you are right, we likely do need to address how QEMU
> > > > handles fsync failures across the board in QEMU at some point
> > > > (2.7?). Another point to consider is that QEMU is cross-platform -
> > > > so not only do we have different protocols, and filesystems, but
> > > > also different underlying host OSes as well. It is likely, like you
> > > > said, that there are other non-gluster scenarios where we have
> > > > non-recoverable data loss on fsync failure.
> > > >
> > > > With Gluster specifically, if we look at just ENOSPC, does this
> > > > mean that even if Gluster retains its cache after fsync failure, we
> > > > still won't know that there was no permanent data loss? If we hit
> > > > ENOSPC during an fsync, I presume that means Gluster itself may
> > > > have encountered ENOSPC from a fsync to the underlying storage. In
> > > > that case, does Gluster just pass the error up the stack?
> > > >
> > > > Jeff
> > > I still worry that in many non-gluster situations we will have
> > > permanent data loss here. Specifically, the way the page cache
> > > works, if we fail to write back cached data *at any time*, a future
> > > fsync() will get a failure.
> > And this is actually what saves the semantic correctness. If you threw
> > away data, any following fsync() must fail. This is of course
> > inconvenient because you won't be able to resume a VM that is configured
> > to stop on errors, and it means some data loss, but it's safe because we
> > never tell the guest that the data is on disk when it really isn't.
> >
> > gluster's behaviour (without resync-failed-syncs-after-fsync set) is
> > different, if I understand correctly. It will throw away the data and
> > then happily report success on the next fsync() call. And this is what
> > causes not only data loss, but corruption.
> Yes, that makes sense to me - the kernel will remember that it could not
> write data back from the page cache and the future fsync() will see an
> error.
>
> > [ Hm, or having read what's below... Did I misunderstand and Linux
> > returns failure only for a single fsync() and on the next one it
> > returns success again? That would be bad. ]
>
> I would need to think through that scenario with the memory management
> people to see if that could happen.
It could definitely happen:

1) block on disk contains contents A
2) page cache gets contents B written to it
3) fsync fails
4) page with contents B gets evicted from memory
5) block with contents A gets read from disk
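
To spell out why that sequence is so nasty: after step 3 the dirty page may
already be marked clean, so a bare retry of fsync() can return success even
though contents B never reached the disk. Below is a rough C sketch (not QEMU
code - the file name and recovery policy are made up for illustration) of the
only pattern that is safe from the application side: keep your own copy of
the data and re-issue the write before syncing again.

/*
 * Rough sketch, not QEMU code: file name, sizes and recovery policy are
 * made up for illustration.  The point is that after a failed fsync()
 * the kernel may already have marked the dirty page clean, so the only
 * safe recovery is to re-write the data from a copy the application
 * kept itself, not to simply call fsync() again.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the caller's buffer and flush it to stable storage. */
static int write_and_sync(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t ret = pwrite(fd, buf, len, off);
    if (ret < 0 || (size_t)ret != len) {
        return -1;
    }
    return fsync(fd);
}

int main(void)
{
    const char data[] = "contents B\n";   /* application keeps its own copy */
    int fd = open("testfile", O_RDWR | O_CREAT, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (write_and_sync(fd, data, sizeof(data) - 1, 0) < 0) {
        /*
         * Do NOT just retry fsync(): the page holding "contents B" may
         * already have been dropped (steps 3-5 above), and a bare
         * fsync() retry could report success while "contents A" is
         * still what is on disk.  Re-issue the write from our copy.
         */
        fprintf(stderr, "sync failed (%s), re-writing from local copy\n",
                strerror(errno));
        if (write_and_sync(fd, data, sizeof(data) - 1, 0) < 0) {
            fprintf(stderr, "still failing, treating as data loss\n");
            close(fd);
            return 1;
        }
    }

    close(fd);
    return 0;
}

This is also, as I understand it, what gluster's
resync-failed-syncs-after-fsync option is meant to provide on the storage
side: the write-behind cache keeps the failed writes around, so a retried
fsync() actually retries the writes instead of reporting success over
dropped data.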