Paul Eggert wrote:
> Feature creep is something we should avoid. Here, though, it's a real
> pain to synchronize correctly and many people will get it wrong.
The problem is that it is impossible to get it right unless one does
something as extreme as unmounting-then-remounting the filesystem, or
even making a backup copy on a removable device and then verifying the
copy on a different computer.
> Sure, as some file systems do not support fsync. Still, gzip should do
> what it can.
I am not so sure. There seems to be an arms race between tools that do
what they can and ways of preventing those tools from doing it. Just
search for "disable fsync".
Even if invoked optionally, all this complication to perhaps achieve
nothing but a false sense of safety goes against my KISS philosophy.
Imagine if some backup tool begins calling 'gzip --synchronous', and
users are forced to install libeatmydata to disable it.
>> it may become an endless source of bug reports.
> I doubt it. gzip has run unsafely for decades, and this is the first
> bug report about it -- one discovered by code inspection, not by actual
> failure.
Publish or perish may be the cause of such an endless source of bug
reports. The next one might be titled "On why 'gzip --synchronous' does
not work on some filesystems".
>> (fsyncing also the destination's directory,
> Yes, that needs fixing. Done in the patches I just now emailed to you.
Thanks. But I was not asking for a fix. Just pointing out what others
might ask.
> True, fsync is a bad design. But that is no excuse for gzip losing data.
As I see it, it is not gzip that is losing the data, but the filesystem
that does not respect the write order, or even the user who chose such a
filesystem (perhaps for a good reason).
>> 1) gzip --keep file     # don't delete input
>> 2) sync                 # commit output and directory to disk
>> 3) zcmp file file.gz    # verify output
>> 4) rm file              # then remove input
> That approach does not suffice, because 'sync' does not guarantee that
> the output data has been synchronized to disk.
I know, but how can you guarantee that 'gzip --synchronous' will work on
a system where the 'sync' above does not even guarantee that 'file.gz'
is written to disk before 'file' is deleted?
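
For reference, here is a minimal sketch of the kind of per-file
synchronization under discussion: fsync the output, then fsync the
directory that contains it. It is written in Python rather than gzip's C,
assumes a POSIX system where a directory can be opened read-only, and
write_durably is a hypothetical name, not something taken from the
patches. Note that it can only request a flush; whether the data reaches
the disk still depends on the filesystem and on nothing having disabled
fsync.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then request a flush of the file and its directory.

    Hypothetical helper for illustration only. fsync is a request: a
    filesystem that ignores it, or a preloaded library that disables it,
    makes this no safer than a plain write.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file contents and metadata
    finally:
        os.close(fd)

    # Flush the directory entry too, so the new name itself is persistent.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```
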
I still think that the right thing to do is to not implement any kind of
fsync functionality in gzip/lzip, and achieve permanence (when it is
needed) by some other means. As you said, gzip has run unsafely for
decades without a failure.
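
One possible reading of "some other means" is the keep, sync, verify,
remove sequence above, done outside the compressor. The following is only
a sketch under stated assumptions: it uses Python's gzip module instead
of the gzip program, compares SHA-256 digests as a stand-in for zcmp, and
compress_verify_remove is a hypothetical name. The verification step may
well be served from the page cache rather than from the disk, so it
carries the same limitation discussed above.

```python
import gzip
import hashlib
import os
import shutil


def _digest(stream) -> bytes:
    """Return the SHA-256 digest of everything readable from stream."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(1 << 16), b""):
        h.update(chunk)
    return h.digest()


def compress_verify_remove(path: str) -> str:
    """Compress path to path + '.gz', verify it, and only then remove path."""
    out = path + ".gz"

    # 1) Compress while keeping the input (like 'gzip --keep').
    with open(path, "rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # 2) Request a global flush (like 'sync'); this is not a guarantee.
    os.sync()

    # 3) Verify that the output decompresses to the input (stand-in for 'zcmp').
    with open(path, "rb") as src, gzip.open(out, "rb") as chk:
        if _digest(src) != _digest(chk):
            raise RuntimeError(f"verification failed for {out}")

    # 4) Remove the input only after verification succeeded.
    os.remove(path)
    return out
```
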
Best regards,
Antonio.