On Wed, 18 Mar 2009, Joerg Schilling wrote:

The problem in this case is not whether rename() is atomic but whether the
file that replaces the old file in an atomic rename() operation is in a
stable state on the disk before calling rename().

This topic is quite disturbing to me ...

The calling sequence of the failing code was:

f = open("new", O_WRONLY|O_CREAT|O_TRUNC, 0666);
write(f, "dat", size);
close(f);
rename("new", "old");

The only guaranteed way to have the file "new" in a stable state on the disk
is to call:

f = open("new", O_WRONLY|O_CREAT|O_TRUNC, 0666);
write(f, "dat", size);
fsync(f);
close(f);
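For reference, here is a minimal, self-contained sketch of the complete fsync-before-rename sequence, assuming a POSIX system. The helper name replace_file_atomically and its dir_path parameter are hypothetical. Beyond the sequence above, it also retries short writes and calls fsync() on the containing directory after rename(); that last step is a common Linux practice for making the rename itself durable and is an assumption here, not something claimed in this thread.

```c
#define _XOPEN_SOURCE 700   /* expose fsync() under strict C compilers */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): write size bytes of buf to
 * tmp_path, flush them to stable storage, then rename tmp_path over
 * final_path. dir_path is the directory containing both files.
 * Returns 0 on success, -1 on error. */
static int replace_file_atomically(const char *tmp_path, const char *final_path,
                                   const char *dir_path,
                                   const void *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    /* write() may transfer fewer bytes than asked for, so loop until done. */
    const char *p = buf;
    while (size > 0) {
        ssize_t n = write(fd, p, size);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        p += n;
        size -= (size_t)n;
    }

    /* The step missing from the failing code: force the new file's
     * contents to disk before it can replace the old file. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* rename() atomically replaces final_path: other processes see
     * either the old file or the new one under that name. */
    if (rename(tmp_path, final_path) != 0)
        return -1;

    /* Assumption: fsync() on the directory persists the rename itself.
     * This works on Linux but is not guaranteed by POSIX everywhere. */
    int dfd = open(dir_path, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

For example, replace_file_atomically("new", "old", ".", "dat", 3) performs the quoted open/write/close/rename sequence with the fsync() in place.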

But the problem is not that the file "new" is in an unstable state. The problem is that some filesystems apparently do not preserve the ordering of requests, and failing to preserve that ordering is fraught with peril.

POSIX does not care about "disks" or "filesystems". The only correct behavior is for operations to be applied in the order in which they are requested of the operating system; this is a core function of any operating system. It is therefore OK for some (or all) of the data written to "new" to be lost, or for the rename operation to be lost, but it is not OK for the rename to take effect and leave a corrupted file under the new name.

In summary, I don't agree with you that the misbehavior is correct, but I do agree that copious, expensive fsync() calls should be used to work around the problem.

As it happens, current versions of my own application should be safe from this Linux filesystem bug, but older versions are not. There is even an option to request fsync() on every file close, but that could be quite expensive, so it is not the default.
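An fsync-on-close option of the kind described above might look roughly like the sketch below. This is not GraphicsMagick's actual implementation; the fsync_on_close flag and the close_stream() helper are hypothetical names, used only to show where the extra cost comes from: one synchronous disk flush per closed file.

```c
#define _XOPEN_SOURCE 700   /* expose fsync() and fileno() under strict C compilers */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical global option; off by default because fsync() is expensive. */
static bool fsync_on_close = false;

/* Close a stdio stream opened for writing, first forcing its data to
 * stable storage when fsync_on_close is set.
 * Returns 0 on success, -1 if any step failed. */
static int close_stream(FILE *fp)
{
    assert(fp != NULL);
    int rc = 0;

    /* Move stdio's user-space buffer into the kernel first; fsync()
     * only flushes what the kernel already holds. */
    if (fflush(fp) != 0)
        rc = -1;

    /* Optional, expensive step: wait until the data is on disk. */
    if (fsync_on_close && fsync(fileno(fp)) != 0)
        rc = -1;

    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}
```

Keeping the flag off by default matches the trade-off described above: callers that need durability turn it on and pay for it, while everyone else keeps the fast path.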

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss