On Tue, Dec 23, 2014 at 4:08 PM, Holger Hoffstätte
<holger.hoffstae...@googlemail.com> wrote:
> On Tue, 23 Dec 2014 21:54:00 +0100, Stefan G. Weichinger wrote:
>
>> In the other direction: what protects against these errors you mention?
>
> ceph scrub :)
Are you sure about that? I was under the impression that a scrub just
checks that everything is retrievable. I'm not sure it compares all the
copies to make sure they match, and if they don't match I don't think
it has any way to know which one is right; I believe it just picks one
as the official version, which may or may not be identical to what was
originally stored.

If the data is on btrfs then it is protected from silent corruption,
since the filesystem will return an error when that node tries to read
the file, and presumably the cluster will then fetch another copy from
elsewhere. On the other hand, if the file were logically overwritten in
some way above the btrfs layer, btrfs won't complain and the cluster
won't realize the file has been corrupted.

If I'm wrong on this, by all means point me to the truth. From
everything I've read, though, I don't think ceph maintains checksums of
all the data it stores while it is at rest.
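To illustrate why that matters, here's a toy sketch in Python (the
object contents, checksum handling, and two-replica setup are all made
up for illustration; this is not ceph's actual repair logic):

import hashlib

def sha256(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

# Checksum recorded when the object was first written -- effectively
# what btrfs keeps per block, and what plain replica comparison lacks.
original = b"important object data"
stored_checksum = sha256(original)

# Two replicas; one suffers a silent single-bit flip at rest.
corrupted = bytearray(original)
corrupted[0] ^= 0x01
replicas = [original, bytes(corrupted)]

# Replica comparison alone: the mismatch is detectable, but with two
# copies there is no majority -- either one could be picked as the
# "official" version by an arbitrary tie-break.
if replicas[0] != replicas[1]:
    print("mismatch detected, but which copy is correct?")

# With an at-rest checksum, the good copy is unambiguous.
good = [r for r in replicas if sha256(r) == stored_checksum]
print("verified good copies:", len(good))  # -> 1

With three replicas a majority vote would usually pick the right copy,
but that's still inference, not verification against what was
originally written.

--
Rich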