On Tue, Jun 17, 2014 at 9:46 PM, Ke-fei Lin <k...@kfei.net> wrote:
> 2014-06-18 1:28 GMT+08:00 Gregory Farnum <g...@inktank.com>:
>> On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin <k...@kfei.net> wrote:
>>> Hi list,
>>>
>>> How does RADOS check that an object and its replica are consistent? Is there
>>> a checksum in the object's metadata, or some other mechanism? Does the
>>> mechanism depend on the OSD's underlying file system?
>>
>> It does not check consistency on read. On scrub it compares the local
>> FS metadata (size et al) and RADOS metadata (object versions and
>> things); on deep scrub it computes a checksum of each replica and
>> compares them.
>
> Thank you Greg.
>
> Let's say there is an object A and its replica B. On deep scrubbing, RADOS
> finds that the two objects have different checksums. How does RADOS determine
> and repair the corrupted object?
You have to explicitly trigger a scrub "repair". Right now, whatever the
primary has wins; that's obviously suboptimal. (So generally you should
try to get manually involved with repairs.)

>> RADOS does not maintain checksums alongside the objects in replicated pools.
>>
>>> And what would happen if a corrupted object is read (like a
>>> corrupted block in a traditional file system)?
>>
>> If the local filesystem doesn't return an error, it will return the
>> data it was given to the end user. (btrfs maintains its own checksums
>
> This sounds kind of dangerous. I think corrupted objects will be the norm
> rather than the exception, because we usually build Ceph clusters from
> commodity hardware.
>
>> and will return errors, but unfortunately xfs will not.)
>
> And it seems there are lots of people still using XFS...
> By the way, is this the main reason that Ceph officially suggests btrfs?

Well, we officially suggest XFS for other reasons, but it is why our
long-term vision is to run on btrfs.
-Greg
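
To make the "whatever the primary has wins" behaviour concrete, here is a minimal Python sketch of the idea. It is not Ceph code. The function names (`deep_scrub`, `repair_primary_wins`), the dictionary of replicas keyed by OSD name, and the use of SHA-256 as the checksum are all assumptions made for illustration. On a real cluster, as far as I know, the operator-side equivalents are `ceph pg deep-scrub <pgid>` and `ceph pg repair <pgid>`.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    # Stand-in checksum; the hash RADOS actually uses is not stated in the thread.
    return hashlib.sha256(data).hexdigest()


def deep_scrub(replicas: Dict[str, bytes]) -> bool:
    """Return True if every replica has the same checksum (hypothetical helper)."""
    return len({checksum(data) for data in replicas.values()}) == 1


def repair_primary_wins(replicas: Dict[str, bytes], primary: str) -> List[str]:
    """Overwrite every replica with the primary's copy.

    Returns the names of the replicas that were overwritten. No attempt is
    made to decide which copy is actually correct.
    """
    authoritative = replicas[primary]
    changed = [name for name, data in replicas.items() if data != authoritative]
    for name in changed:
        replicas[name] = authoritative
    return changed


if __name__ == "__main__":
    # The primary (osd.0) holds the corrupted copy in this example.
    replicas = {"osd.0": b"helo", "osd.1": b"hello"}
    print("consistent before repair:", deep_scrub(replicas))
    print("overwritten:", repair_primary_wins(replicas, "osd.0"))
    print("consistent after repair:", deep_scrub(replicas))
    print("surviving data:", replicas["osd.1"])
```

In the example the primary is the corrupted copy, so the repair makes the replicas consistent by spreading the bad data to the good replica. With no stored per-object checksum there is nothing to tell the two copies apart, which is why checking by hand which copy is good before triggering a repair is the safer route.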