On Tue, Jun 17, 2014 at 9:46 PM, Ke-fei Lin <k...@kfei.net> wrote:
> 2014-06-18 1:28 GMT+08:00 Gregory Farnum <g...@inktank.com>:
>> On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin <k...@kfei.net> wrote:
>>> Hi list,
>>>
>>> How does RADOS check that an object and its replicas are consistent? Is
>>> there a checksum in the object's metadata, or some other mechanism? Does
>>> the mechanism depend on the OSD's underlying file system?
>>
>> It does not check consistency on read. On scrub it compares the local
>> FS metadata (size et al) and RADOS metadata (object versions and
>> things); on deep scrub it computes a checksum of each replica and
>> compares them.
> Thank you Greg.
> Let's say there is an object A and its replica B, and during a deep scrub
> RADOS finds that the two have different checksums. How does RADOS determine
> which copy is corrupted, and how does it repair it?

You have to explicitly trigger a scrub "repair". Right now, whatever
the primary has wins; that's obviously suboptimal. (So generally you
should try and get manually involved with repairs.)
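
To make the "primary wins" caveat concrete, here is a toy Python model of a
deep-scrub comparison followed by a repair. It is only a sketch, not Ceph
code: the function names, the OSD labels and the use of SHA-256 are all
assumptions made for illustration.

```python
import hashlib
from typing import Dict


def checksum(data: bytes) -> str:
    # SHA-256 is an arbitrary choice for this sketch, not what Ceph uses.
    return hashlib.sha256(data).hexdigest()


def deep_scrub(replicas: Dict[str, bytes]) -> bool:
    """Return True if every replica hashes to the same value."""
    digests = {checksum(data) for data in replicas.values()}
    return len(digests) == 1


def repair_primary_wins(replicas: Dict[str, bytes], primary: str) -> Dict[str, bytes]:
    """Hypothetical repair: overwrite every copy with the primary's copy."""
    primary_copy = replicas[primary]
    return {osd: primary_copy for osd in replicas}


good = b"object payload"
bad = b"object pay1oad"  # simulated silent corruption

# Case 1: the replica is corrupted, so the primary's copy is the right one.
replicas = {"osd.0": good, "osd.1": bad}
print(deep_scrub(replicas))  # False: the checksums differ
repaired = repair_primary_wins(replicas, primary="osd.0")
print(deep_scrub(repaired), repaired["osd.1"] == good)  # True True

# Case 2: the primary is corrupted, so repair spreads the bad data.
replicas = {"osd.0": bad, "osd.1": good}
repaired = repair_primary_wins(replicas, primary="osd.0")
print(deep_scrub(repaired), repaired["osd.1"] == good)  # True False
```

In the second case the copies agree after the repair, yet the good data is
gone. With only two copies and no stored checksum, the comparison can say
that the copies differ but not which one is right, hence the advice to look
at the copies yourself before repairing.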

>> RADOS does not maintain checksums alongside the objects in replicated pools.
>>
>>> And what would happen if a corrupted object is read (like a
>>> corrupted block in a traditional file system)?
>>
>> If the local filesystem doesn't return an error, it will return the
>> data it was given to the end user. (btrfs maintains its own checksums
> This sounds kind of dangerous. I think corrupted objects will be the norm
> rather than the exception, because we usually build Ceph clusters from
> commodity hardware.
>> and will return errors, but unfortunately xfs will not.)
> And it seems there are lots of people still using XFS...
> By the way, is this the main reason that Ceph officially suggests btrfs?

Well, we officially suggest XFS for other reasons, but it is why our
long-term vision is to run on btrfs.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
