On 10.01.19 at 16:53, Jason Dillaman wrote:
> On Thu, Jan 10, 2019 at 10:50 AM Oliver Freyermuth
> <freyerm...@physik.uni-bonn.de> wrote:
>>
>> Dear Jason and list,
>>
>> On 10.01.19 at 16:28, Jason Dillaman wrote:
>>> On Thu, Jan 10, 2019 at 4:01 AM Oliver Freyermuth
>>> <freyerm...@physik.uni-bonn.de> wrote:
>>>>
>>>> Dear Cephalopodians,
>>>>
>>>> I performed several consistency checks now:
>>>> - Exporting an RBD snapshot before and after the object map rebuilding.
>>>> - Exporting a backup as raw image, for all backups (re)created before and after the object map rebuilding.
>>>> - md5summing all of that for a snapshot for which the rebuilding was actually needed.
>>>>
>>>> The good news: I found that all checksums are the same. So the backups are (at least for those I checked) not broken.
>>>>
>>>> I also checked the source and found:
>>>> https://github.com/ceph/ceph/blob/master/src/include/rbd/object_map_types.h
>>>> So to my understanding, the object map entries are OBJECT_EXISTS, but should be OBJECT_EXISTS_CLEAN.
>>>> Do I understand correctly that OBJECT_EXISTS_CLEAN means the object is unchanged ("clean") as compared to another snapshot / the main volume?
>>>>
>>>> If so, this would explain why the backups, exports etc. are all okay: the backup tools only got "too many" objects in the fast-diff and hence extracted too many objects from Ceph-RBD even though that was not needed. Since both Benji and Backy2 deduplicate again in their backends, this causes only a minor network traffic inefficiency.
>>>>
>>>> Is my understanding correct?
>>>> Then the underlying issue would still be a bug, but (as it seems) a harmless one.
>>>
>>> Yes, your understanding is correct in that it's harmless from a data-integrity point of view.
>>>
>>> During the creation of the snapshot, the current object map (for the HEAD revision) is copied to a new object map for that snapshot, and then all the objects in the HEAD revision are marked as EXISTS_CLEAN (if they EXIST). Somehow an IO operation is causing the object map to think there is an update, but apparently no object update is actually occurring (or at least the OSD doesn't think a change occurred).
>>
>> Thanks a lot for the clarification! Good to know my understanding is correct.
>>
>> I re-checked all object maps just now. Again, the most recent snapshots show this issue, but only those.
>> The only "special" thing which probably not everybody is doing is that we regularly run fstrim in the machines running from the RBDs, to conserve space.
>>
>> I am not sure how exactly the DISCARD operation is handled in rbd, but since this was my guess, I just did an fstrim inside one of the VMs and checked the object maps again. I get:
>> 2019-01-10 16:44:25.320 7f06f67fc700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.4f587327b23c6.0000000000000040 marked as 1, but should be 3
>> In this case, I got it for the volume itself and not a snapshot.
>>
>> So it seems to me that sometimes, DISCARD causes objects to be marked as updated even though they have not been.
>> Sadly, lacking in-depth code knowledge and a real debug setup, I cannot track it down further :-(.
>>
>> Cheers, and I hope that helps a code expert in tracking it down (at least it's not affecting data integrity),
>
> Thanks, that definitely provides a good investigation starting point.
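In case it helps with the investigation: the sequence that triggers it for me boils down to roughly the following (pool, image and snapshot names are just placeholders for our real ones):

--------------------------------------------------------------------------------
# inside the VM whose disk is the RBD image: discard unused filesystem blocks
fstrim -a
# on an RBD client node: re-check the object map of the image and of its newest snapshot
rbd object-map check rbd/vm-disk-1
rbd object-map check rbd/vm-disk-1@nightly-2019-01-10
# a reported mismatch can then be cleared again with
rbd object-map rebuild rbd/vm-disk-1
--------------------------------------------------------------------------------

After the fstrim, the check on the image itself reported "marked as 1, but should be 3" entries like the one quoted above.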
Should we also put this into a ticket, so it can be tracked? I could open one if you like; on the other hand, you could probably summarize the issue more concisely than I can.

Cheers and all the best,
Oliver

>
>> Oliver
>>
>>>
>>>> I'll let you know if it happens again to some of our snapshots, and if so, whether it only happens to newly created ones...
>>>>
>>>> Cheers,
>>>> Oliver
>>>>
>>>> On 10.01.19 at 01:18, Oliver Freyermuth wrote:
>>>>> Dear Cephalopodians,
>>>>>
>>>>> inspired by http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032092.html I did a check of the object maps of our RBD volumes and snapshots. We are running 13.2.1 on the cluster I am talking about; all hosts (OSDs, MONs, RBD client nodes) are still on CentOS 7.5.
>>>>>
>>>>> Sadly, I found that for at least 50 % of the snapshots (only the snapshots, not the volumes themselves), I got something like:
>>>>> --------------------------------------------------------------------------------------------------
>>>>> 2019-01-09 23:00:06.481 7f89aeffd700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000260 marked as 1, but should be 3
>>>>> 2019-01-09 23:00:06.563 7f89aeffd700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000840 marked as 1, but should be 3
>>>>> --------------------------------------------------------------------------------------------------
>>>>> 2019-01-09 23:00:09.166 7fbcff7fe700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000480 marked as 1, but should be 3
>>>>> 2019-01-09 23:00:09.228 7fbcff7fe700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000840 marked as 1, but should be 3
>>>>> --------------------------------------------------------------------------------------------------
>>>>> It often appears to affect 1-3 entries in the map of a snapshot. The object map was *not* marked invalid before I ran the check.
>>>>> After rebuilding it, the check is fine again.
>>>>>
>>>>> The cluster has not yet seen any Ceph update (it was installed as 13.2.1; we plan to upgrade to 13.2.4 soonish).
>>>>> There have been no major causes of worry so far. We purged a single OSD disk, balanced PGs with upmap, modified the CRUSH topology slightly, etc.
>>>>> The cluster was never in a prolonged unhealthy period, nor did we have to repair any PG.
>>>>>
>>>>> Is this a known error?
>>>>> Is it harmful, or is this just something like reference counting being off, with objects being in the map which did not really change in the snapshot?
>>>>>
>>>>> Our use case, in case that helps to understand or reproduce:
>>>>> - RBDs are used as disks for qemu/kvm virtual machines.
>>>>> - Every night:
>>>>>   - We run an fstrim in the VM (which propagates to RBD and purges empty blocks), fsfreeze it, take a snapshot, and thaw it again.
>>>>>   - After that, we run two backups with Benji backup ( https://benji-backup.me/ ) and Backy2 backup ( http://backy2.com/docs/ ), which seems to work rather well so far.
>>>>>   - We purge some old snapshots.
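In case somebody wants to reproduce our setup: the nightly sequence above boils down to roughly the sketch below. VM, pool, image and snapshot names are placeholders, how the commands are executed inside the guest is glossed over, and the actual Benji/Backy2 invocations are omitted:

--------------------------------------------------------------------------------
# inside the guest: discard unused blocks (propagates to RBD as DISCARDs), then freeze
fstrim -a
fsfreeze --freeze /
# on an RBD client node: take the nightly snapshot while the filesystem is frozen
rbd snap create rbd/vm-disk-1@nightly-2019-01-10
# inside the guest again: thaw the filesystem
fsfreeze --unfreeze /
# Benji and Backy2 then take their differential backups from the new snapshot;
# afterwards, old snapshots beyond the retention window are removed, e.g.:
rbd snap rm rbd/vm-disk-1@nightly-2018-12-01
--------------------------------------------------------------------------------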
>>>>>
>>>>> We use the following RBD feature flags:
>>>>> layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>>>>
>>>>> Since Benji and Backy2 are optimized for differential RBD backups to deduplicated storage, they leverage "rbd diff" (and hence, I would think, make use of fast-diff).
>>>>> If rbd diff produces wrong output due to this issue, it would affect our backups (but it would also affect classic backups of snapshots via "rbd export"...).
>>>>> In case the issue is known or understood, can somebody extrapolate whether this means "rbd diff" contains too many blocks, or actually misses changed blocks?
>>>>>
>>>>> From now on, we are running daily, full object-map checks on all volumes and snapshots, and automatically rebuild any object map which the check finds invalid.
>>>>> Hopefully, this will allow us to correlate the appearance of these issues with "something" happening on the cluster.
>>>>> I did not detect a clear pattern in the affected snapshots, though; it seemed rather random...
>>>>>
>>>>> Maybe it would also help to understand this issue if somebody else using RBD in a similar manner on Mimic could also check their object maps.
>>>>> Since this issue does not show up until a check is performed, it stayed below our radar for many months...
>>>>>
>>>>> Cheers,
>>>>> Oliver
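P.S.: For anyone who wants to run the same consistency check on their own images, a rough sketch of what I did (again with placeholder names): export the affected snapshot before and after rebuilding its object map and compare checksums, and look at the extent list a fast-diff based backup would use:

--------------------------------------------------------------------------------
# checksum the snapshot contents with the (possibly wrong) object map still in place
rbd export rbd/vm-disk-1@nightly-2019-01-10 - | md5sum
# rebuild the object map of the snapshot
rbd object-map rebuild rbd/vm-disk-1@nightly-2019-01-10
# checksum again afterwards; in all cases I checked, the two sums were identical
rbd export rbd/vm-disk-1@nightly-2019-01-10 - | md5sum
# list the extents a fast-diff based differential backup would copy since the previous snapshot
rbd diff --from-snap nightly-2019-01-09 rbd/vm-disk-1@nightly-2019-01-10
--------------------------------------------------------------------------------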
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com