On 10.01.19 at 16:53, Jason Dillaman wrote:
> On Thu, Jan 10, 2019 at 10:50 AM Oliver Freyermuth
> <freyerm...@physik.uni-bonn.de> wrote:
>>
>> Dear Jason and list,
>>
>> On 10.01.19 at 16:28, Jason Dillaman wrote:
>>> On Thu, Jan 10, 2019 at 4:01 AM Oliver Freyermuth
>>> <freyerm...@physik.uni-bonn.de> wrote:
>>>>
>>>> Dear Cephalopodians,
>>>>
>>>> I performed several consistency checks now:
>>>> - Exporting an RBD snapshot before and after the object map rebuilding.
>>>> - Exporting a backup as raw image, for all backups (re)created before and after the object map rebuilding.
>>>> - md5summing all of that for a snapshot for which the rebuilding was actually needed.
>>>>
>>>> The good news: I found that all checksums are the same. So the backups are (at least for those I checked) not broken.
>>>>
>>>> I also checked the source and found:
>>>> https://github.com/ceph/ceph/blob/master/src/include/rbd/object_map_types.h
>>>> So to my understanding, the object map entries are OBJECT_EXISTS, but should be OBJECT_EXISTS_CLEAN.
>>>> Do I understand correctly that OBJECT_EXISTS_CLEAN means the object is unchanged ("clean") as compared to another snapshot / the main volume?
>>>>
>>>> If so, this would explain why the backups, exports etc. are all okay: the backup tools only got "too many" objects in the fast-diff and hence extracted too many objects from Ceph-RBD even though that was not needed. Since both Benji and Backy2 deduplicate again in their backends, this causes only a minor network traffic inefficiency.
>>>>
>>>> Is my understanding correct?
>>>> Then the underlying issue would still be a bug, but (as it seems) a harmless one.
>>>
>>> Yes, your understanding is correct in that it's harmless from a data-integrity point of view.
>>>
>>> During the creation of the snapshot, the current object map (for the HEAD revision) is copied to a new object map for that snapshot, and then all the objects in the HEAD revision are marked as EXISTS_CLEAN (if they EXIST). Somehow an IO operation is causing the object map to think there is an update, but apparently no object update is actually occurring (or at least the OSD doesn't think a change occurred).
>>
>> Thanks a lot for the clarification! Good to know my understanding is correct.
>>
>> I re-checked all object maps just now. Again, the most recent snapshots show this issue, but only those.
>> The only "special" thing which probably not everybody is doing is that we regularly run fstrim in the machines running from the RBDs, to conserve space.
>>
>> I am not sure how exactly the DISCARD operation is handled in rbd, but since this was my guess, I just did an fstrim inside one of the VMs and checked the object maps again. I get:
>> 2019-01-10 16:44:25.320 7f06f67fc700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.4f587327b23c6.0000000000000040 marked as 1, but should be 3
>> In this case, I got it for the volume itself and not a snapshot.
>>
>> So it seems to me that sometimes, DISCARD causes objects to be marked as updated even though they have not been.
>> Sadly, lacking in-depth code knowledge and a real debug setup, I cannot track it down further :-(.
>>
>> Cheers, and I hope that helps a code expert in tracking it down (at least it's not affecting data integrity),
>
> Thanks, that definitely provides a good investigation starting point.
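In case it helps with the investigation: the sequence that triggers it for me boils down to roughly the following (pool, image and snapshot names are just placeholders for our real ones):

--------------------------------------------------------------------------------
# inside the VM whose disk is the RBD image: discard unused filesystem blocks
fstrim -a
# on an RBD client node: re-check the object map of the image and of its newest snapshot
rbd object-map check rbd/vm-disk-1
rbd object-map check rbd/vm-disk-1@nightly-2019-01-10
# a reported mismatch can then be cleared again with
rbd object-map rebuild rbd/vm-disk-1
--------------------------------------------------------------------------------

After the fstrim, the check on the image itself reported "marked as 1, but should be 3" entries like the one quoted above.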
Should we also put this into a ticket, so it can be tracked? I could open one if you like; on the other hand, you could probably summarize the issue more concisely than I can.

Cheers and all the best,
Oliver

>
>> Oliver
>>
>>>
>>>> I'll let you know if it happens again to some of our snapshots, and if so, whether it only happens to newly created ones...
>>>>
>>>> Cheers,
>>>> Oliver
>>>>
>>>> On 10.01.19 at 01:18, Oliver Freyermuth wrote:
>>>>> Dear Cephalopodians,
>>>>>
>>>>> inspired by http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032092.html I did a check of the object maps of our RBD volumes and snapshots. We are running 13.2.1 on the cluster I am talking about; all hosts (OSDs, MONs, RBD client nodes) are still on CentOS 7.5.
>>>>>
>>>>> Sadly, I found that for at least 50 % of the snapshots (only the snapshots, not the volumes themselves), I got something like:
>>>>> --------------------------------------------------------------------------------------------------
>>>>> 2019-01-09 23:00:06.481 7f89aeffd700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000260 marked as 1, but should be 3
>>>>> 2019-01-09 23:00:06.563 7f89aeffd700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000840 marked as 1, but should be 3
>>>>> --------------------------------------------------------------------------------------------------
>>>>> 2019-01-09 23:00:09.166 7fbcff7fe700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000480 marked as 1, but should be 3
>>>>> 2019-01-09 23:00:09.228 7fbcff7fe700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000840 marked as 1, but should be 3
>>>>> --------------------------------------------------------------------------------------------------
>>>>> It often appears to affect 1-3 entries in the map of a snapshot. The object map was *not* marked invalid before I ran the check.
>>>>> After rebuilding it, the check is fine again.
>>>>>
>>>>> The cluster has not yet seen any Ceph update (it was installed as 13.2.1; we plan to upgrade to 13.2.4 soonish).
>>>>> There have been no major causes of worry so far. We purged a single OSD disk, balanced PGs with upmap, modified the CRUSH topology slightly, etc.
>>>>> The cluster was never in a prolonged unhealthy period, nor did we have to repair any PG.
>>>>>
>>>>> Is this a known error?
>>>>> Is it harmful, or is this just something like reference counting being off, with objects being in the map which did not really change in the snapshot?
>>>>>
>>>>> Our use case, in case that helps to understand or reproduce:
>>>>> - RBDs are used as disks for qemu/kvm virtual machines.
>>>>> - Every night:
>>>>>   - We run an fstrim in the VM (which propagates to RBD and purges empty blocks), fsfreeze it, take a snapshot, and thaw it again.
>>>>>   - After that, we run two backups with Benji backup ( https://benji-backup.me/ ) and Backy2 backup ( http://backy2.com/docs/ ), which seems to work rather well so far.
>>>>>   - We purge some old snapshots.
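In case somebody wants to reproduce our setup: the nightly sequence above boils down to roughly the sketch below. VM, pool, image and snapshot names are placeholders, how the commands are executed inside the guest is glossed over, and the actual Benji/Backy2 invocations are omitted:

--------------------------------------------------------------------------------
# inside the guest: discard unused blocks (propagates to RBD as DISCARDs), then freeze
fstrim -a
fsfreeze --freeze /
# on an RBD client node: take the nightly snapshot while the filesystem is frozen
rbd snap create rbd/vm-disk-1@nightly-2019-01-10
# inside the guest again: thaw the filesystem
fsfreeze --unfreeze /
# Benji and Backy2 then take their differential backups from the new snapshot;
# afterwards, old snapshots beyond the retention window are removed, e.g.:
rbd snap rm rbd/vm-disk-1@nightly-2018-12-01
--------------------------------------------------------------------------------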
>>>>>
>>>>> We use the following RBD feature flags:
>>>>> layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>>>>
>>>>> Since Benji and Backy2 are optimized for differential RBD backups to deduplicated storage, they leverage "rbd diff" (and hence, I would think, make use of fast-diff).
>>>>> If rbd diff produces wrong output due to this issue, it would affect our backups (but it would also affect classic backups of snapshots via "rbd export"...).
>>>>> In case the issue is known or understood, can somebody extrapolate whether this means "rbd diff" contains too many blocks, or actually misses changed blocks?
>>>>>
>>>>> From now on, we are running daily, full object-map checks on all volumes and snapshots, and automatically rebuild any object map which the check finds invalid.
>>>>> Hopefully, this will allow us to correlate the appearance of these issues with "something" happening on the cluster.
>>>>> I did not detect a clear pattern in the affected snapshots, though; it seemed rather random...
>>>>>
>>>>> Maybe it would also help to understand this issue if somebody else using RBD in a similar manner on Mimic could also check their object maps.
>>>>> Since this issue does not show up until a check is performed, it stayed below our radar for many months...
>>>>>
>>>>> Cheers,
>>>>> Oliver
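P.S.: For anyone who wants to run the same consistency check on their own images, a rough sketch of what I did (again with placeholder names): export the affected snapshot before and after rebuilding its object map and compare checksums, and look at the extent list a fast-diff based backup would use:

--------------------------------------------------------------------------------
# checksum the snapshot contents with the (possibly wrong) object map still in place
rbd export rbd/vm-disk-1@nightly-2019-01-10 - | md5sum
# rebuild the object map of the snapshot
rbd object-map rebuild rbd/vm-disk-1@nightly-2019-01-10
# checksum again afterwards; in all cases I checked, the two sums were identical
rbd export rbd/vm-disk-1@nightly-2019-01-10 - | md5sum
# list the extents a fast-diff based differential backup would copy since the previous snapshot
rbd diff --from-snap nightly-2019-01-09 rbd/vm-disk-1@nightly-2019-01-10
--------------------------------------------------------------------------------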
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com