> On Apr 8, 2019, at 4:38 PM, Gregory Farnum <gfar...@redhat.com> wrote:
> 
> On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>> 
>> There doesn't appear to be any correlation between the OSDs which would 
>> point to a hardware issue, and since it's happening on two different 
>> clusters I'm wondering if there's a race condition that has been fixed in a 
>> later version?
>> 
>> Also, what exactly is the omap digest?  From what I can tell it appears to 
>> be some kind of checksum for the omap data.  Is that correct?
> 
> Yeah; it's just a crc over the omap key-value data that's checked
> during deep scrub. Same as the data digest.
> 
> I've not noticed any issues around this in Luminous but I probably
> wouldn't have, so will have to leave it up to others if there are
> fixes in since 12.2.8.

Thanks for adding some clarity to that, Greg!
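
For anyone else trying to picture what the digest is, here is a rough Python
sketch of the idea: a running CRC accumulated over an object's omap keys and
values.  Ceph actually uses crc32c internally with its own seeding, so this
won't reproduce the values the OSDs log; it just shows why two replicas whose
omap contents have drifted apart end up reporting different digests.

import zlib

def omap_digest_sketch(omap):
    # Illustrative only: a running CRC over the omap key/value pairs.
    # Ceph uses crc32c (Castagnoli) with its own seed, so these numbers
    # will not match what the OSD actually reports.
    crc = 0  # placeholder seed, not Ceph's actual initial value
    for key in sorted(omap):
        crc = zlib.crc32(key.encode(), crc)
        crc = zlib.crc32(omap[key], crc)
    return crc & 0xffffffff

# Replicas whose omap contents have diverged produce different digests,
# which is exactly what deep scrub flags as an inconsistency:
print(hex(omap_digest_sketch({"bucket_index_entry": b"version-1"})))
print(hex(omap_digest_sketch({"bucket_index_entry": b"version-2"})))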

For some added information, this is what the logs reported earlier today:

2019-04-08 11:46:15.610169 osd.504 osd.504 10.16.10.30:6804/8874 33 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x26a1241b != omap_digest 0x4c10ee76 from shard 504
2019-04-08 11:46:15.610190 osd.504 osd.504 10.16.10.30:6804/8874 34 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x26a1241b != omap_digest 0x4c10ee76 from shard 504

I then deep scrubbed the PG again to check whether the data was actually fine 
and the digest calculation was simply misbehaving.  It came back with the same 
error, just with new digest values:

2019-04-08 15:56:21.186291 osd.504 osd.504 10.16.10.30:6804/8874 49 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x93bac8f != omap_digest 0xab1b9c6f from shard 504
2019-04-08 15:56:21.186313 osd.504 osd.504 10.16.10.30:6804/8874 50 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x93bac8f != omap_digest 0xab1b9c6f from shard 504

That the digest values changed makes sense, since the object’s omap contents keep 
changing between scrubs, but it doesn’t explain why the omap data is getting out 
of sync across multiple OSDs and clusters…
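
One thing I'm planning to try is dumping the scrub inconsistency report for the
PG and comparing what each shard says about the object, roughly like the sketch
below (the JSON field names are from my memory of the Luminous output, so
double-check them against what your cluster actually emits):

import json
import subprocess

PG = "7.3"  # the PG from the log messages above

# Pull the inconsistency report left behind by the last scrub of this PG.
out = subprocess.check_output(
    ["rados", "list-inconsistent-obj", PG, "--format=json"])
report = json.loads(out)

for item in report.get("inconsistents", []):
    # Field names below are what I recall Luminous emitting; verify before
    # relying on them.
    name = item["object"]["name"]
    print("%s  errors=%s" % (name, ",".join(item.get("errors", []))))
    for shard in item.get("shards", []):
        print("  osd.%s  omap_digest=%s"
              % (shard.get("osd"), shard.get("omap_digest")))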

I’ll see what I can figure out tomorrow, but if anyone else has some hints, I 
would love to hear them.

Thanks,
Bryan