Actually, this looks very much like my issue, so I'll add to that:  
http://tracker.ceph.com/issues/21040
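
For anyone comparing the scrub errors quoted further down this thread, here is a minimal Python sketch (not part of any Ceph tooling) that pulls the got/expected checksums out of the _verify_csum lines and groups them by object. The regex is based only on the three log lines in this thread, so the message format on other releases is an assumption, and it assumes each message is on one line in the real OSD log (the mail client wrapped them below). The names parse_csum_errors and summarize are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern derived only from the three _verify_csum lines quoted in this thread.
# Assumption: other Ceph releases may word this message differently.
CSUM_RE = re.compile(
    r"bluestore\((?P<path>[^)]+)\) _verify_csum bad (?P<algo>\S+) checksum "
    r"at blob offset (?P<blob_off>0x[0-9a-f]+), "
    r"got (?P<got>0x[0-9a-f]+), expected (?P<expected>0x[0-9a-f]+), "
    r"device location \[(?P<dev>[^\]]+)\], "
    r"logical extent (?P<extent>[^,]+), "
    r"object #(?P<obj>[^#]+)#"
)

# The three log lines from the thread, rejoined onto single lines.
_TAIL = (
    "logical extent 0x0~1000, object "
    "#9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e#"
)
SAMPLE = [
    "2017-08-21 12:41:03.580068 7f0b36629700 -1 "
    "bluestore(/var/lib/ceph/osd/ceph-63) _verify_csum bad crc32c/0x1000 "
    "checksum at blob offset 0x0, got 0x6b6b9184, expected 0x6706be76, "
    "device location [0x23f39d0000~1000], " + _TAIL,
    "2017-08-21 12:41:03.592918 7f264be6d700 -1 "
    "bluestore(/var/lib/ceph/osd/ceph-50) _verify_csum bad crc32c/0x1000 "
    "checksum at blob offset 0x0, got 0x64a1e2b1, expected 0x6706be76, "
    "device location [0x3418830000~1000], " + _TAIL,
    "2017-08-21 12:41:03.531394 7fb396b1f700 -1 "
    "bluestore(/var/lib/ceph/osd/ceph-46) _verify_csum bad crc32c/0x1000 "
    "checksum at blob offset 0x0, got 0x7aa05c01, expected 0x6706be76, "
    "device location [0x1d6e1e0000~1000], " + _TAIL,
]


def parse_csum_errors(lines):
    """Return one dict per _verify_csum failure found in the given log lines."""
    return [m.groupdict() for line in lines if (m := CSUM_RE.search(line))]


def summarize(errors):
    """Group failures by object: expected checksums seen, and 'got' per OSD path."""
    summary = defaultdict(lambda: {"expected": set(), "got": {}})
    for err in errors:
        entry = summary[err["obj"]]
        entry["expected"].add(err["expected"])
        entry["got"][err["path"]] = err["got"]
    return dict(summary)


if __name__ == "__main__":
    for obj, info in summarize(parse_csum_errors(SAMPLE)).items():
        print(obj)
        print("  expected:", ", ".join(sorted(info["expected"])))
        for path, got in sorted(info["got"].items()):
            print(f"  {path}: got {got}")
```

Run against the lines quoted below, this shows all three OSDs expecting the same checksum (0x6706be76) for the object while each one read back a different value. To use it on a real log, pass the file's lines to parse_csum_errors instead of SAMPLE.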

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Edward R Huyer
Sent: Wednesday, August 23, 2017 11:10 AM
To: Brad Hubbard <bhubb...@redhat.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG reported as inconsistent in status, but no 
inconsistencies visible to rados

Forgot to send to the list with the first reply.

I'm honestly not exactly sure when it happened.  I hadn't looked at ceph status 
in several days prior to discovering the issue and submitting to the mailing 
list.  I've seen one or two inconsistent pg issues randomly crop up in the 
month or so since these nodes were spun up, but nothing I couldn't resolve.

There was an issue with one of the Proxmox VE nodes that store VM data in the 
ceph cluster: a network driver problem caused the NIC to be disabled.  That 
was a week or two ago and has since been resolved.  While the problematic PG 
is in the pool used by Proxmox, I wouldn't expect that problem to be able to 
cause store-level corruption on the OSDs.

Other than that, nothing of interest has happened that I'm aware of, though I 
don't yet have good monitoring on these nodes.

I'll put something in the tracker later today.

Thank you for your help.

-----Original Message-----
From: Brad Hubbard [mailto:bhubb...@redhat.com]
Sent: Wednesday, August 23, 2017 4:44 AM
To: Edward R Huyer <erh...@rit.edu>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG reported as inconsistent in status, but no 
inconsistencies visible to rados

On Wed, Aug 23, 2017 at 12:47 AM, Edward R Huyer <erh...@rit.edu> wrote:
> Neat, hadn't seen that command before.  Here's the fsck log from the 
> primary OSD:  https://pastebin.com/nZ0H5ag3
>
> Looks like the OSD's bluestore "filesystem" itself has some underlying 
> errors, though I'm not sure what to do about them.

Hmmm... Can you tell us any more about how/when this happened?

Any corresponding event at all? Any interesting log entries around the same 
time?

Could you also open a tracker for this (or let me know and I can open one for 
you)? That way we can continue the investigation there.

>
> -----Original Message-----
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Monday, August 21, 2017 7:05 PM
> To: Edward R Huyer <erh...@rit.edu>
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] PG reported as inconsistent in status, but 
> no inconsistencies visible to rados
>
> Could you provide the output of 'ceph-bluestore-tool fsck' for one of these 
> OSDs?
>
> On Tue, Aug 22, 2017 at 2:53 AM, Edward R Huyer <erh...@rit.edu> wrote:
>> This is an odd one.  My cluster is reporting an inconsistent pg in 
>> ceph status and ceph health detail.  However, rados 
>> list-inconsistent-obj and rados list-inconsistent-snapset both report 
>> no inconsistencies.  Scrubbing the pg results in these errors in the osd 
>> logs:
>>
>>
>>
>> OSD 63 (primary):
>>
>> 2017-08-21 12:41:03.580068 7f0b36629700 -1
>> bluestore(/var/lib/ceph/osd/ceph-63) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x6b6b9184, expected 0x6706be76, 
>> device location [0x23f39d0000~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e#
>>
>> 2017-08-21 12:41:03.961945 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa soid 9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e:
>> failed to pick suitable object info
>>
>> 2017-08-21 12:41:15.357484 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa deep-scrub 3 errors
>>
>>
>>
>> OSD 50:
>>
>> 2017-08-21 12:41:03.592918 7f264be6d700 -1
>> bluestore(/var/lib/ceph/osd/ceph-50) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x64a1e2b1, expected 0x6706be76, 
>> device location [0x3418830000~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e#
>>
>>
>>
>> OSD 46:
>>
>> 2017-08-21 12:41:03.531394 7fb396b1f700 -1
>> bluestore(/var/lib/ceph/osd/ceph-46) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x7aa05c01, expected 0x6706be76, 
>> device location [0x1d6e1e0000~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e#
>>
>>
>>
>> This is on Ceph 12.1.4 (previously 12.1.1).
>>
>>
>>
>> Thoughts?
>>
>>
>>
>> -----
>>
>> Edward Huyer
>>
>> School of Interactive Games and Media
>>
>> Rochester Institute of Technology
>>
>> Golisano 70-2373
>>
>> 152 Lomb Memorial Drive
>>
>> Rochester, NY 14623
>>
>> 585-475-6651
>>
>> erh...@rit.edu
>>
>>
>>
>> Obligatory Legalese:
>>
>> The information transmitted, including attachments, is intended only 
>> for the
>> person(s) or entity to which it is addressed and may contain 
>> confidential and/or privileged material. Any review, retransmission, 
>> dissemination or other use of, or taking of any action in reliance 
>> upon this information by persons or entities other than the intended 
>> recipient is prohibited. If you received this in error, please 
>> contact the sender and destroy any copies of this information.
>>
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Cheers,
> Brad



--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com