As part of the repair operation, Ceph runs a deep-scrub on the PG. If the PG showed active+clean once the repair and deep-scrub finished, then the next scrub of that PG shouldn't change its status at all.
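For example (a rough sketch, reusing pg 6.20 from your output; adjust the pg id as needed), you could trigger another deep-scrub and then compare what the primary and the replicas report once it completes:

    ceph pg deep-scrub 6.20
    # once the deep-scrub has finished:
    ceph health detail
    rados list-inconsistent-obj 6.20 --format=json-pretty
    # compare the stat_sum blocks reported by the primary and the peers
    ceph pg 6.20 query | grep -A 20 '"stat_sum"'

If the stat mismatch was really repaired, the num_bytes figures should agree across the acting set and the PG should stay active+clean.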
On Wed, Jun 6, 2018 at 8:57 PM Adrian <aussie...@gmail.com> wrote:
> Update to this.
>
> The affected pg didn't seem inconsistent:
>
> [root@admin-ceph1-qh2 ~]# ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 6.20 is active+clean+inconsistent, acting [114,26,44]
> [root@admin-ceph1-qh2 ~]# rados list-inconsistent-obj 6.20 --format=json-pretty
> {
>     "epoch": 210034,
>     "inconsistents": []
> }
>
> Although pg query showed the primary info.stats.stat_sum.num_bytes
> differed from the peers.
>
> A pg repair on 6.20 seems to have resolved the issue for now, but the
> info.stats.stat_sum.num_bytes still differs, so presumably it will become
> inconsistent again the next time it scrubs.
>
> Adrian.
>
> On Tue, Jun 5, 2018 at 12:09 PM, Adrian <aussie...@gmail.com> wrote:
>
>> Hi Cephers,
>>
>> We recently upgraded one of our clusters from hammer to jewel and then to
>> luminous (12.2.5, 5 mons/mgr, 21 storage nodes * 9 OSDs). After some
>> deep-scrubs we have an inconsistent pg with a log message we've not seen
>> before:
>>
>> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
>> OSD_SCRUB_ERRORS 1 scrub errors
>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>> pg 6.20 is active+clean+inconsistent, acting [114,26,44]
>>
>> Ceph log shows:
>>
>> 2018-06-03 06:53:35.467791 osd.114 osd.114 172.26.28.25:6825/40819 395 :
>> cluster [ERR] 6.20 scrub stat mismatch, got 6526/6526 objects, 87/87 clones,
>> 6526/6526 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts,
>> 25952454144/25952462336 bytes, 0/0 hit_set_archive bytes.
>> 2018-06-03 06:53:35.467799 osd.114 osd.114 172.26.28.25:6825/40819 396 :
>> cluster [ERR] 6.20 scrub 1 errors
>> 2018-06-03 06:53:40.701632 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41298 :
>> cluster [ERR] Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
>> 2018-06-03 06:53:40.701668 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41299 :
>> cluster [ERR] Health check failed: Possible data damage: 1 pg inconsistent
>> (PG_DAMAGED)
>> 2018-06-03 07:00:00.000137 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41345 :
>> cluster [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1
>> pg inconsistent
>>
>> There are no EC pools - looks like it may be the same as
>> https://tracker.ceph.com/issues/22656, although as in #7 this is not a
>> cache pool.
>>
>> Wondering if it's ok to issue a pg repair on 6.20 or if there's
>> something else we should be looking at first?
>>
>> Thanks in advance,
>> Adrian.
>>
>> ---
>> Adrian : aussie...@gmail.com
>> If violence doesn't solve your problem, you're not using enough of it.
>
> --
> ---
> Adrian : aussie...@gmail.com
> If violence doesn't solve your problem, you're not using enough of it.