As part of the repair operation, Ceph runs a deep-scrub on the PG. If the PG showed active+clean once the repair and deep-scrub finished, then the next scrub of that PG shouldn't change its status at all.
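For example (a rough sketch, reusing pg 6.20 from your output; adjust the pg id as needed), you could trigger another deep-scrub and then compare what the primary and the replicas report once it completes:

    ceph pg deep-scrub 6.20
    # once the deep-scrub has finished:
    ceph health detail
    rados list-inconsistent-obj 6.20 --format=json-pretty
    # compare the stat_sum blocks reported by the primary and the peers
    ceph pg 6.20 query | grep -A 20 '"stat_sum"'

If the stat mismatch was really repaired, the num_bytes figures should agree across the acting set and the PG should stay active+clean.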
On Wed, Jun 6, 2018 at 8:57 PM Adrian <aussie...@gmail.com> wrote:
> Update to this.
>
> The affected pg didn't seem inconsistent:
>
> [root@admin-ceph1-qh2 ~]# ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 6.20 is active+clean+inconsistent, acting [114,26,44]
> [root@admin-ceph1-qh2 ~]# rados list-inconsistent-obj 6.20 --format=json-pretty
> {
>     "epoch": 210034,
>     "inconsistents": []
> }
>
> Although pg query showed the primary info.stats.stat_sum.num_bytes
> differed from the peers.
>
> A pg repair on 6.20 seems to have resolved the issue for now, but the
> info.stats.stat_sum.num_bytes still differs, so presumably it will become
> inconsistent again the next time it scrubs.
>
> Adrian.
>
> On Tue, Jun 5, 2018 at 12:09 PM, Adrian <aussie...@gmail.com> wrote:
>
>> Hi Cephers,
>>
>> We recently upgraded one of our clusters from hammer to jewel and then to
>> luminous (12.2.5, 5 mons/mgr, 21 storage nodes * 9 OSDs). After some
>> deep-scrubs we have an inconsistent pg with a log message we've not seen
>> before:
>>
>> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
>> OSD_SCRUB_ERRORS 1 scrub errors
>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>> pg 6.20 is active+clean+inconsistent, acting [114,26,44]
>>
>> Ceph log shows:
>>
>> 2018-06-03 06:53:35.467791 osd.114 osd.114 172.26.28.25:6825/40819 395 :
>> cluster [ERR] 6.20 scrub stat mismatch, got 6526/6526 objects, 87/87 clones,
>> 6526/6526 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts,
>> 25952454144/25952462336 bytes, 0/0 hit_set_archive bytes.
>> 2018-06-03 06:53:35.467799 osd.114 osd.114 172.26.28.25:6825/40819 396 :
>> cluster [ERR] 6.20 scrub 1 errors
>> 2018-06-03 06:53:40.701632 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41298 :
>> cluster [ERR] Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
>> 2018-06-03 06:53:40.701668 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41299 :
>> cluster [ERR] Health check failed: Possible data damage: 1 pg inconsistent
>> (PG_DAMAGED)
>> 2018-06-03 07:00:00.000137 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41345 :
>> cluster [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1
>> pg inconsistent
>>
>> There are no EC pools - looks like it may be the same as
>> https://tracker.ceph.com/issues/22656, although as in #7 this is not a
>> cache pool.
>>
>> Wondering if it's ok to issue a pg repair on 6.20 or if there's
>> something else we should be looking at first?
>>
>> Thanks in advance,
>> Adrian.
>>
>> ---
>> Adrian : aussie...@gmail.com
>> If violence doesn't solve your problem, you're not using enough of it.
>
> --
> ---
> Adrian : aussie...@gmail.com
> If violence doesn't solve your problem, you're not using enough of it.