Just for clarification, the PG state is not the cause of the scrub errors.
Something happened in your cluster that caused inconsistencies between
copies of the data; the scrubs noticed them, the scrub errors are why the
PG is flagged inconsistent, and that flag is what puts the cluster in
HEALTH_ERR.  Anyway, that's just a quibble with the semantics of your
original assessment of the situation.

Disabling scrubs is a bad idea here.  While you have a lot of scrub errors,
you only know of one PG that has them.  You may have multiple PGs with the
same problem.  Perhaps a single disk is having problems and every PG on
that disk has scrub errors.  There are a lot of other scenarios that could
be happening as well.  I would start by issuing `ceph osd scrub $osd` to
scrub all PGs on the OSDs currently known to be used by this PG.  If that
doesn't find anything, then try `ceph osd deep-scrub $osd`.  Those commands
are a shortcut to schedule a scrub/deep-scrub of every PG that is primary
on the given OSD.  If you don't find any more scrub errors that way, then
you may need to check the rest of the PGs in your cluster, especially the
ones in the same pool (#2) as the currently inconsistent PG.
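
For example, based on the acting set [28,17,37] in your output (plus OSD
10, which also shows up in the shard list), something along these lines
should get the scrubs going; adjust the OSD IDs to match what
`ceph pg 2.2c0 query` actually reports:

    ceph osd scrub 10
    ceph osd scrub 17
    ceph osd scrub 28
    ceph osd scrub 37

    # if those come back clean, follow up with deep scrubs
    ceph osd deep-scrub 10
    ceph osd deep-scrub 17
    ceph osd deep-scrub 28
    ceph osd deep-scrub 37

Note that you'll need to unset the noscrub/nodeep-scrub flags first
(`ceph osd unset noscrub` and `ceph osd unset nodeep-scrub`), or none of
these scrubs will actually run.  To look for other PGs already known to be
inconsistent in pool #2, `rados list-inconsistent-pg <pool-name>` (with
your pool's name substituted) should list them.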

Now, while that's running and getting us more information... what happened
to your cluster?  Anything where OSDs were flapping up and down?  Did you
add new storage?  Lose a drive?  Upgrade versions?  What version are you
running?  What has happened in your cluster over the past few weeks?

Likely, the fix is going to start with issuing a repair of your PG.  I like
to diagnose the full scope of the problem before trying to repair things.
Also, if I can't figure out what's going on, I try to back up the PG copies
I'm repairing before doing so, just in case something doesn't repair
properly.
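
Concretely (treat this as a sketch rather than something to run right
away), the repair itself would be:

    ceph pg repair 2.2c0

and a rough way to take a backup of one copy of the PG first, with that OSD
daemon stopped and assuming default paths (FileStore OSDs also need
--journal-path), is ceph-objectstore-tool's export op:

    systemctl stop ceph-osd@28
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-28 \
        --pgid 2.2c0 --op export --file /root/pg2.2c0-osd28.export
    systemctl start ceph-osd@28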

On Sat, May 12, 2018 at 2:38 AM Faizal Latif <ahmadfaiz...@gmail.com> wrote:

> Hi Guys,
>
> I need some help.  I can see that my Ceph cluster is currently showing a
> PG as *active+clean+inconsistent*, which results in a HEALTH_ERR state
> and scrub errors.  Below is a sample of the output.
>
> HEALTH_ERR 1 pgs inconsistent; 11685 scrub errors; noscrub,nodeep-scrub
> flag(s) set
> pg 2.2c0 is active+clean+inconsistent, acting [28,17,37]
> 11685 scrub errors
> noscrub,nodeep-scrub flag(s) set
>
> I have disabled scrubbing since I can see there are scrub errors.  I have
> also tried using the rados command to check the object status; the
> results are below.
>
> rados list-inconsistent-obj 2.2c0 --format=json-pretty
> {
>     "epoch": 57580,
>     "inconsistents": [
>         {
>             "object": {
>                 "name": "rbd_data.10815ea2ae8944a.0000000000000385",
>                 "nspace": "",
>                 "locator": "",
>                 "snap": 55,
>                 "version": 0
>             },
>             "errors": [],
>             "union_shard_errors": [
>                 "missing",
>                 "oi_attr_missing"
>             ],
>             "shards": [
>                 {
>                     "osd": 10,
>                     "errors": [
>                         "oi_attr_missing"
>                     ],
>                     "size": 4194304,
>                     "omap_digest": "0xffffffff",
>                     "data_digest": "0x32133b39"
>                 },
>                 {
>                     "osd": 28,
>                     "errors": [
>                         "missing"
>                     ]
>                 },
>                 {
>                     "osd": 37,
>                     "errors": [
>                         "missing"
>                     ]
>                 }
>             ]
>         },
>         {
>             "object": {
>                 "name": "rbd_data.10815ea2ae8944a.0000000000000730",
>                 "nspace": "",
>                 "locator": "",
>                 "snap": 55,
>                 "version": 0
>             },
>             "errors": [],
>             "union_shard_errors": [
>                 "missing",
>                 "oi_attr_missing"
>             ],
>             "shards": [
>                 {
>                     "osd": 10,
>                     "errors": [
>                         "oi_attr_missing"
>                     ],
>                     "size": 4194304,
>                     "omap_digest": "0xffffffff",
>                     "data_digest": "0x0f843f64"
>                 },
>                 {
>                     "osd": 28,
>                     "errors": [
>                         "missing"
>                     ]
>                 },
>                 {
>                     "osd": 37,
>                     "errors": [
>                         "missing"
>                     ]
>                 }
>             ]
>         },
>
> I can see that most of the objects show *oi_attr_missing*.  Is there any
> way that I can solve this?  I believe this is the reason why scrubbing
> keeps failing for this PG.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
