Hi,

I have a 3x-replicated pool with Ceph 12.2.7.

One HDD broke, its OSD "2" was automatically marked "out", the disk was
physically replaced with a new one, and the new OSD was added back in.

Since then, `ceph health detail` has permanently shown:

    [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
    [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
        pg 2.87 is active+clean+inconsistent, acting [33,2,20]

What exactly is wrong here?

Why can Ceph not fix the issue by itself?
With BlueStore I have checksums, and two of the three replicas sit on unbroken
disks, so what inconsistency can remain?

The command suggested in
https://docs.ceph.com/en/pacific/rados/operations/pg-repair/#commands-for-diagnosing-pg-problems
does not work:

    # rados list-inconsistent-obj 2.87
    No scrub information available for pg 2.87
    error 2: (2) No such file or directory
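
I am guessing that "No scrub information available" means the result of the
last deep scrub is simply no longer (or not yet) available, so perhaps one is
expected to trigger a fresh deep scrub first, along the lines of:

    # ceph pg deep-scrub 2.87

and then, once it has finished, retry:

    # rados list-inconsistent-obj 2.87 --format=json-pretty

But that is only my guess; the documentation does not say whether this is the
intended workflow.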

Further, I find the documentation in
https://docs.ceph.com/en/pacific/rados/operations/pg-repair/#more-information-on-pg-repair
extremely unclear.
It says

    "In the case of replicated pools, recovery is beyond the scope of pg repair."

while many people on the Internet suggest that `ceph pg repair` might fix the
issue, and yet others claim that Ceph will eventually fix the issue by itself.
I am hesitant to run "ceph pg repair" without understanding what the problem is
and what exactly that command will do.
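
For reference, as far as I understand it, the command in question would simply
be:

    # ceph pg repair 2.87

but I do not know whether it uses the BlueStore checksums to pick the
authoritative copy, or just overwrites the other replicas with the primary's
version, which is exactly what worries me.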

I have already reported the "error 2" problem and the unclear documentation in
https://tracker.ceph.com/issues/61739, but have not received a reply yet, and my
cluster remains "inconsistent".

How can this be fixed?

I would appreciate any help!
