Hello,
First of all, I would recommend that you use ceph pg repair wherever
you can.
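Something like this should list the inconsistent pgs and let you kick
off a repair (the pool name and pg ids below are placeholders):

    ceph health detail | grep inconsistent
    rados list-inconsistent-pg <pool>
    rados list-inconsistent-obj <pgid> --format=json-pretty
    ceph pg repair <pgid>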
When you have size=3 the cluster can compare three replicas, so it is
easier for it to spot which two are good and which one is bad.
When you use size=2 the case is harder in oh-so-many ways:
- According to the documentation it is harder to determine which object
is the faulty one.
- If an OSD dies, the increased load (caused by the missing OSD) and the
extra IO from the recovery process hit the remaining OSDs much harder,
increasing the chance that another OSD dies (because of a disk failure
caused by the sudden spike of extra load), and then you lose your data.
- If there is bitrot in the one remaining replica, then you do not have
any valid copy of your data.
So, to summarize, the experts say that it is MUCH safer to have
size=3 min_size=2 (I'm far from an expert, I'm just quoting :))
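You can check what your pools are currently set to with something like
(pool name is a placeholder):

    ceph osd pool get <pool> size
    ceph osd pool get <pool> min_size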
So, back to the task at hand:
If you have repaired all the pgs that you could with ceph pg repair,
there is a manual recovery process (unfortunately written for filestore):
http://ceph.com/geen-categorie/ceph-manually-repair-object/
The good news is that there is a FUSE client for bluestore too, so you
can mount it by hand and repair it as described in the linked document.
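I have not done this on bluestore myself, but I believe the rough idea
is to stop the OSD, mount its object store via ceph-objectstore-tool's
fuse support, and then clean up the bad copy as in the article (osd id,
mountpoint and pg id are placeholders, so treat this as a sketch):

    ceph osd set noout
    systemctl stop ceph-osd@<id>
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --op fuse --mountpoint /mnt/osd-fuse
    # find and move away / remove the bad object copy, as in the article
    fusermount -u /mnt/osd-fuse
    systemctl start ceph-osd@<id>
    ceph osd unset noout
    ceph pg repair <pgid>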
I think you could run ceph osd pool set [pool] size 3 to increase the
copy count, but before that you should be certain that you have enough
free space and that you will not hit the per-OSD pg count limits.
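Something along these lines (pool name is a placeholder, and do check
the free space output before committing to it):

    ceph df
    ceph osd df
    ceph osd pool set <pool> size 3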
[DISCLAIMER]:
I have never done this, and I too have questions about this topic:
[Questions to the list]
How is it possible that the cluster cannot repair itself with ceph pg
repair?
Are there no good copies remaining?
Can it not decide which copy is valid or up to date?
If so, why not, when there is a checksum and an mtime for everything?
In this inconsistent state, which object does the cluster serve when it
doesn't know which one is the valid one?
Isn't there a way to do a more "online" repair?
A way to examine and remove objects while the OSD is running?
Or better yet, to tell the cluster which copy should be used when
repairing?
There is a command, ceph pg force-recovery, but I cannot find
documentation for it.
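As far as I can tell it is invoked simply like this (pg id is a
placeholder), but since I could not find documentation, treat it as a
guess:

    ceph pg force-recovery <pgid>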
Kind regards,
Denes Dolhay.
On 10/28/2017 01:05 PM, Mario Giammarco wrote:
Hello,
we recently upgraded two clusters to Ceph luminous with bluestore and
we discovered that we have many more pgs in state
active+clean+inconsistent. (Possible data damage, xx pgs inconsistent)
This is probably due to the checksums in bluestore, which detect more errors.
We have some pools with replica 2 and some with replica 3.
I have read past forum threads and I have seen that Ceph does not
automatically repair inconsistent pgs.
Even manual repair sometimes fails.
I would like to understand if I am losing my data:
- with replica 2 I hope that ceph chooses the right replica by looking
at checksums
- with replica 3 I hope that there are no problems at all
How can I tell ceph to simply create the second replica in another place?
Because I suppose that with replica 2 and inconsistent pgs I have only
one copy of data.
Thank you in advance for any help.
Mario
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com