On Sat, Oct 28, 2017 at 5:38 AM Denes Dolhay <de...@denkesys.com> wrote:

> Hello,
>
> First of all, I would recommend that you use ceph pg repair wherever
> you can.
>
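> For reference, the usual sequence (a rough sketch; the pg id 2.5 below is
> only a placeholder) is to locate the inconsistent pg, inspect it, and then
> ask the cluster to repair it:
>
>     ceph health detail                                    # lists the inconsistent pgs
>     rados list-inconsistent-obj 2.5 --format=json-pretty  # shows which shard looks bad
>     ceph pg repair 2.5                                    # asks the osd to scrub and repair that pg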
>
> When you have size=3 the cluster can compare 3 instances, so it is
> easier for it to spot which two are good and which one is bad.
>
> When you use size=2 the case is harder in oh-so-many ways:
>
> -According to the documentation it is harder to determine which object
> is the faulty one.
>
> -If an osd dies, the increased load (caused by the missing osd) and the
> extra io from the recovery process hit the remaining osds much harder,
> increasing the chance that another osd dies (because of a disk failure
> caused by the sudden spike of extra load), and then you lose your data.
>
> -If there is bit rot in the one remaining replica, then you do not have
> any valid copies of your data.
>
> So, to summarize: the experts say that it is MUCH safer to have size=3
> min_size=2 (I'm far from an expert, I'm just quoting :))
>
>
> So, back to the task at hand:
>
> If you have repaired all the pgs that you could with ceph pg repair, there
> is a manual recovery process (written for filestore, unfortunately):
>
> http://ceph.com/geen-categorie/ceph-manually-repair-object/
>
> The good news is that there is a fuse client for bluestore too, so you
> can mount it by hand and repair it as per the linked document.
>

Do not do this with bluestore. In general, if you need to edit stuff, it's
probably better to use the ceph-objectstore-tool, as it leaves the store in
a consistent state.
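
For what it's worth, a rough sketch of that approach (the osd id 12 and pg
2.5 are just placeholders; the osd has to be stopped first, since the tool
works on an offline store):

    systemctl stop ceph-osd@12
    # list the objects of the suspect pg on this osd
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 2.5 --op list
    # export the pg as a safety copy before touching anything
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 2.5 --op export --file /root/pg-2.5.export
    # remove one bad object (the JSON identifier comes from the list output)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 2.5 '<object-json-from-list>' remove
    systemctl start ceph-osd@12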

In general you should find that clusters running bluestore are much more
effective at repairing themselves automatically (because bluestore has
checksums on all data, it knows which copy of an object is correct!), but
there are still some situations where they won't. If that happens to you, I
would not follow directions to resolve it unless they have the *exact* same
symptoms you do, or you've corresponded with the list about it. :)
-Greg


>
> I think that you could run ceph osd pool set [pool] size 3 to increase the
> copy count, but before that you should be certain that you have enough free
> space and that you will not hit the osd pg count limits.
>
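> A minimal sketch of those checks and of the change itself (the pool name
> "rbd" is only an example):
>
>     ceph df                            # raw and per-pool free space
>     ceph osd df                        # per-osd utilization and pg counts
>     ceph osd pool set rbd size 3       # raise the replica count
>     ceph osd pool set rbd min_size 2   # still serve io with one replica down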
>
> [DISCLAIMER]:
> I have never done this, and I too have questions about this topic:
>
> [Questions to the list]
> How is it possible that the cluster cannot repair itself with ceph pg
> repair?
> Are there no good copies remaining?
> Can it not decide which copy is valid or up to date?
> If so, why not, when there is a checksum and an mtime for everything?
> In this inconsistent state, which object does the cluster serve when it
> doesn't know which one is valid?
>
>
> Isn't there a way to do a more "online" repair?
>
> A way to examine or remove objects while the osd is running?
>
> Or better yet, to tell the cluster which copy should be used when
> repairing?
>
> There is a command, ceph pg force-recovery, but I cannot find
> documentation for it.
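>
> (As far as I can tell the invocation is simply
>
>     ceph pg force-recovery <pgid>
>
> but my understanding is that it only raises the recovery priority of the
> pg rather than resolving an inconsistency.)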
>
>
> Kind regards,
>
> Denes Dolhay.
>
>
>
> On 10/28/2017 01:05 PM, Mario Giammarco wrote:
>
> Hello,
> we recently upgraded two clusters to Ceph luminous with bluestore and we
> discovered that we have many more pgs in state active+clean+inconsistent.
> (Possible data damage, xx pgs inconsistent)
>
> This is probably due to the checksums in bluestore, which detect more errors.
>
> We have some pools with replica 2 and some with replica 3.
>
> I have read past forum threads and I have seen that Ceph does not
> automatically repair inconsistent pgs.
>
> Even manual repair sometimes fails.
>
> I would like to understand if I am losing my data:
>
> - with replica 2 I hope that ceph chooses the right replica by looking at
> the checksums
> - with replica 3 I hope that there are no problems at all
>
> How can I tell ceph to simply create the second replica in another place?
>
> Because I suppose that with replica 2 and inconsistent pgs I have only one
> copy of the data.
>
> Thank you in advance for any help.
>
> Mario
>
>
>
>
>
>
>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
