On Sat, Oct 28, 2017 at 5:38 AM Denes Dolhay <de...@denkesys.com> wrote:
> Hello,
>
> First of all, I would recommend that you use ceph pg repair wherever you
> can.
>
> When you have size=3 the cluster can compare 3 instances, therefore it is
> easier for it to spot which two are good and which one is bad.
>
> When you use size=2 the case is harder in oh-so-many ways:
>
> - According to the documentation it is harder to determine which object is
> the faulty one.
>
> - If an OSD dies, the increased load (caused by the missing OSD) and the
> extra I/O from the recovery process hit the other OSDs much harder,
> increasing the chance that another OSD dies (because of disk failure caused
> by the sudden spike of extra load), and then you lose your data.
>
> - If there is bit rot in the one remaining replica, then you do not have
> any valid copies of your data.
>
> So, to summarize, the experts say that it is MUCH safer to have size=3
> min_size=2 (I'm far from an expert, I'm just quoting :))
>
> So, back to the task at hand:
>
> If you have repaired all the pgs that you could with ceph pg repair, there
> is a manual recovery process (written for filestore, unfortunately):
>
> http://ceph.com/geen-categorie/ceph-manually-repair-object/
>
> The good news is that there is a FUSE client for bluestore too, so you
> can mount it by hand and repair it as per the linked document.

Do not do this with bluestore. In general, if you need to edit stuff, it's
probably better to use the ceph-objectstore-tool, as it leaves the store in
a consistent state.

More generally, you should find that clusters running bluestore are much
more effective about doing a repair automatically (because bluestore has
checksums on all data, it knows which object is correct!), but there are
still some situations where they won't. If that happens to you, I would not
follow directions to resolve it unless they have the *exact* same symptoms
you do, or you've corresponded with the list about it. :)
-Greg

> I think that you could ceph osd pool set [pool] size 3 to increase the
> copy count, but before that you should be certain that you have enough
> free space and that you won't hit the OSD pg count limits.
>
> [DISCLAIMER]:
> I have never done this, and I too have questions about this topic:
>
> [Questions to the list]
> How is it possible that the cluster cannot repair itself with ceph pg
> repair?
> Are no good copies remaining?
> Can it not decide which copy is valid or up to date?
> If so, why not, when there is a checksum and an mtime for everything?
> In this inconsistent state, which object does the cluster serve when it
> doesn't know which one is the valid one?
>
> Isn't there a way to do a more "online" repair?
>
> A way to examine and remove objects while the OSD is running?
>
> Or better yet, to tell the cluster which copy should be used when
> repairing?
>
> There is a command, ceph pg force-recovery, but I cannot find
> documentation for it.
>
> Kind regards,
>
> Denes Dolhay.
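For reference, the scrub-and-repair workflow discussed above usually looks
roughly like the sketch below; the pool name "rbd" and the PG id 2.5 are
only placeholders, so substitute your own:

    # show which PGs the scrub flagged as inconsistent
    ceph health detail
    rados list-inconsistent-pg rbd

    # show which object copy was flagged and why (checksum, size, omap...)
    rados list-inconsistent-obj 2.5 --format=json-pretty

    # ask the primary OSD to re-scrub the PG and repair what it can
    ceph pg repair 2.5

With bluestore's per-object checksums the repair step can usually tell which
copy is the bad one; if it can't, don't go further without checking with the
list first, as noted above.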
> On 10/28/2017 01:05 PM, Mario Giammarco wrote:
>
> Hello,
> we recently upgraded two clusters to Ceph luminous with bluestore and we
> discovered that we have many more pgs in state active+clean+inconsistent
> ("Possible data damage, xx pgs inconsistent").
>
> This is probably due to the checksums in bluestore, which discover more
> errors.
>
> We have some pools with replica 2 and some with replica 3.
>
> I have read past forum threads and I have seen that Ceph does not repair
> inconsistent pgs automatically.
>
> Even manual repair sometimes fails.
>
> I would like to understand if I am losing my data:
>
> - with replica 2 I hope that Ceph chooses the right replica by looking at
> the checksums
> - with replica 3 I hope that there are no problems at all
>
> How can I tell Ceph to simply create the second replica in another place?
>
> Because I suppose that with replica 2 and inconsistent pgs I have only one
> copy of the data.
>
> Thank you in advance for any help.
>
> Mario
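If you do decide to go from two copies to three, as suggested earlier in the
thread, the change itself is one setting per pool and the cluster backfills
the extra copy on its own, capacity and PG-per-OSD limits permitting. A
minimal sketch, where the pool name "rbd" is again just a placeholder:

    # raise the replica count; backfill of the third copy starts automatically
    ceph osd pool set rbd size 3

    # keep serving I/O as long as at least two copies are up
    ceph osd pool set rbd min_size 2

    # watch recovery/backfill progress
    ceph -s
    ceph pg stat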
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com